PowerPoint Presentation
Review
SQL and Relational Databases
What is a Database Management System (DBMS) ?
______________________________________________
2
● DBMS contain information about a particular enterprise, and includes:
○ Collection of interrelated data
○ Set of programs to access the data
● Primary Goal: Provide an abstract view of the data and environment that is both convenient and efficient to use and allows users to interact with databases
Data Models ______________________________________________
3
● Underlying the structure is Data Model: A collection of tools for describing:
○ Data
○ Data relationships
○ Data semantics
○ Data constraints
Source: Matthew West and Julian Fowler (1999). Developing High Quality Data Models.
The European Process Industries STEP Technical Liaison Executive (EPISTLE).
Source: Matthew West and Julian Fowler (1999). Developing High Quality Data Models.
The European Process Industries STEP Technical Liaison Executive (EPISTLE).
Relational Model
______________________________________________
4
● Data is stored in various tables
● Example of tabular data in the relational model:
Rows or
Tuples
Columns or Attributes
Sneak Peak: This is the instructor table from the uni_small database. You can get the same output with:
SELECT * FROM instructor;
What is a Schema?
A)A very high level, conceptual overview of the database B)Collection of tables in the database
C)The collection of saved queries
D)The systems and processes contained in the DBMS supporting the administration of the data
What is a Schema?
A)A very high level, conceptual overview of the database B)Collection of tables in the database
C)The collection of saved queries
D)The systems and processes contained in the DBMS supporting the administration of the data
Design Approaches and Normalization
_______________________________________
7
We need to come up with a methodology to ensure that each of the relations in the database is “good”
● There are two ways of doing so:
○ Entity Relationship Model (Chapter 7)
■ Models an enterprise as a collection of entities and relationships
■ Represented diagrammatically by an entity- relationship diagram:
○ Normalization Theory (Chapter 8)
■ Formalize what designs are bad, and test for them
8
Classwork Question
9
Classwork Question – Answer
Super Key and Candidate key
10
Superkey is a set of one or more attributes that, taken collectively, allows us to identify a tuple in the relation
Let R denote the set of attributes in the schema of relation r
Let K ⊆ R
K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R)
e: {ID} and {ID, name} are both superkeys of instructor.
Superkey K is a candidate key if K is minimal i.e: {ID} is a candidate key for instructor
Primary Key and Foreign Key
11
One of the candidate keys is selected to be the primary key.
Foreign key constraint: Value in one relation must appear in another
Referencing relation
Referenced relation
i.e.: dept_name in instructor is a foreign key from
instructor referencing department
Consider attributes ID, CITY and NAME. Which one of this can be considered as a super key?
a) NAME
b) ID
c) CITY
d) CITY, ID
Consider attributes ID, CITY and NAME. Which one of this can be considered as a super key?
a) NAME
b) ID
c) CITY
d) CITY, ID
Consider attributes ID, CITY and NAME. Which of these options can be considered as a candidate key?
NAME and CITY
ID
CITY and ID
Any of the above
Consider attributes ID, CITY and NAME. Which of these options can be considered as a candidate key?
NAME and CITY
ID
CITY and ID
Any of the above
Natural Join Example
Given relations r, s:
Natural Join:
A B C D E
α 1 α a α
β 2 γ a γ
γ 4 β a α
A B C
α 1 α
β 2 γ
γ 4 β
B D E
1 a α
2 a β
4 a γ
r
s
16
Natural Join Example
Given relations r, s:
Natural Join:
A B C D E
α 1 α a α
α 1 α a γ
α 1 γ a α
α 1 γ a γ
δ 2 β b δ
A B C D
α 1 α a
β 2 γ a
γ 4 β b
α 1 γ a
δ 2 β b
B D E
1 a α
3 a β
1 a γ
2 b δ
3 b ε
r
s
17
A B
1 2
3 4
5 6
B C D
2 4 6
4 6 8
4 7 9
Suppose relation R(A,B) has the following tuples:
and relation S(B,C,D) has the following tuples:
Compute the natural join of R and S. Which of the following tuples is in the result?
Assume each tuple has schema (A,B,C,D).
a. (3,4,2,6)
b. (3,4,6,8)
c. (1,2,6,8)
d. (1,4,6,8)
a. (3,4,2,6)
b. (3,4,6,8)
c. (1,2,6,8)
d. (1,4,6,8)
Integrity Constraints CREATE TABLE
Types of integrity constraints:
1 n
m
n
NOT NULL
PRIMARY KEY (A , …, A )
FOREIGN KEY (A , …, A ) references r
Example:
PRIMARY KEY declaration on an attribute automatically ensures NOT NULL
CREATE TABLE instructors ( id char(5) primary key,
name varchar(20) not null,
dept_name varchar(20) references department(dept_name),
Salary numeric(8,2));
Navigate to Codio and pgAdmin. Investigate the “takes” table of either the uni_small or uni_large database. Select the CREATE TABLE statement that would create the “takes” table as defined in Codio.
A. CREATE TABLE takes (
ID varchar(5),
course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year integer,
grade varchar(2),
PRIMARY KEY (ID, course_id, sec_id),
FOREIGN KEY (ID) REFERENCES student,
FOREIGN KEY (course_id, sec_id) REFERENCES section
);
B. CREATE TABLE takes (
ID varchar(5),
course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year integer,
grade varchar(2),
PRIMARY KEY (ID, course_id, sec_id, semester, year),
FOREIGN KEY (ID) REFERENCES student,
FOREIGN KEY (course_id, sec_id, semester, year) REFERENCES section
);
C. CREATE TABLE takes (
ID varchar(5),
course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year integer,
grade varchar(2),
PRIMARY KEY (ID, course_id, sec_id, semester, year),
FOREIGN KEY (ID) REFERENCES student,
);
B. CREATE TABLE takes (
ID varchar(5),
course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year integer,
grade varchar(2),
PRIMARY KEY (ID, course_id, sec_id, semester, year),
FOREIGN KEY (ID) REFERENCES student,
FOREIGN KEY (course_id, sec_id, semester, year) REFERENCES section
);
C. CREATE TABLE takes (
ID varchar(5),
course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year integer,
grade varchar(2),
PRIMARY KEY (ID, course_id, sec_id, semester, year),
FOREIGN KEY (ID) REFERENCES student,
);
1
2
3
4
5
6 CREATE TABLE table_name (
…
column_name data_type,
…
CHECK (column_name IS NOT NULL)
);
Logically, an NOT NULL constraint is equivalent to a CHECK constraint,
therefore, the above statement is equivalent to the following statement.
True
False
1
2
3
4
5
6 CREATE TABLE table_name (
…
column_name data_type,
…
CHECK (column_name IS NOT NULL)
);
Logically, an NOT NULL constraint is equivalent to a CHECK constraint,
therefore, the above statement is equivalent to the following statement.
True
False
The SELECT Clause
The select clause lists the attributes desired in the result of a query (it corresponds to the projection operation of the relational algebra)
Example: find the names of all instructors:
Note: SQL names are case insensitive (i.e., you may use upper- or lower-case letters.)
i.e.: Name ≡ NAME ≡ name
Some people use upper case wherever we use bold font.
SELECT name FROM instructor;
28
Find the names of all instructors who have taught some course and the course_id:
Find the names of all instructors in the Physics department who have taught some course and the course_id:
SELECT name, course_id FROM instructor, teaches
WHERE instructor.ID = teaches.ID;
SELECT name, course_id FROM instructor, teaches
WHERE instructor.ID = teaches.ID
AND instructor.dept_name = ‘Physics’;
Aggregate Functions
Aggregate functions operate on the multiset of values of a column of a table, and return a value
AVG: MIN: MAX: SUM:
average value minimum value maximum value sum of values
COUNT: number of values
Find the average salary of instructors in the Computer Science department:
SELECT AVG (salary) FROM instructor
WHERE dept_name = ‘Comp. Sci.’;
In Unilarge dataset, what is the total number of instructors who teach a course in the Spring 2010 semester?
SELECT COUNT (DISTINCT ID)
FROM teaches
WHERE semester = ‘Spring’ AND year = 2010;
Answer – 5
GROUP BY
In Unismall dataset, which department has highest average salary?
SELECT dept_name, AVG(salary) AS avg_salary FROM instructor
GROUP BY dept_name;
Note: Attributes in SELECT clause outside of
aggregate functions must appear in GROUP BY list
Physics
Design Phases: How to Design a Database Schema?
Characterize the data needs of the prospective users.
Choose a data model
Moving from abstract data model to the implementation:
Logical Design phase
Physical Design phase
Difference between E-R and Relational Model
Given the basic ER and relational models, which of the following is INCORRECT?
aa An attribute of an entity can have more than one value
b An attribute of an entity can be composite
c In a row of a relational table, an attribute can have more than one value
d In a row of a relational table, an attribute can have exactly one value or a NULL value
Given the basic ER and relational models, which of the following is INCORRECT?
aa An attribute of an entity can have more than one value
b An attribute of an entity can be composite
c In a row of a relational table, an attribute can have more than one value
d In a row of a relational table, an attribute can have exactly one value or a NULL value
Design Phases (cont.)
36
Choosing a Data Model:
Network Models More natural view of the world Capacity for data independence-limited
Relational Model High degree of data independence May lose important semantic information
Entity Relationship Model High degree of data independence
More natural view of the world
Cardinality Constraints
We express cardinality constraints by drawing either a directed line (→), signifying “one,” or an undirected line (—), signifying “many,” between the relationship set and the entity set.
One-to-one relationship between an instructor and a student:
A student is associated with at most one instructor via the relationship advisor
A student is associated with at most one department via
stud_dept
One-to-Many Relationship
One-to-many relationship between an instructor and a student
an instructor is associated with several (including 0) students via advisor
a student is associated with at most one instructor via advisor
Many-to-One Relationships
In a many-to-one relationship between an instructor and a student,
an instructor is associated with at most one student via advisor,
and a student is associated with several (including 0) instructors via advisor
Many-to-Many Relationship
An instructor is associated with several (possibly 0) students via
advisor
A student is associated with several (possibly 0) instructors via
advisor
Total and Partial Participation
Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set
Participation of student in advisor relation is total
every student must have an associated instructor
Partial participation: some entities may not participate in any relationship in the relationship set
Example: participation of instructor in advisor is partial
Classwork:
Design a database for a worldwide package delivery company (e.g., DHL or FedEx). The database must be able to keep track of customers who ship items and customers who receive items; some customers may do both. Each package must be identifiable and trackable, so the database must be able to store the location of the package and its history of locations. Locations include trucks, planes, airports, and warehouses.
Your design should include an E-R diagram, a set of relational schemas, and a list of constraints, including primary-key and foreign-key constraints.
Classwork Answer – ER Diagram
Location
Classwork Answer: Relationship Schema
Joined Relations
Join operations take two relations and return as a result another relation.
A join operation is a Cartesian product which requires that tuples in the two relations match (under some condition). It also specifies the attributes that are present in the result of the join
The join operations are typically used as subquery expressions in the FROM clause
Three types of joins:
Natural join
Inner join
Outer join
Select * from student, takes
Cartesian product which requires that tuples in the two relations students and takes match
Left join of student table and takes relations
Right join of student and takes relations
Outer join of student and takes relations
Select * from student, takes produces:
Cartesian product which requires that tuples in the two relations match
Left join of student table and takes relations
Right join of student and takes relations
Outer join of student and takes relations
Which one of these two queries is correct for listing the names of all students along with the titles of courses that they have taken:
SELECT name, title
FROM student NATURAL JOIN takes, course WHERE takes.course_id = course.course_id;
SELECT name, title
FROM student NATURAL JOIN takes NATURAL JOIN course;
a.
b.
List the names of all students along with the titles of courses that they have taken:
Correct:
Incorrect:
Why: This query omits all (student name, course title) pairs where the student takes a course in a department other than the student’s own department.
SELECT name, title
FROM student NATURAL JOIN takes, course WHERE takes.course_id = course.course_id;
SELECT name, title
FROM student NATURAL JOIN takes NATURAL JOIN course;
Integrity Constraints on a Single Relation
NOT NULL
PRIMARY KEY
UNIQUE
CHECK (P), where P is a predicate
Not Null and Unique Constraints
NOT NULL
Declare name and budget to be NOT NULL
name varchar(20) NOT NULL
budget numeric(12,2) NOT NULL
UNIQUE (A1, A2, …, Am)
The unique specification states that the attributes A , A , … A
1 2 m
form a candidate key.
Candidate keys are permitted to be null (in contrast to primary keys).
The CHECK clause
CHECK (P) where P is a predicate
Example: ensure that semester is one of fall, winter, spring or summer.
CREATE TABLE section ( course_id varchar(8),
sec_id varchar(8),
semester varchar(6),
year numeric(4),
building varchar(15), room_number varchar(7), time_slot_id varchar(4),
PRIMARY KEY (course_id, sec_id, semester, year),
CHECK (semester IN (‘Fall’, ‘Winter’, ‘Spring’, ‘Summer’))
);
Referential Integrity
Foreign keys can be specified as part of the CREATE TABLE
statement
By default, a foreign key references the primary key attributes of the referenced table.
SQL allows a list of attributes of the referenced relation to be specified explicitly.
FOREIGN KEY (dept_name) REFERENCES department;
FOREIGN KEY (dept_name) REFERENCES department (dept_name);
SQL Functions
Define a function that, given the name of a department, returns the count of the number of instructors in that department.
In PostgreSQL:
CREATE OR REPLACE FUNCTION dept_count (dept_name varchar(20)) RETURNS integer AS
$func$
DECLARE d_count integer; BEGIN
SELECT COUNT (*) INTO d_count FROM instructor
WHERE instructor.dept_name = $1; RETURN d_count;
END
$func$ LANGUAGE plpgsql;
Recursion in SQL
SQL:1999 permits recursive view definition
Example: find which courses are a prerequisite, whether directly or indirectly, for a specific course. This example view, rec_prereq, is called the transitive closure of the prereq relation
with recursive rec_prereq(course_id, prereq_id)
as (select course_id, prereq_id from prereq
union
select rec_prereq.course_id, prereq.prereq_id from rec_prereq, prereq
where rec_prereq.prereq_id = prereq.course_id)
select * from rec_prereq;
55
The Power of Recursion
Recursive views make it possible to write queries, such as transitive closure queries, that cannot be written without recursion or iteration.
Without recursion, a non-recursive non-iterative program can perform only a fixed number of joins of prereq with itself
56
Ranking
Ranking is done in conjunction with an order by specification.
Suppose we are given a relation
student_grades (ID, GPA)
giving the grade-point average of each student
Find the rank of each student:
An extra order by clause is needed to get them in sorted order:
SELECT ID, RANK() OVER (ORDER BY GPA DESC) AS s_rank
FROM student_grades;
SELECT ID, RANK() OVER (ORDER BY GPA DESC) AS s_rank
FROM student_grades ORDER BY s_rank;
Ranking
Ranking may leave gaps: e.g. if 2 students have the same top GPA, both have rank 1, and the next rank is 3
dense_rank does not leave gaps, so next dense rank would be 2
Ranking can be done using basic SQL aggregation, but resultant query is very inefficient.
Example:
SELECT ID, (1 + (SELECT COUNT(*)
FROM student_grades B
WHERE B.GPA > A.GPA)) AS s_rank
FROM student_grades A ORDER BY s_rank;
Sources
http://www.cs.toronto.edu/~ryanjohn/teaching/cscc43-s14/slides/er-database-v01.pdf
https://www.geeksforgeeks.org/dbms-gq/er-and-relational-models-gq/
https://www.geeksforgeeks.org/database-management-systems-set-11/
/docProps/thumbnail.jpeg