CS代考计算机代写 SQL AI database MONASH

MONASH
INFORMATION TECHNOLOGY
The Relational Database Model

Overview
We now have a conceptual model for Monash Software, it is time to move to the second stage and map this to a logical model
For our unit this will involve mapping to the Relational Model in preparation for implementation in a RDBMS
▪ Relational Model
▪ RelationalAlgebra
2

The Relational Model
▪ Introduced by CODD in 1970 – the fundamental basis for the relational DBMS
▪ Basic structure is the mathematical concept of a RELATION mapped to the
‘concept’ of a table (tabular representation of relation)
– Relation – abstract object
– Table – pictorial representation
– Storage structure – “real thing” – eg. isam file of 1’s and 0’s
▪ Relational Model Terminology
– DOMAIN – set of atomic (indivisible) values
• specify
– name
– data type
– data format ▪ Examples:
– customer_number domain – 5 character string of the form xxxdd
– name domain – 20 character string
– address domain – 30 character string containing street, town & postcode
– credit_limit domain – money in the range $1,000 to $99,999
3

A Relation
▪ A relation consists of two parts – heading
– body
▪ Relation Heading
– Also called Relational Schema consists of a fixed set of attributes
• R (A1,A2,…….An)
– R = relation name, Ai = attribute i
– Each attribute corresponds to one underlying domain: • Customer relation heading:
– CUSTOMER (custno, custname, custadd, credlimit)
custno
custname
custadd
credlimit
4

Relation Body
▪ Relation Body
– Also called Relation Instance (state of the relation at any point in time)
• r(R) = {t1, t2, t3, …., tm}
• consists of a time-varying set of n-tuples
– Relation R consists of tuples t1, t2, t3 .. tm
– m = number of tuples = relation cardinality
• each n-tuple is an ordered list of n values
• t = < v1, v2, ....., vn>
– n = number of values in tuple (no of attributes) = relation degree
– In the tabular representation:
• Relation heading ⇨ column headings
• Relation body ⇨ set of data rows
custno
custname
custadd
credlimit
SMITH
Wide Rd, , 3
JONES
Narrow St, , 3
SMI13
JON44 BRO23
BROWN
Here Rd, , 3
2000
10000
10000
5

Relation Properties
▪ No duplicate tuples
– by definition sets do not contain duplicate elements
• hence tuples must be unique ▪ Tuples are unordered within a relation
– by definition sets are not ordered
• hence tuples can only be accessed by content
▪ No ordering of attributes within a tuple – by definition sets are not ordered
6

Relation Properties cont’d
▪ Tuple values are atomic – cannot be divided
• EMPLOYEE (eid, ename, departno, dependants)
– not allowed: dependants (depname, depage) multivalued
– hence no multivalued (repeating) attributes allowed, called the first normal form rule
▪ COMPARE with tabular representation
• normally nothing to prevent duplicate rows • rows are ordered
• columns are ordered
– tables and relations are not the same ‘thing’
7

Q1. Which of the following statements is TRUE according the characteristics of the relational model?
All values in an attribute need to be from the same domain.
Each attribute needs to have a distinct name. The order of attributes and tuples matters.
Each intersection of a attribute and a tuple represent a single value.
More than one statement is TRUE
8

Functional Dependency
▪ Functional Dependency:
– A set of attributes A functionally determines an attribute B if, and only if,
for each A value, there is exactly one value of B in the relation. It is denoted as A → B (A determines B, or B depends on A)
• order_no → order_date
• prod_no → prod_desc
• order_no, prod_no → qty_ordered
9

Relational Model Keys
▪ A superkey of a relation R is an attribute or set of attributes which exhibits only the uniqueness property
– No two tuples of R have the same value for the superkey (Uniqueness property)
– t1[superkey] ≠ t2[superkey]
▪ A candidate key CK of a relation R is an attribute or set of attributes
which exhibits the following properties:
– Uniqueness property (as above), and
– No proper subset of CK has the uniqueness property
Many possible superkeys
(Minimality or Irreducibility property) ie. a minimal superkey
▪ One candidate key is chosen to be the primary key (PK) of a relation. Remaining candidate keys are termed alternate keys (AK).
Only ONE primary key
(may be composed of
many attributes – a
composite primary key) 10
Potentially many possible candidate keys

Q2. Given the following relation:
EMPLOYEE (empno, empname, empsalary, emptaxfileno) empno – employee number
empname – employee name
empsalary – employee salary
emptaxfileno – employee tax file number Possible superkey(s) are:
empno, empname, empsalary, emptaxfilenno empno
emptaxfileno, empname
empname
B and C
A, B and C
A, B, C and D
11

Q3. Given the following relation:
EMPLOYEE (empno, empname, empsalary, emptaxfileno) empno – employee number
empname – employee name
empsalary – employee salary
emptaxfileno – employee tax file number How many candidates keys exist:
0 1 2 3 4
12

Q4. Given the following relation:
EMPLOYEE (empno, empname, empsalary, emptaxfileno) empno – employee number
empname – employee name
empsalary – employee salary
emptaxfileno – employee tax file number How many primary keys exist:
0 1 2 3 4
13

Selection of a Primary key
▪ A primary key must be chosen considering the data that may be added to the table in the future
– Names, dates of birth etc are rarely unique and as such are not a good option
– PK should be free of ‘extra’ semantic meaning and security compliant, preferably a single attribute, preferably numeric (see Table 5.3 Coronel & Morris)
– Natural vs Surrogate primary key
• PATIENT_TREATMENT (patient_id, physician_id, treatment_code, pt_date, pt_time, pt_result)
– Superkey
– CK
– PK
– Issues with PK?
14

15

Null in the Relational Model
▪ NULL is NOT a value – is a representation of the fact that there is NO VALUE
▪ Reasons for a NULL:
– VALUE NOT APPLICABLE –
• EMP relation – empno, deptno, salary, commission – commission only applies to staff in sales dept
– VALUE UNKNOWN –
• Joe’s salary is NULL, Joe’s salary is currently unknown
– VALUE DOES NOT EXIST –
• Tax File Number – is applicable to all employees but Joe may not
have a number at this time
– VALUE UNDEFINED –
• Certain items explicitly undefined eg. divide by zero
– Columns Number_of_payments, Total_payments – ColumnAverage_payment_made
– If Number_of_payments = 0 => Average undefined
16

Writing Relations
▪ Relations may be represented using the following notation:
– RELATION_NAME (attribute1, attribute2,…)
▪ The primary key is underlined.
▪ Example:
– STAFF (staffid, surname, initials, address, phone)
17

Relational Database
▪ A relational database is a collection of normalised relations.
▪ Normalisation is part of the design phase of the database and will be discussed in a later lecture.
Example relational database:
ORDER (order_id, orderdate,)
ORDER_LINE (order_id, product_id, quantity) PRODUCT (product_id, description, unit_price)
18

Foreign Key (FK)
▪ FK: An attribute/s in a relation that exists in the same, or another relation as a Primary Key.
▪ Referential Integrity
– A Foreign Key value must either match the full
primary key in a relation or be NULL.
▪ The pairing of PK and FK creates relationships (logical connections) between tables when implemented in a RDBMS. Hence the abstraction away from the underlying storage model.
19

MANAGER PROJECT
Q5. If the above two tables are to be created in a relational database, in which table would you assign the FK (and using which attribute) to create the logical link? For our supplied scenario a manager may manage many projects and a project can only be managed by one manager. A manager’s name (project_manager) and the project_code may be considered to be unique for this example:
MANAGER table using project_manager attribute. PROJECT table using project_code attribute. MANAGER table using manager_phone attribute. PROJECT table using project_manager attribute None of the above, a relationship is not needed.
20

Q6. Where are the foreign keys in these two relations? Note: a supervisor is a staff member
STAFF (staff_id, surname, initials, address, phone, dept_id, supervisor_id)
DEPARTMENT (dept_id, deptname)
dept_id in staff relation.
dept_id in department relation. staff_id in staff relation. supervisor_id in staff relation. More than one answer is correct
21

Data Integrity
▪ Entity integrity
– Primary key value must not be NULL.
• No duplicate tuple property then ensures that each primary key must be unique
▪ Referential integrity
– The values of FK must either match a value of a full
PK in the related relation or be NULL. ▪ Column/Domain integrity
– All values in a given column must come from the same domain (the same data type and range).
22

MANAGER
PROJECT
Q7. Suppose that the manager William K. Moor leaves the company and we delete his record from the manager table. Which of the following actions will satisfy the data integrity constraints?
The last row in PROJECT table must be deleted ✔ The PROJECT_MANAGER value in the last row of
PROJECT table must be set to NULL (empty) ✔ The PROJECT_MANAGER value in the last row of
PROJECT table must be set to any string (e.g., “XYZ”) X The options a and b
All of the above
23

Relational DMLs
▪ Relational Calculus
▪ Relational Algebra
▪ Transform Oriented Languages (e.g. SQL)
▪ Graphical Languages
▪ Exhibit the “closure” property – queries on relations produce relations
24

Relational Calculus
▪ Based on mathematical logic.
▪ Non-procedural.
▪ Primarily of theoretical importance.
▪ May be used as a yardstick for measuring the power of other relational languages (“relational completeness”).
▪ Operators may be applied to any number of relations.
25

26

Relational Algebra
▪ Relationally complete.
▪ Procedural.
▪ Operators only apply to at most two relations at a time. ▪ 8 basic operations:
– single relation: selection, projection
– cartesian product, join
– union
– intersection
– difference
– division
27

28

29

2
1
30

SQL vs Relational Algebra in the Database
31

Q8. Which of the following statements returns the student ids of the students who got more than 70 marks in the subject
1011.
A. σ mark > 70 (π stude (MARK))
B. σ mark > 70 (MARK)
C. σ mark > 70 AND subj = 1011 (π stude (MARK))
D. σ mark > 70 AND subj = 1011 (MARK)
E. π stude (σ mark > 70 AND subj = 1011 (MARK))
32

JOIN
▪ Join operator used to combine data from two or more relations, based on a common attribute or attributes.
▪ Different types: – theta-join
– equi-join
– natural join
– outer join
33

THETA JOIN (Generalised join)
(Relation_1) ⨝F (Relation_2)
– F is a predicate (i.e. truth-valued function) which is
of the form Relation_1.ai θ Relation2.bi
• CUSTOMER.cust_no θ ORDER.cust_no
– θ is one of the standard arithmetic comparison operators, i.e. <, ≤, =, ≥, >
– Most commonly, θ is equals (=), but can be any of the operators
• EMPLOYEE.emp_sal > SALARYSCALE.step_5
34

Q9. How many rows are generated when the product (Cartesian Product) of the STUDENT and MARK relations is taken? i.e. the number of rows in STUDENT X MARK.
24
6
18
7
none of the above
35

Q10. How many columns are generated when the product (Cartesian Product) of the STUDENT and MARK relations is taken? i.e. the number of columns in STUDENT X MARK.
9
6
5
7
none of the above
36

NATURAL JOIN
STUDENT MARK
ID Name ID Subj Marks 1Alice 1100495 2Bob 2104555
Step 1: STUDENT X MARK
Step 2: delete rows where IDs do not match (select =)
1 1045 90
STUDENT. Name ID
1 Alice 1 Alice
1 Alice
2 Bob
2 Bob 2 Bob
MARK.ID Subj Marks
1 1004 95
2 1045 55
1 1045 90
1 1004 95
2 1045 55
1 1045 90
37

NATURAL JOIN
STUDENT MARK
ID Name ID Subj Marks 1Alice 1100495 2Bob 2104555
Step 1: STUDENT X MARK
Step 2: delete rows where IDs do not match (select =) Step 3: delete duplicate columns (project away)
1
1045 90
STUDENT.I Name D
MARK.ID Subj Marks
1 1004 95 1 1045 90
1 Alice
1 Alice
2 Bob 2 1045 55
38

NATURAL JOIN
STUDENT MARK
ID Name ID Subj Marks 1 Alice ⨝1 100495 2Bob 2104555
Step 1: STUDENT X MARK
Step 2: delete rows where IDs do not match (select =) Step 3: delete duplicate columns (project away)
1
1045 90
Marks 95
90
55
ID 1 1 2
Name Subj Alice 1004 Alice 1045 Bob 1045
A natural join of STUDENT and MARK
39

Q11. Which of the following statements returns a natural join of the two relations on the student ids (sid and stude)?
σ sid = stude (STUDENT X MARK)
π course, name, sid, subj, mark (σ sid = stude (STUDENT X MARK))
σ sid = stude (π course, name, sid, subj, mark (STUDENT X MARK)) All of the above
None of the above
40

OUTER JOIN
MARK
ID Subj Marks
1 1004 95
2 1045 55
1 1045 90
4 1004 100
STUDENT
ID Name
1 Alice
2 Bob
3 Chris

No information for Chris (no mark, e.g. just enrolled) and the student with ID 4 (no student, e.g. quit uni)
ID 1 1 2
Name Subj Marks Alice 1004 95 Alice 1045 90 Bob 1045 55
A natural join of STUDENT and MARK
41

FULL OUTER JOIN
STUDENT
ID 1 2 3
Name Alice Bob Chris

MARK
ID Subj Marks 1 1004 95
2 1045 55
1 1045 90
4 1004 100
Get (incomplete) information of both Chris and student with ID 4
ID Name 1 Alice
1 Alice
2 Bob
3 Chris
4
Subj Marks 1004 95 1045 90 1045 55
1004 100
Null Null
Null
A full outer join of STUDENT and MARK
42

LEFT OUTER JOIN
MARK
ID Subj Marks 1 1004 95
2 1045 55
1 1045 90
4 1004 100
STUDENT
ID Name 1 Alice 2 Bob
3 Chris

← Get (incomplete) information of only Chris
ID Name 1 Alice 1 Alice 2 Bob
3 Chris
Subj Marks 1004 95 1045 90 1045 55
Null Null
A left outer join of STUDENT and MARK
Memory aid: Chris is on the ← LEFT of the nulls.
43

RIGHT OUTER JOIN
MARK
ID Subj Marks
1 1004 95 ⟖2 104555 1 1045 90
4 1004 100
STUDENT
ID Name 1 Alice 2 Bob
3 Chris
Get (incomplete) information of the student with ID 4 →
ID 1 1 2 4
Name Alice Alice Bob
Subj Marks 1004 95 1045 90 1045 55 1004 100
Null
A right outer join of STUDENT and MARK.
Memory aid: the marks data is on the RIGHT → of the null.
44

45
Q12. Consider the above relations.
Assume that we want to join the them to obtain the information of all students (Anne, Dave, John and Tim). Which of the following is WRONG? (Hint: Dave just enrolled!)
A. Left outer join on STUDENT and MARK
B. Right outer join on MARK and STUDENT
C. Right outer join on STUDENT and MARK
D. Full outer join on STUDENT and MARK
E. Select if (B and C are wrong)
F. Select if (B, C and D are wrong)