CS计算机代考程序代写 SQL database File Structures

File Structures

ADVANCED SQL & ANALYTICS

Database Theory & Applications (M)

Dr Chris Anagnostopoulos

ROADMAP

 Join Query

 Dealing with NULL FKs

 Analytics Query

 Complex Un/Correlated Query using Aggregation Functions

over Groups of Tuples;

 Objective: Extract knowledge from tuples and not just

retrieving tuples…

 Modification Query

2

INNER JOIN

 INNER JOIN matches tuples using FK and PK (THETA-JOIN).

 The matching operator is, usually, the equality ‘ = ’, thus, it is

referred to as EQUIJOIN: R1.PK = R2.FK

Query 0: Show the employees who are working in the department

‘Research’.

3

INNER JOIN

SELECT Fname, Lname, Address

FROM EMPLOYEE, DEPARTMENT

WHERE Dname = ‘Research’ %selection condition

AND DNO = DNUMBER; %equi-join condition

Note: Join and selection conditions are both in the WHERE clause

4

INNER AND OUTER JOINS

 INNER JOIN (versus OUTER JOIN)

 A tuple is retrieved if and only if there exists a matching tuple;

 i.e., FK is not NULL.

 LEFT OUTER JOIN (LR LEFT OUTER JOIN RR)

 Every tuple in the left relation LR must appear in result

 If no matching tuple exists, just add NULL values for attributes of

right relation RR

 RIGHT OUTER JOIN (LR RIGHT OUTER JOIN RR)

 Every tuple in the right relation RR must appear in result

 If no matching tuple exists, just add NULL values for attributes of left

relation LR

5

LEFT OUTER JOIN

Query 1: Show the last name of an employee and the last name of their

supervisor, if there exists!

SELECT E.Lname, S.Lname

FROM (EMPLOYEE AS E LEFT OUTER JOIN EMPLOYEE AS S

ON E.Super_SSN = S.SSN)

SELECT E.Lname, S.Lname

FROM EMPLOYEE AS E, EMPLOYEE AS S

WHERE E.Super_SSN = S.SSN

6

E.Lname S.Lname

Smith Wong

Borg NULL

Franklin Jennifer

IN-CLASS QUIZ

SELECT E.Fname, E.Minit, E.Lname, D.Dname

FROM EMPLOYEE AS E LEFT OUTER JOIN DEPARTMENT AS D

ON E.SSN = D.MGR_SSN

What are we expecting? What knowledge do we extract?

7

AGGREGATE FUNCTION

Aggregation function: statistical summary/value over group of tuples.

 Built-in aggregation functions over attribute X:

 COUNT( * ): How many employees are working in Dept 5?

 SUM ( X ): Sum up all salaries of employees in Dept 5.

 MAX( X ) / MIN ( X ): Who is the youngest employee in Dept. 5?

 AVG( X ): Average salary of employees in Dept. 5

 CORR(X, Y): Correlation between Age and Salary of employees in
Dept. 5

 …

Note: NULL values are discarded apart from COUNT(*).

Note: Define our own function,

 e.g., calculate the GPA as a weighted sum of grades;

 e.g., calculate the Euclidean distance between two geo-locations in Location-
base Services applications…

8

AGGREGATE FUNCTION

Query 2: Show the maximum, minimum and average salary of those

employees who work in Dept. 5.

SELECT MAX (Salary) AS Highest_Sal,

MIN (Salary) AS Lowest_Sal,

AVG (Salary) AS Average_Sal

FROM EMPLOYEE

WHERE DNO = 5;

9

ANALYTICS: GROUPING TUPLES

 Partition a relation into groups based on grouping attribute, i.e.,

clustering tuples having the same value in the grouping attribute.

GROUP BY {grouping attribute}

Which is the grouping attribute and the expected number of groups?

 1: Group of employees with the same last-name

 2: Group of employees working in the same department

 3: Group of dependents of the same employee

 4: Group of employees with the same salary…

 Then, we apply aggregation functions to each group. 10

ANALYTICS: GROUPING TUPLES

aggregation function f, e.g., COUNT(*)

3

3

4

11

#tuples (whole) #tuples per group

IN-CLASS EXAMPLE: GROUP BY

Query 3: Show the number of employees per department & average salary

per department.

SELECT DNO, COUNT (*), AVG (Salary)

FROM EMPLOYEE

GROUP BY DNO;

Step 1: Partition EMPLOYEE into separate groups w.r.t. department.

Step 2: For each group, calculate its cardinality and average salary.

Note 1: The grouping attribute must appear in SELECT

Note 2: If the grouping attribute has NULL values, then a separate group is

created for the NULL value.
12

13

Actual Analytics Query is: Which is the most populated department?

IN-CLASS EXAMPLE: HISTOGRAM

Query 4: Show the number of

employees per age.

SELECT E.AGE, COUNT (*)

FROM EMPLOYEE AS E

GROUP BY E.AGE;

Step 1: Partition EMPLOYEE w.r.t. age.

Step 2: For each group, calculate its

cardinality.

Note: Use this analytics to approximate the

histogram of AGE, i.e., how the AGE is

distributed over the tuples…
14

#
E

m
p

lo
y
e
e
s

IN-CLASS EXAMPLE: REGRESSION ANALYTICS

Query 5: How is the average salary of

employees distributed along the age?

SELECT E.AGE, AVG(E.Salary)

FROM EMPLOYEE AS E

GROUP BY E.AGE;

Note: Use this analytics to approximate the

dependency of Salary on AGE.

Note: Predictive Analytics: given a age, predict

the expected salary

15

AGE

A
V

G
(S

a
la

r
y
)

IN-CLASS EXAMPLE

Query 6: How many employees are working in each project? Include the

project name at the results.

SELECT P.PNAME, COUNT (*) //employees per project

FROM PROJECT AS P, WORKS_ON AS W

WHERE P.PNUMBER = W.PNO

GROUP BY P.PNAME;

Step 1: Associate a PROJECT with the WORKS_ON

Step 2: Group associated tuples together w.r.t. PNAME

Step 3: For each group, count the number of tuples (employees)

16

17

2 members

3 members

2 members

3 members

3 members

3 members

PNAME COUNT(*)

Product X 2

Product Y 3

Product Z 2

[A1] WORKED EXAMPLE

Task 1: Which is the average salary of employees per department? Include

the department name at the results.

18

SSN Salary DNO

1 £30K 1

2 £60K 1

3 £20K 2

4 $25K 3

DNUMBER DNAME

1 D1

2 D2

3 D3

EMPLOYEE DEPARTMENT

DNAME AVG(Salary)

D1 £45K

D2 £20K

D3 £25K

[A1] WORKED EXAMPLE

Task 1: Which is the average salary of employees per department? Include

the department name at the results.

SELECT D.DNAME, AVG (E.SALARY) //avg salary/dept

FROM DEPARTMENT AS D, EMPLOYEE AS E

WHERE D.DNUMBER = E.DNO

GROUP BY D.DNAME;

Step 1: Associate EMPLOYEE with DEPARTMENT

Step 2: Group associated tuples together w.r.t. DNAME

Step 3: For each group, take the average of salaries

19

GROUP BY & HAVING

HAVING: condition to select/reject a group after grouping!

Query 7: Show the number of employees per project only from those projects
with more than 2 employees. Include the project name in the results.

SELECT P.PNAME, COUNT (*)

FROM PROJECT AS P, WORKS_ON AS W

WHERE P.PNUMBER = W.PNO

GROUP BY P.PNAME

HAVING COUNT(*) > 2
20

ESSN PNO

1 P1

2 P1

3 P2

4 P1

WORKS_ON

PName PNO

GLA P1

EDI P2

PROJECT

PName COUNT(*)

GLA 3

3

1

2

4

o
r
d

e
r

IN-CLASS EXAMPLE

Task: Who are the employees (SSN, last name) with more than two

dependents.

21

EMPLOYEE

DEPENDENT

IN-CLASS EXAMPLE

Task: Who are the employees (SSN, last name) with more than two

dependents.

Idea: Group dependents of the same employee; count.

SELECT E.SSN, E.LNAME, COUNT (*)

FROM DEPENDENT AS D, EMPLOYEE AS E

WHERE E.SSN = D.ESSN

GROUP BY E.SSN

HAVING COUNT(*) > 2 22

[A2] WORKED EXAMPLE

Task: Who are the managers (last names) of those departments

with more than 100 employees?

23

EMPLOYEE

DEPARTMENT

[A2] WORKED EXAMPLE

Task: Which are the managers (last names) of those departments

with more than 100 employees?

SELECT M.LNAME

FROM EMPLOYEE M, DEPARTMENT P

WHERE M.SSN = P.MGR_SSN

AND P.DNUMBER IN (

SELECT E.DNO

FROM EMPLOYEE AS E

GROUP BY E.DNO

HAVING COUNT(*) > 100);

24

TRICKY ANALYTICS QUERY

Query 8: For each department with more than 5 employees, tell
me how many of them are making more than £40K.

SELECT DNO, COUNT (*)

FROM EMPLOYEE

WHERE Salary > 40000

GROUP BY DNO

HAVING COUNT (*) > 5;

Step 1: Identify the departments with more than 5 employees.

Step 2: Check if employees of those departments earn more
than £40K; count

But: WHERE filters out employees with Salary <= £40K before grouping…thus, the group sizes (employees per department) are not correct… 25 26 ‘Department 1’ ‘Department 2’ ‘Department 3’ Legend Employee with more than £40K Employee with less than £40K TRICKY ANALYTICS QUERY ‘Department 1’ ‘Department 2’ ‘Department 3’ SELECT DNO, COUNT (*) FROM EMPLOYEE WHERE Salary > 40000

GROUP BY DNO

HAVING COUNT (*) > 5;

‘Department 4’

‘Department 4’

Correct: we want to count the total number of employees whose

salaries exceed £40K in those departments with more than five

employees.

27

‘Department 1’

‘Department 2’

‘Department 3’

Legend

Employee with more than £40K

Employee with less than £40K

TRICKY ANALYTICS QUERY

‘Department 4’

DNO COUNT(*)

2 2

4 3

SELECT DNO, COUNT(*)

FROM EMPLOYEE

WHERE Salary > 40000 AND DNO IN

(SELECT A.DNO

FROM EMPLOYEE A

GROUP BY A.DNO

HAVING COUNT(*) > 5)

GROUP BY DNO

First, find the

departments with

more than 5

employees

Second, for each

department, check if

members earn more

than £40K.

28

Then, group

and count

the £40K-

employees

per

department

‘Department 1’

‘Department 2’

‘Department 3’

‘Department 4’

2 4

[A3] WORKED EXAMPLE

EMPLOYEE(SSN, …, DNO)

Task: Show the department(s) with the maximum number of

employees.

Note: It might be the case that more than one department

has the maximum number of employees.

29

[A3] WORKED EXAMPLE

EMPLOYEE(SSN, …, DNO)

Task: Show the department(s) with the maximum number of
employees. It might be the case that more than one department has the
maximum number of employees.

SELECT DNO, COUNT(*)

FROM EMPLOYEE

GROUP BY DNO

HAVING COUNT(*) = (SELECT MAX(A.members)

FROM

(SELECT D.DNO, COUNT(*) AS members

FROM EMPLOYEE D

GROUP BY D.DNO) AS A

);
30

SQL: INSERT

Key, integrity and referential constraints are automatically enforced!

31

INSERT INTO EMPLOYEE (SSN, FNAME, LNAME, …)

VALUES (‘1234567’, ‘Chris’, ‘McReader’, …)

SQL: DELETE

Get a relation and include a WHERE to specify the tuple(s) to be deleted:

 Note: A missing WHERE specifies that all tuples in the relation are to be

deleted; the table then becomes an empty table.

 Referential Integrity: tuples are deleted from only one table at a time unless

ON DELETE CASCADE is specified on a constraint!

32

SQL: UPDATE

Modify attribute values of tuples, which satisfy WHERE before modification!

Query 9: Change the location and controlling department number of project

number 10 to ‘Bellaire’ and 5, respectively.

UPDATE PROJECT

SET Plocation = ‘Bellaire’, Dnum = 5

WHERE Pnumber = 10

Referential Integrity: tuples are updated from only one table at a time

unless ON UPDATE CASCADE is specified on a constraint!
33

IN-CLASS EXAMPLE

Task: Give all employees in the department 5 a 10% raise in salary.

UPDATE EMPLOYEE

SET SALARY = SALARY *1.1

WHERE DNO = 5

Modified SALARY depends on the original SALARY in each tuple:

 SALARY on the right of = refers to the old salary before modification

 SALARY on the left of = refers to the new salary after modification

34

DROP (OPTIONAL)

 DROP is used to drop named schema elements, such as tables,

domains, or constraint

Example: DROP SCHEMA COMPANY CASCADE;

It removes the schema and all its elements including tables,

constraints, etc.

Example: DROP TABLE EMPLOYEE;

It drops the existing table EMPLOYEE and all of its tuples.

Example: DROP TABLE EMPLOYEE CASCADE CONSTRAINTS;

It drops the existing table EMPLOYEE, all of its tuples and drop

the FOREIGN KEY constraints of the tables referring to EMPLOYEE

(but not those tables). 35

ALTER (OPTIONAL)

 ALTER includes:

 Adding or dropping a column (attribute)

 Changing a column definition

 Adding or dropping table constraints

Example: ALTER TABLE COMPANY.EMPLOYEE ADD COLUMN Job VARCHAR(12);

Example: ALTER TABLE COMPANY.EMPLOYEE DROP CONSTRAINT EMPSUPERFK;

Example: ALTER TABLE COMPANY.DEPARTMENT ADD CONSTRAINT NEW_UNIQUE

UNIQUE (Dname);

Example: ALTER TABLE COMPANY.EMPLOYEE DROP COLUMN Address;

Example: ALTER TABLE COMPANY.DEPARTMENT ALTER COLUMN Mgr_ssn DROP

DEFAULT;

Example: ALTER TABLE COMPANY.DEPARTMENT ALTER COLUMN Mgr_ssn SET

DEFAULT ‘333445555’;
36