CS计算机代考程序代写 SQL python data science database file system interpreter 1

1

CMPSC-132: Programming and Computation II

Department of Computer Science & Engineering
The Pennsylvania State University

1. Intro to Database Programming
File systems in your computer keep your data are kept in big unstructured named clumps called files.

But file systems seriously lack some of the critical features necessary for managing data such as transaction

support and fast indexing. A database is a collection of data, which is organized into files called tables.

These tables provide a systematic way of accessing, managing, and updating data. Databases allow

indexing based on any attribute or data property, and this helps with fast retrieval of data based on the

indexed attribute. Each value stored in a database is called a record. Records with similar structure are

grouped into tables. Records are retrieved and transformed using queries, which are statements in a query

language. Nowadays, one to the most used query languages is called SQL (Structured Query Language).

SQL is an example of a declarative programming language, this means, statements do not describe

computations directly (unlike Python, where we directly describe the computational processes), but instead

they describe the desired result of some computation, for example ‘Give me all the names of the files

created in this network after 8 am’. It is the role of the query interpreter of the database system to design

and perform a computational process to produce such result.

1.1 Database and Database Management Systems (DBMS)

A database is a collection of related data that can be processed to produce information. For example,

Canvas Gradebook data can be used to obtain the class average, lowest and highest scores, etc. A database

management system (DBMS for the rest of the notes) is a software package designed to define, manipulate,

retrieve, and manage data in a database. DBMS were developed to handle the difficulties of typical file-

processing systems supported by conventional operating systems:

• Data redundancy and inconsistency

• Difficulty in accessing data

• Data isolation – multiple files and formats

• Integrity problems

• Atomicity of updates

• Concurrent access by multiple users – data in parallel

• Security problems

DBMS are equipped with a query language, which makes it more efficient to retrieve and manipulate data.

A user can apply as many and as different filtering options as required to retrieve a set of data. All of the

major DBMS include features that ensure that data maintains consistent throughout software and hardware

crashes, as well as any failed transactions, adhering to the ACID principles:

• Atomicity: Guarantees that either all of the transaction succeeds or none of it does. If one part of

the transaction fails, the whole transaction fails, in other words, it’s either “all or nothing”.

• Consistency: Guarantees that all data will be consistent. All data will be valid according to all

defined rules, including any constraints, cascades, and triggers that have been applied on the

database.

2

• Isolation: Guarantees that all transactions will occur in isolation. No transaction will be affected by

any other transaction, this means, a transaction cannot read data from any other transaction that has

not yet completed.

• Durability: Once a transaction is committed, it will remain in the system – even if there’s a system

crash immediately following the transaction. Any changes from the transaction must be stored

permanently. If the system tells the user that the transaction has succeeded, the transaction must

have, in fact, succeeded.

1.1.1 The Relational Model

In this course, we only provide a brief introduction to the relational model. The central data

description construct in this model is a relation, which can be thought of as a set of records. A description

of data in terms of a data model is called a schema. In the relational model, the schema for a relation

specifies its name, the name of each field (or attribute or column), and the type of each field, so it is based

on first-order predicate logic and defines a table as an n-ary relation. The main highlights of this model are:

• Data is stored in tables called relations.

• Relations can be normalized.

o In normalized relations, values saved are atomic values.

• Each row in a relation contains a unique value.

• Each column in a relation contains values from a same domain.

For example, student information in a university database may be stored in a relation with the following

schema:

Students (sid: string, name: string, email: string, gpa: real, major: string)

The preceding schema says that each record in the Students relation has five fields, with field names and

types as indicated. The relation could look like:

sid name email gpa major

12345 Jane Doe jdx589 3.8 PREMJR

67890 John Doe jdx741 3.4 CS

24679 Tyler Smith tsc781 1.8 EE

13578 Liz Taylor lty663 3.2 ME

24895 James Jones jjt442 2.7 EE

Each row in the Students relation is a record that describes a student. We can make the description of a

collection of students more precise by specifying integrity constraints, which are conditions that the records

in a relation must satisfy. For example, we could specify that every student has a unique sid value. Note

that we cannot capture this information by simply adding another field to the Students schema. Thus, the

ability to specify uniqueness of the values in a field increases the accuracy with which we can describe our

data. The expressiveness of the constructs available for specifying integrity constraints is an important

aspect of the relational data model.

3

1.2 Structured Query Language (SQL)

SQL is a computer language for storing, manipulating and retrieving data stored in relational

database and it is the standard language for most DBMS like MySQL, MS Access, Oracle, PostgreSQL

and SQL Server. Although the SQL language is standardized, most DBMS implement some custom variant

of the language that is endowed with proprietary features. In this document, we will describe SQL statement

in their general form.

1.2.1 Keys

Recall that a table, also called a relation, has a fixed number of named and typed columns. Each

row of a table represents a data record and has one value for each column. A table typically has a column

or combination of columns that contain values that uniquely identify each row in the table. This column,

or columns, is called the primary key (PK) of the table and enforces the entity integrity of the table. If a

primary key constraint is defined on more than one column, values may be duplicated within one column,

but each combination of values from all the columns in the primary key constraint definition must be

unique.

In this example, sid is the perfect candidate for primary key since it can uniquely identify each

student in the table:

sid (PK) name email gpa major

12345 Jane Doe jdx589 3.8 PREMJR

67890 John Doe jdx741 3.4 CS

24679 Tyler Smith tsc781 1.8 EE

13578 Liz Taylor lty663 3.2 ME

24895 James Jones jjt442 2.7 EE

A foreign key (FK) is a column or combination of columns that is used to establish and enforce a link

between the data in two tables to control the data that can be stored in the foreign key table. In a foreign

key reference, a link is created between two tables when the column or columns that hold the primary key

value for one table are referenced by the column or columns in another table. This column becomes a

foreign key in the second table.

sid (PK) name email gpa major(FK) majorid(PK) descr

12345 Jane Doe jdx589 3.8 PREMJR PREMJR Pre-major

67890 John Doe jdx741 3.4 CS CS Computer Science

24679 Tyler Smith tsc781 1.8 EE EE Electrical Engineering

13578 Liz Taylor lty663 3.2 ME DS Data Science

24895 James Jones jjt442 2.7 EE CENG Computer Engineering

57841 Lulu Sam lsw702 1.5 PREMJR

44895 Alex Dar adc610 3.9 CS

Student Table Major Table

4

For example, the Student table has a foreign key link (major) to the Major table (majorid) because there is

a logical relationship between student’s major and the majors available at the university. The major column

in the Student table matches the primary key column of the Major table. By creating this foreign key

relationship, a value for major cannot be inserted into the Student table if it does not already exist in the

Major table.

1.2.2 Basic DDL commands

Data Description Language (DDL) is a syntax for creating and modifying database objects such as

tables, indexes, and users. Common examples of DDL statements include CREATE, ALTER, and DROP.

1.2.2.1 CREATE

The CREATE statement creates a new table, a view of a table, or other object in database. For each

column, we can optionally include the UNIQUE keyword, which indicates that the column can only contain

unique values, or the NOT NULL keyword, which indicates that the column should not contain NULL

values. The general for creating a new table looks like this:

CREATE TABLE TableName(Column1 datatype

Column2 datatype,

…,

PRIMARY KEY(column(s)))

Including the optional if not exists clause will prevent an error if we attempt to create duplicate tables.

CREATE TABLE IF NOT EXISTS TableName(Column1 datatype

Column2 datatype,

…,

PRIMARY KEY(column(s)))

The SQL statement for creating the Student and Major tables can be written as follows:

CREATE TABLE Student(sid INT NOT NULL,

name VARCHAR(30) NOT NULL,

email VARCHAR(20) NOT NULL,

gpa DECIMAL(3,2) NOT NULL,

major VARCHAR(10) NOT NULL,

PRIMARY KEY(sid),

FOREIGN KEY(major) REFERENCES(Major,majorid))

CREATE TABLE Major(majorid INT NOT NULL,

descr VARCHAR(30) NOT NULL,

PRIMARY KEY(majorid))

5

1.2.2.2 DROP

The DROP table statement deletes a table from a database. Including the optional if exists clause will

prevent an error if we attempt to drop a non-existing table. All data, indexes, triggers, constraints, and

permission specifications for that table will also be deleted. In its general format:

DROP TABLE IF NOT EXISTS TableName

We have to be careful when using this command because once a table is deleted then all the information

available in the table would also be lost forever. The SQL statement for deleting the Student table can be

written as follows:

DROP TABLE Student

1.2.3 Basic DML commands

Data Manipulation Language (DML) is a syntax used for adding (inserting), deleting, and

modifying (updating) data in a database. In SQL, DML statements include SELECT, INSERT, UPDATE

and DELETE.

1.2.3.1 SELECT

The SELECT statement is used to fetch the data from a database table which returns data in the

form of a result table. In its general form:

SELECT Column1, Column2

FROM Table Name

The columns of the resulting table are described by a comma-separated list of expressions that are each

evaluated for each row of the existing input table. If we want to fetch all the columns available in the table,

then we can use the following syntax:

SELECT *

FROM Table Name

For example, from the Student table, we can create a two-column table that describes each student by GPA

SELECT name, gpa

FROM Student

name gpa

Jane Doe 3.8

John Doe 3.4

Tyler Smith 1.8

Liz Taylor 3.2

James Jones 2.7

Lulu Sam 1.5

Alex Dar 3.9

6

The SELECT statement has many optional clauses:

▪ WHERE specifies which rows to retrieve.

▪ GROUP BY groups rows sharing a property so that an aggregate function can be applied to each

group.

▪ HAVING selects among the groups defined by the GROUP BY clause.

▪ ORDER BY specifies how to order the returned rows.

▪ AS provides an alias which can be used to temporarily rename tables or columns.

WHERE Clause

A SELECT statement can also include a where clause with a filtering expression. This expression

filters the rows that are projected. Only a row for which the filtering expression evaluates to a true value

will be used to produce a row in the resulting table. In its general form:

SELECT Column1, Column2

FROM Table Name

WHERE Criteria

For example, from the Student table, retrieve student with GPA > 3

SELECT name, gpa

FROM Student

WHERE gpa > 3

For example, from the Student table, retrieve student with GPA > 3

SELECT name, gpa

FROM Student

WHERE gpa > 3 AND

major = ‘CS’

SELECT Student.sid AS PSU_ID, Student.gpa, Major.descr

FROM Student, Major

WHERE gpa <= 3 AND major = 'PREMJR' AND Student.major = Major.majorid name gpa Jane Doe 3.8 John Doe 3.4 Liz Taylor 3.2 Alex Dar 3.9 name gpa John Doe 3.4 Alex Dar 3.9 PSU_ID gpa descr 57841 1.5 Pre-major 7 GROUP BY Clause and Aggregated Functions The SELECT statement joins and manipulates individual rows. However, it is possible to perform aggregation operations over multiple rows. The aggregate functions MAX, MIN, COUNT, SUM and AVG return the maximum, minimum, number, sum and average of the values in a column. Multiple aggregate functions can be applied to the same set of rows by defining more than one column. Only columns that are included by the where clause are considered in the aggregation. SELECT major, gpa FROM Student WHERE gpa>1

SELECT AVG(gpa) AS SchoolAVG
FROM Student
WHERE gpa>1

SELECT major, AVG(gpa) AS MajorAVG
FROM Student
WHERE gpa>1
GROUP BY major

Notice that the second SELECT statement has produced a table with a single row, the average of the entire

data in the column gpa. The GROUP BY clause of a SELECT statement is used to partition rows into

groups and select only a subset of the groups. Any aggregate functions in the column description will apply

to each group independently, rather than the entire set of rows in the table as shown in the third example,

where we group together the GPA scores by major and then compute the average in each group separately.

The combination of these additional clauses allows a SELECT statement to express a wide range of

projections of an input table into a related output table.

1.2.3.2 INSERT

The INSERT statement allows us to add rows to a table in a database. In particular, we can insert

values into all columns of our table, or we can add to one specific column, which will set the other columns

to their default values (if any). INSERT statements have the following form:

INSERT INTO TableName (column1, column2, …) VALUES (value1, value2, …)

8

The SQL statements for inserting rows into the Student can be written as follows:

INSERT INTO Student(sid, name, email, gpa, major) VALUES (12345,’Jane Doe’,’jdx589′,3.8,’PREMJR’)
INSERT INTO Student(sid, name, email, gpa, major) VALUES (67890,’John Doe’,’jdx741′,3.4,’CS’)

INSERT INTO Student(sid, name, email, gpa, major) VALUES

(24679,’Tyler Smith’,’tsc781′,1.8,’EE’),

(13578,’Liz Taylor’,’lty663′,3.2,’ME’)

1.2.3.3 UPDATE

The UPDATE statement sets all entries in certain columns of a table to new values for a subset of

rows as indicated by an optional WHERE clause. We can update all rows by omitting the optional where

clause. The UPDATE statement has the following form:

UPDATE Table Name

SET column1 = value, column2 = value , …

WHERE condition

For the UPDATE to be successful, the updated value must not conflict with all the applicable constraints

such as primary keys, unique indexes, CHECK constraints, and NOT NULL constraints.

UPDATE Student
SET major=’CS’
WHERE gpa>2.5

9

1.2.3.4 DELETE

The DELETE statement deletes a subset of rows of a table as indicated by an optional where clause.

If we do not include a where clause, then we will delete all rows, but an empty table would remain in our

database. ). DELETE statements have the following form:

DELETE FROM TableName

WHERE condition

UPDATE Student
SET major=’EE’
WHERE gpa>3 AND
name LIKE ‘J%’

UPDATE Student
SET major=’PREMJR’

10

DELETE FROM Student
WHERE gpa<2 DELETE FROM Student WHERE sid=44895 DELETE FROM Student 11 Appendix: List of Frequently SQL Commands Reference: https://www.codecademy.com/articles/sql-commands ALTER TABLE lets you add columns to a table in a database. ALTER TABLE table_name ADD column_name datatype; AND is an operator that combines two conditions. Both conditions must be true for the row to be included in the result set. SELECT column_name(s) FROM table_name WHERE column_1 = value_1 AND column_2 = value_2; AS is a keyword in SQL that allows you to rename a column or table using an alias. SELECT column_name AS 'Alias' FROM table_name; AVG() is an aggregate function that returns the average value for a numeric column. SELECT AVG(column_name) FROM table_name; The BETWEEN operator is used to filter the result set within a certain range. The values can be numbers, text or dates. SELECT column_name(s) FROM table_name WHERE column_name BETWEEN value_1 AND value_2; CASE statements are used to create different outputs (usually in the SELECT statement). It is SQL’s way of handling if-then logic. SELECT column_name, CASE WHEN condition THEN 'Result_1' WHEN condition THEN 'Result_2' ELSE 'Result_3' END FROM table_name; https://www.codecademy.com/articles/sql-commands 12 COUNT() is a function that takes the name of a column as an argument and counts the number of rows where the column is not NULL. SELECT COUNT(column_name) FROM table_name; CREATE TABLE creates a new table in the database. It allows you to specify the name of the table and the name of each column in the table. CREATE TABLE table_name ( column_1 datatype, column_2 datatype, column_3 datatype ); DELETE statements are used to remove rows from a table. DELETE FROM table_name WHERE some_column = some_value; GROUP BY is a clause in SQL that is only used with aggregate functions. It is used in collaboration with the SELECT statement to arrange identical data into groups. SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name; HAVING was added to SQL because the WHERE keyword could not be used with aggregate functions. SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name HAVING COUNT(*) > value;

An inner JOIN will combine rows from different tables if the join condition is true.

SELECT column_name(s)
FROM table_1
JOIN table_2
ON table_1.column_name = table_2.column_name;

13

INSERT statements are used to add a new row to a table.

INSERT INTO table_name (column_1, column_2, column_3)
VALUES (value_1, ‘value_2’, value_3);

IS NULL and IS NOT NULL are operators used with the WHERE clause to test for empty values.

SELECT column_name(s)
FROM table_name
WHERE column_name IS NULL;

LIKE is a special operator used with the WHERE clause to search for a specific pattern in a column.

SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern;

LIMIT is a clause that lets you specify the maximum number of rows the result set will have.

SELECT column_name(s)
FROM table_name
LIMIT number;

MAX() is a function that takes the name of a column as an argument and returns the largest

value in that column.

SELECT MAX(column_name)
FROM table_name;

MIN() is a function that takes the name of a column as an argument and returns the smallest

value in that column.

SELECT MIN(column_name)
FROM table_name;

OR is an operator that filters the result set to only include rows where either condition is true.

SELECT column_name
FROM table_name
WHERE column_name = value_1
OR column_name = value_2;

14

ORDER BY is a clause that indicates you want to sort the result set by a particular column either

alphabetically or numerically.

SELECT column_name
FROM table_name
ORDER BY column_name ASC | DESC;

An outer join will combine rows from different tables even if the join condition is not met. Every

row in the left table is returned in the result set, and if the join condition is not met,

then NULL values are used to fill in the columns from the right table.

SELECT column_name(s)
FROM table_1
LEFT JOIN table_2
ON table_1.column_name = table_2.column_name;

ROUND() is a function that takes a column name and an integer as arguments. It rounds the

values in the column to the number of decimal places specified by the integer.

SELECT ROUND(column_name, integer)
FROM table_name;

SELECT statements are used to fetch data from a database. Every query will begin with SELECT.

SELECT column_name
FROM table_name;

SELECT DISTINCT specifies that the statement is going to be a query that returns unique values in

the specified column(s).

SELECT DISTINCT column_name
FROM table_name;

SUM() is a function that takes the name of a column as an argument and returns the sum of all

the values in that column.

SELECT SUM(column_name)
FROM table_name;

UPDATE statements allow you to edit rows in a table.

UPDATE table_name
SET some_column = some_value
WHERE some_column = some_value;

15

WHERE is a clause that indicates you want to filter the result set to include only rows where the

following condition is true.

SELECT column_name(s)
FROM table_name
WHERE column_name operator value;

WITH clause lets you store the result of a query in a temporary table using an alias. You can also

define multiple temporary tables using a comma and with one instance of the WITH keyword.

WITH temporary_name AS (
SELECT *
FROM table_name)
SELECT *
FROM temporary_name
WHERE column_name operator value;