CS W4111.001 Introduction to Databases Fall 2020
Computer Science Department Columbia University
1
Application Programming (material not included in exams)
2
SQL ≠ Programming Language
SQL is not a general purpose programming language
• SQL is tailored for data access and manipulation
• SQL queries are easy to optimize and parallelize
• SQL can’t perform “business logic” that’s often needed
Options:
• Extend existing programming languages to understand SQL natively
• Provide an API between programming languages and DBMS
3
• Fully embed into language: Embedded SQL
• Use low-level library with core database calls: DB API
• Object-Relational Mapping, ORM:
• Ruby on Rails, Django, Hibernate, SQLAlchemy, …
• Defines database-backed classes
• “Magically” maps between database rows and objects in programming language
• “Magic” is a double-edged sword; we will not use
in our class
4
Several Options
• Host programming language (e.g., Java,
C) is extended with special SQL
syntax/directives
Example: EXEC SQL sql-query
• Program goes through a preprocessor
• Finally, program is compiled into a program that interacts with the DBMS directly
5
Embedded SQL
Java + Embedded SQL
…
if (user == ‘admin’){
EXEC SQL SELECT * …
} else { …
Preprocessor
Java + DB library calls
Java Compiler
DBMS library
Executable
DBMS
6
Embedded SQL
Database API: Vendor-Specific or Standardized (ODBC, JDBC)
• Write a program using classes implemented by the DBMS vendor, which implement a standard set of database interfaces
• Pass SQL statements as arguments to functions
• Process SQL statements at runtime, and send to DBMS via a driver provided by the DBMS
7
• Provide single interface to possibly multiple DBMS engines
• Connect to a database
• Manage transactions
• Map objects between host language and DBMS
• Manage query results
8
What Does a Library Need to Do?
• Library components
• Impedance mismatches • Types
• Result sets
9
Library/API Overview
https://docs.sqlalchemy.org/en/13/core/engines.html
10
“Engines”
Abstraction for a database engine; attempts to hide DBMS language differences
driver://username:password@host:port/database
from sqlalchemy import create_engine
db1 = create_engine(“postgresql://localhost:5432/testdb”)
db2 = create_engine(“sqlite:///testdb.db”) // note: sqllite has no host name (sqlite:///)
conn1 = db1.connect() conn2 = db2.connect()
Should close connections when done!
11
“Connections”
Before running queries, need to create a connection with database
• Tells DBMS to allocate resources for connection
• Are relatively expensive to set up, so libraries often cache connections for future use
• Defines scope of a transaction
conn1.execute(“UPDATE TABLE test SET a = 1”) conn1.execute(“UPDATE TABLE test SET s = ‘smith’”)
12
Query Execution
foo = conn1.execute(“SELECT * FROM big_table”)
Challenges:
• What is the return type of execute()? • Type impedance
• How to pass data between DBMS and host language?
13
Query Execution
• SQL standard defines mappings between SQL and several languages
• Most libraries can handle common types
SQL types CHAR(20) INTEGER SMALLINT REAL
C types char[20] int
short float
Python types str
int
int
float
What about complex objects (e.g., { x: ‘1’, y: ‘hello’ })?
14
(Type) Impedance Mismatch
Query Execution
• Pass only “sanitized” values to database
args = (‘Dr Seuss’, ‘40’) conn1.execute(
“INSERT INTO users(name, age) VALUES(%s, %s)”, args)
• Pass in a tuple of query arguments
• DBAPI library will properly escape input values
• Most libraries support this
• Never construct raw SQL strings
15
(Results) Impedance Mismatch
• SQL relations and results are sets of records • What is the type of table?
table = execute(“SELECT * FROM big_table”)
• Cursor over result set, similar to an iterator interface
• To have ordering guarantees, use ORDER BY clause in queries
16
sailor1
sailor2
sailor3
sailor4
sailor5
sailor4
cursor
Program
DBMS
sailor6
sailor7
sailor8
sailor9
sailor10
17
sailor11
sailor1
sailor2
sailor3
sailor4
sailor5
sailor5
sailor6
cursor
sailor7
sailor8
sailor9
sailor10
Program
DBMS
18
sailor11
• Cursor similar to an iterator (next() calls)
cursor = execute(“SELECT * FROM bigtable”)
• Cursor attributes/methods (logical)
rowcount keys() previous() next()
19
(Results) Impedance Mismatch
• Cursor similar to an iterator (next() calls)
cursor = execute(“SELECT * FROM bigtable”)
cursor.rowcount() cursor.fetchone() for row in cursor:
print row
# 1000000 # (0, ‘foo’, …) # iterate over the rest
• Actual cursor methods vary depending on implementation
20
(Results) Impedance Mismatch
• DBMS vendors provide libraries for most languages
• Two heavyweights in enterprise world
• ODBC: Open DataBaseConnectivity (Microsoft defined for Windows libraries)
• JDBC: Java DataBase Connectivity (Sun developed as set of Java interfaces); java.sql.*, javax.sql.*
21
Some Useful Names
• Impedance mismatch
• Different uses of a DBAPI
• Why Embedded SQL is no good • What good are cursors?
Will use SQLAlchemy with Python for Part 3 of Project 1 (for Web Front-End Option), but without ORM component
22
What to Understand