5AAVC210 Introduction to Programming WEEK 7
Pandas vs. SQLite
Last week we looked at Pandas. This week we’re doing SQLite.
Pandas is not a “datastore” in the way a relational database is.
Pandas is a Python library for manipulating data that will fit in memory.
Pandas is an in-memory data storage tool. This allows you to do very rapid calculations over large amounts of data very quickly.
SQLite persistently stores data and is a database.
Databases
Databases are nothing more than electronic lists of information.
A Database is a collection of data organized so that it can be managed and updated.
Different database tools store that information in unique ways:
Flat files use a table
SQL databases use a relational model
NoSQL databases use a key-value model
We are going to look at relational databases
https://medium.com/analytics-vidhya/programming-with-databases-in-python-using-sqlite-4cecbef51ab9
Relational databases
A relational database is a database which has a structure that can recognize the relations between data.
A relational database consists of related tables of information.
The basic structure is a set of tables organizing data into predefined categories.
Database tables
Each table in a relational database has one or more columns.
Each column is assigned a specific data type, such as an integer number, a sequence of characters (for text), or a date.
Each row in the table has a value for each column.
This column’s data type is text (known as char, short for characters)
This column’s data type is int
There is no significance to the order of the columns or rows.
Each row contains one (and only one) value for each column and each row is distinct.
Each value for a given column has the same type.
Each table in the database should hold information about a specific thing only, such as employees, products, or customers.
A relational database is a set of related tables. You use primary and foreign keys to describe relationships between the information in different tables.
These keys enable each row in the database tables to be identified and define the relationships between the tables.
Primary keys
The primary key of a table uniquely identifies each record in the table.
It is a column, or set of columns, that allows each row in the table to be uniquely identified.
No two rows in a table with a primary key can have the same primary key value.
Foreign key
A foreign key is a field in a relational table that matches the primary key column of another table.
Questions?
Database management systems
The Database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze data.
The DBMS used for Relational databases is called Relational Database Management Systems (RDBMS).
We will be using an RDBMS called SQLite.
SQLite and SQL
SQLite is a Relational Database Management System (RDBMS).
Several programming languages have built-in support for SQLite including Python.
The letters “SQL” come from “Structured Query Language”, which is the standard language to communicate with relational databases.
So SQLite is our database system in Python, and we can use SQL (the query language) to manage it.
What can Structured Query Language do?
SQL can execute queries against a database
SQL can retrieve data from a database
SQL can insert records in a database
SQL can update records in a database
SQL can delete records from a database
SQL can create new databases
SQL can create new tables in a database
SQL can set permissions on tables, procedures, and views
There is a set of basic SQL commands that is considered standard and is used by all RDBMSs. For example, all RDBMSs use the SELECT statement.
A SELECT statement, also called a query, is used to get information from a table. It specifies one or more column headings, one or more tables from which to select, and some criteria for selection.
The RDBMS returns rows of the column entries that satisfy the stated requirements.
A SELECT statement such as the following will fetch the first and last names of employees (presuming there is a table called Employees, with the columns First_Name and Last_Name):
SELECT First_Name, Last_Name FROM Employees
Let’s try it out:
https://www.w3schools.com/sql/sql_syntax.asp
Questions?
The most useful SQL commands:
SELECT – extracts data from a database
UPDATE – updates data in a database
DELETE – deletes data from a database
INSERT INTO – inserts new data into a database
CREATE DATABASE – creates a new database
ALTER DATABASE – modifies a database
CREATE TABLE – creates a new table
ALTER TABLE – modifies a table
DROP TABLE – deletes a table
CREATE INDEX – creates an index (search key)
DROP INDEX – deletes an index
SQLite and Python
SQLite can be integrated with Python using a Python module called sqlite3.
You do not need to install this module separately because it comes bundled with Python.
To begin, you need to open a connection to an SQLite database file.
To establish a connection you pass a file path to the connect(…)method in the sqlite3 module
If the database represented by the file does not exists one will be created at that path.
Once you have a Connection, you can create a Cursor object. The Cursor object allows you to perform SQL commands:
Let’s design a table
We’ll use this one (from https://stackabuse.com/a-sqlite-tutorial-with-python/)
This data contains information about customers, products, and orders.
Let’s break this down into two tables to begin: one containing customer information and one containing product information.
Customer name: we probably want to split this into first name and last name (data type is text)
Date of purchase (data type is text in this case)
Product: the book they bought (data type is text)
Price: the price of the book (data type is real, i.e. a real number, in this case a price)
Let’s check our tables have been created:
Let’s add data:
The workflow for executing INSERT statements is:
Connect to the database
Create a cursor object
Write a parameterized insert SQL statement and store as a variable
Call the execute method on the cursor object passing it the sql variable and the values, as a tuple, to be inserted into the table
Check what’s in our products table after the INSERT:
Let’s pause there – BUT!
If we performed an operation on the database other than sending queries, we need to commit those changes via the .commit() method before we close the connection:
conn.commit()
If we are finished with our operations on the database file, we have to close the connection via the .close() method:
conn.close()
Questions?