Databases
Lecture 1 – Introduction
Bernhard Reus
1
Databases …
…are everywhere!
2
© Bernhard Reus, University of Sussex, 2004-16
3
Headlines all just from September 2013
5
© Bernhard Reus, University of Sussex, 2004-16
Swamped by data
We’re now entering what I call the
“ ,” where the majority of data will be stamped out by machines: software logs, cameras, microphones, RFID readers, wireless sensor networks and so on.
Prof. Joe Hellerstein, Berkley
6
Let’s get an idea of “swamped” What is the record number of tweets per second?
7
© Bernhard Reus, University of Sussex, 2004-16
Let’s get an idea of “swamped” What is the record number of tweets per minute?
8
Let’s get an idea of “swamped” How big is Google’s database (roughly) in byte?
9
© Bernhard Reus, University of Sussex, 2004-16
Measuring this mass of data
• Disk Storage
· 1000 Gigaby Britannica
s
· 1000 Terabytes = 1 Petabyte
· 1000 Petabytes =
1,
e
t
e
=
1
T
e
r
a
b
y
t
1 Exabyte
· 1000 Exabytes = 1 Zettabyte
00
0
cop
ies
of t
he
E
n
cyc
lo
pe
d
ia
20 million 4-door filing cabinets full of text.
• 1000 Zettabytes = 1 Yottabyte
· 1000 Yottabytes = 1 Brontobyte
Brontobytes = 1 Geopbyte
5 Exabytes equal to all of the words ever spoken by mankind
the hard drives to store a Yottabyte cost $100 trillion (as of 2013).
10
· 1000
As of 2013, the World Wide Web is estimated to have reached 4 zettabytes
Module Organisation Autumn 2016
• Lectures
Tue 17:00 in Pev 1-1A7 , Thu 9:00 in Richmond AS03
• Classes & Labs
You have been assigned to a group in various rooms at various times. Check Sussex Direct. Always go to YOUR class.
Classes start next week. Later classes will be in Lab.
• Coursework 2 assignments (see SussexDirect for timing )
• Assessment Coursework (50%) + Exam (50%)
• Web Databases Study Direct Site (contains ALL INFO!)
11
20.10. & 1.12.
© Bernhard Reus, University of Sussex, 2004-16
Classes & Coursework 2016
• Exercise classes with questions available ~1 week in advance.
• Please attend weekly classes regularly!
• Please prepare for the seminars/labs.
• Bring notes and something to write to classes.
• Two assignments to be completed within 9-12 days.
• CWK= 2×50%
• Questions similar to those on exercise sheets.
• Please engage in discussion in lectures and, in particular in seminar/lab classes.
• Also use forum on StudyDirect to discuss and communicate!
12
Literature
13
© Bernhard Reus, University of Sussex, 2004-16
Literature
• Thomas Conolly & Carolyn Begg:
Database Systems, (3rd/4th edition), Addison-Wesley, 2002/4. Library QE 1810 Con
Database Solutions
Addison-Wesley, 2004 does not cover all aspects
• Nahrain Gehani: The Database Book: Principles & Practice Using Mysql, Silicon Press, August 2006. Library QE 1820 GEH.
– Ramez Elmasri & Shamkant B. Navathe: Fundamentals of Database Systems, (4th edition) Addison-Wesley, 2003. Library QE 1810 Elm
– C.J.Date: An Introduction to Database Systems, (8th edition), Addison-Wesley, 2004. Library only old QE 1150 Dat
14
Acknowledgements
The slides will make use of
• quotes and diagrams from Connolly & Begg: Database Systems (3rd ed.)
• several photo images from Philip Greenspun http://philip.greenspun.com
plus some inspiration from his SQL for web Nerds.
15
© Bernhard Reus, University of Sussex, 2004-16
What is a database?
In this lecture
[C&B: Ch.1]
• Why do we need one?
• What can we do with it?
• Why do we have a course about it?
reports
Data base
entry
What is a database?
“a usually large collection of data organized especially for rapid search and retrieval (as by a computer)
[ca. 1962]” Merriam-Webster Dictionary
“a shared collection of logically related data, and a description of this data, designed to meet the information needs of an organization”
Connolly & Begg: “Database Systems”
17
© Bernhard Reus, University of Sussex, 2004-16
What is a database?
“a shared collection
of logically related data,
and a description of this data,
designed to meet the information needs of an organization”
many simultaneous users
data is structured
system catalogue / data- dictionary (self-describing)
design needed
data abstraction
18
What’s a Database Management System?
DBMS
”a software system that enables users to define, create, maintain, and control
access to the database” Connolly & Begg “Database Systems”
19
© Bernhard Reus, University of Sussex, 2004-16
• … consist of: – Data
– Hardware
– Software L – People
• … allow users to: – Store
– Update – Retrieve – Organise – Protect
their data
Database Systems…
20
Why bother ?
• Why do we need to use a database? • Why not simply use the file system?
Collect pros and cons:
21
© Bernhard Reus, University of Sussex, 2004-16
Why not using the file system?
+ simple
+ cheap (already available) + reliable
+ no special training
– data duplication
– incompatible (file) formats
– poor performance of updates in large files – poor performance of search for large data – access rights only per file not per record
– no support for concurrent transactions
-…
22
Capabilities of DBMS
• Providing persistent storage
• Sharing Data
• Representing complex relationships between data
• Reducing & controlling redundancy
• Restricting unauthorised access (enforcing security)
• Avoiding inconsistencies
• Enforcing integrity constraints
• Providing (concurrent) transaction support
• Providing multiple interfaces/views
• Providing backup & recovery
23
© Bernhard Reus, University of Sussex, 2004-16
People involved in DBMS
• DatabaseDeveloperL data planning, standards, design, access
• Analyst & application programmer L provide functionality to end-user
• User
• Database administrator/operator
manages the physical database (system) and its users
• DBMS designer and implementor write database system software
24
• History
• Data Modeling (Entity-
Relationship Diagrams)
• Relational Model (Tables)
• Logical Design
• Physical Design in SQL
• Data Manipulation in SQL
• Queries in SQL
• Complex Queries
• Three-valued Logic
• Programmatic SQL
• Normal Forms
• Transactions
• Security (Permissions)
• Indexing
• NoSQL (not only SQL) Databases, in particular:
• Graph Databases
What we will study
25
© Bernhard Reus, University of Sussex, 2004-16
What we will not study
• Implementation of DBMS software (no look under the bonnet)
• The traded secrets of one particular database system software, eg. mysql (several thousand pages of manual)
• Administration issues
• Tuning
26
Another reason to study Databases
Structured Query Language (the language for relational databases)
Language
Jobs 3rd Quarter 2016
SQL
19,287
JavaScript
15,980
C#
15,043
Java
13,541
Python
6,486
PHP
4,853
C++
4,266
T-SQL
3,472
C
2,888
Ruby
2,884
Source: www.itjobswatch.co.uk
27
© Bernhard Reus, University of Sussex, 2004-16
Another good reason
UK annual average IT Salaries in 2015 (3rd Quarter)
IT Director
Senior Development DBA Management Consultant Systems Architect
Senior Business Analyst Technical Support Manager Development Team Leader Project Manager
Systems Programmer
Senior Software Engineer Services Manager
Software Project Manager Business Analyst
Project Leader/Senior Analyst
£82,500
£75,000
£65,000
£58,750 Systems Administrator £55,000
© Bernhard Reus, University of Sussex, 2004-16
£55,000 £52,500 £52,500 £52,500 £52,500 £50,000 £50,000 £47,500 £46,250
Database Administrator/Analyst Systems Developer
Network Administrator Software Test Engineer
DBMS at the core…
Senior Support Analyst Software Engineer Programmer
£42,500 £42,500 £42,500 £42,425
£40,000 £40,000 £40,000 £37,500 £33,250 £31,000 £26,500 £25,750 £25,000 £22,500
Senior IT Support Web Designer Webmaster
PC Support
Graduate Developer Help Desk Support
www.itjobswatch.co.uk
DBMS
Buzzwords
29
28