CRICOS code 00025BCRICOS code 00025B
This course includes 13 lectures and 10 tutorial/practical sessions
Copyright By PowCoder代写 加微信 powcoder
Programming and Linux experiences required!
Recap – Course Structure
Introduction
topics&appl
VT: Docker I
VT: Docker II Spark I
DBs in Cloud
Networks &
Load Balancing
Security &
Lecture 11
Lecture 12
Lecture 10
VT: Docker III
Lecture 13
Concepts Orchestration Storage Computation Others
CRICOS code 00025BCRICOS code 00025B
• Database Background
• Relational Data Bases
– Revisit Relational DBs
– ACID Properties
– Clustered RDBMs
• Non-relational Data Bases
– NoSQL concepts & CAP Theorem
– Cassandra
CRICOS code 00025BCRICOS code 00025B
“A set of information held in a computer”
Oxford English Dictionary
“One or more large structured sets of persistent data, usually associated with software to
update and query the data”
Free On-Line Dictionary of Computing
“A collection of data arranged for ease and speed of search and retrieval”
Dictionary.com
A database is a shared, integrated computer structure that stores a collection of the following:
• End-user data—that is, raw facts of interest to the end user
• Metadata, or data about data, through which the end-user data is integrated and managed
What is a database?
1.Coronel, Carlos, and . Database systems: design, implementation, & management. Cengage Learning, 2016.
CRICOS code 00025BCRICOS code 00025B
Program data independence
• easily change the structure of database without modifying the application program.
Sharing of data, controlled redundancy
• data can be shared by authorized users.
• data is recorded in only one place in the database and it is not duplicated.
Concurrent, multi-user access to large volumes of data
• Many authorized users can simultaneously access the same piece of information.
• The remote users can also access the same data.
• Concurrent assess to data in file system leads to incorrect data.
Back-up and recovery for reliability
• Most of the DBMSs provide the ‘backup and recovery’ sub-systems that automatically create the
backup of data and restore data if required.
• In a computer file-based system, the back-up and recovery are often inefficient.
Why Database systems?
CRICOS code 00025BCRICOS code 00025B
Uniform, logical data model for representing data
• The logical data model represents the conceptual data model (including attributes, names,
relationships, and other metadata). Logical data model is often developed by UML.
Rich, standard language for querying data
• A number of extended SQL language, such as PLSQL (Oracle), PL/pgSQL (PostgreSQL),
Transact-SQL or T-SQL (Microsoft SQL Server)
Effective optimizations for efficient querying
• DBMS queries can be optimized to achieve efficiency
Ensure data integrity within single applications
• Integrity constraints or consistency rules can be applied to database so that the correct data
can be entered into database.
Why Database systems?
CRICOS code 00025BCRICOS code 00025B
A relational database is a collection of data items organized as a set of formally-described tables from
which data can be accessed or reassembled in many different ways without having to reorganize the
database tables.
In [1], relation algebra for databases was proposed and formed the foundation of modern relational
databases.
Five primitive operators in [1] are selection, projection, Cartesian product, set union, and set difference.
Relational database
The relational database was invented by
E. F. Codd at IBM in 1970.
[1] Codd, E. F. (1970). “Relational Completeness of Data Base Sublanguages”
Turing Award in 1981
CRICOS code 00025BCRICOS code 00025B
• The database management system (DBMS) is
the software that interacts with end users,
applications, and the database itself to
capture and analyze the data.
• The DBMS software additionally includes the
core facilities provided to administer the
• Often the term “database” is also used to
loosely refer to any of the DBMS, the
database system or an application associated
with the database.
Key concepts in Relational database
CRICOS code 00025BCRICOS code 00025B
Key concepts in Relational database
Some important database terms:
School of Information and Communication Technology
Entities / Tables – Entities represent items that we want
to store data about (e.g., a student)
Attributes / Fields / Columns headings – attributes
are the pieces of data that we want to store (e.g., the
students name)
Relationships – relationships are used to show how
entities within the database are related (e.g., a student
may be enrolled in a course, so keep student ID in
Enrolment table).
Records / Rows – a logically connected one or more
fields, e.g., a student record consisting of name, student
number & phone
Data / Data item – raw fact, e.g., student grade or
phone number.
Metadata – data about data that provides description of
data to enable program–data independence, e.g., type
of data (number or text)
Attributes
Relationship
CRICOS code 00025BCRICOS code 00025B
• The database schema of a database
system is its structure described in
a formal language supported by
the database management
system (DBMS).
• The term “schema” refers to the
organization of data as a blueprint of
how the database is constructed
(divided into database tables in the
case of relational databases).
What is a database schema?
https://stackoverflow.com/questions/45135485/creating-a-database-schema-
from-the-movie-database
https://en.wikipedia.org/wiki/Database_schema
CRICOS code 00025BCRICOS code 00025B
Data Definition Language (DDL) – Defines and modifies a schema e.g. CREATE / DROP / ALTER table;
does not manipulate data
Data Manipulation Language (DML) – Language used to retrieve (SELECT), add (INSERT),
modify (UPDATE) and DELETE data
Data Control Language (DCL) statements. Used for providing (GRANT) / withdrawing
(REVOKE) access privileges
Transaction Control Language (TCL) statements are used to manage the changes made by
DML statements. It allows statements to be grouped together into logical transactions.
Example: COMMIT, ROLLBACK, etc.
Types of SQL statements:
School of Information and Communication Technology
CRICOS code 00025BCRICOS code 00025B
Key concepts in Relational database
No duplicate tuples (rows)
• by definition sets do not contain duplicate elements
• hence tuples (rows) are unique
• Ensured by entity integrity (primary key)
Tuples are unordered within a relation (table)
• by definition sets are not ordered
• hence tuples can only be accessed by content
No ordering of attributes within a tuple
• by definition sets are not ordered
School of Information and Communication Technology
CRICOS code 00025BCRICOS code 00025B
Key concepts in Relational database
Entity-Relation Design (ERD):
• Entity: corresponds to a table in which we store data about a particular thing, e.g., Student
• Attribute: describes characteristics of an entity, e.g. attributes for the Employee entity are
employee number, first name, last name, job title etc.
• Relationship: illustrates an association (business rule) between two entities, e.g. a verb
School of Information and Communication Technology
Employee number
First name
Dept number
Department
Dept number
Dept location
Relationship
CRICOS code 00025BCRICOS code 00025B
Key concepts in Relational database
What is an actual business rule between Employee and Department entities?
• An employee works in a department and
• A department can employ many employees.
Cardinality/connectivity specifies maximum number of times an instance of an entity can be
related to instances of a related entity (1:1, 1:M, and M:N).
Modality/participation specifies minimum number of times an instance of an entity can be related
to instances of a related entity (0 or 1).
School of Information and Communication Technology
CRICOS code 00025BCRICOS code 00025B
Key concepts in Relational database
NORMALISATION – SIMPLY ‘COMMON SENSE’
• Converts a relation into relations of progressively
smaller number of attributes and tuples until an
optimum level of decomposition is reached – little or no
data redundancy exists
• Normalisation is a Relational Database Implementation
Model focused approach (it makes extensive use of
FK’s to connect relations)
• Each table represents a single subject
• No data item will be unnecessarily stored in more than
one table, i.e., No data redundancy
• All non-key attributes in a table are dependent on the
primary key
• Each table is void of insertion, update, deletion
• Objective of normalisation is to ensure that all tables
are in at least 3NF
School of Information and Communication Technology
Redundancy!
CRICOS code 00025BCRICOS code 00025B
Key concepts in Relational database
Transaction (ACID) Properties
• Atomicity – all parts of a transaction be completed successfully otherwise, the transaction is
aborted (never partially executed, Done or not done)
• Consistency – concurrent execution of transactions yields consistent results (consistent ->
consistent, inconsistent -> roll back -> consistent)
• Isolation – data used during one transaction cannot be used by a second until the first is
completed- Multi-user
• Durability – ensures that the result or effect of a committed transaction persists in case of a
system failure.
All transaction properties work together to make sure that a database maintains data integrity and
consistency for a single-user or a multi-user DBMS.
School of Information and Communication Technology
CRICOS code 00025BCRICOS code 00025B
Clustered RDBMs
• RDBMS is famous for ACID and suitable for transaction workloads, but
scalability of RDBMs is an issue.
• To horizontally scale up and support ACID transactions, traditional RDBMs
have been extended to maintain high-availability, high-redundancy in a
distributed computing environment.
• Products:
– MySQL NDB Cluster 8.0
– CockroachDB
– Amazon Aurora
• Pros: high availability (99.99%+), high throughput and low latency, etc.
• Cons: complex management/deployment/configuration, no foreign key, need
huge memory and disk storage.
School of Information and Communication Technology
CRICOS code 00025BCRICOS code 00025B
• Database Background
• Relational Data Bases
– Revisit Relational DBs
– ACID Properties
– Clustered RDBMs
• Non-relational Data Bases
– NoSQL concepts & CAP Theorem
– Cassandra
CRICOS code 00025BCRICOS code 00025B
What is NoSQL?
http://www.gradpost.ucsb.edu/events/events-article/2019/07/09/default-calendar/academic-job-market-writing-success-series-c.v.s
No Relational Query
A class of database management systems (DBMS)
NoSQL – “Not only SQL”
Does not use SQL as querying language
Distributed, fault-tolerant architecture
No fixed schema (formally described structure)
No joins (typical in databases operated with SQL)
It’s not a replacement for a RDBMS but compliments it
CRICOS code 00025BCRICOS code 00025B
Modern Web applications:
• Support large numbers of concurrent users (tens of thousands, perhaps millions)
– / Singles’ Day (11.11) / Boxing Day
• Deliver highly responsive experiences to a globally distributed base of users
– Need scalability (horizontally scalable)
• Be always available – no downtime
– Rare to see downtime/maintenance for YouTube or Amazon
• Handle semi- and unstructured data
– Twitter tweets / Youtube videos / Instagram photos
• Rapidly adapt to changing requirements with frequent updates and new features
– Cloud environment & cluster deployment
• The world has gone mobile, but still need data storage.
Why NoSQL?
CRICOS code 00025BCRICOS code 00025B
Types of NoSQL Databases
Key-value stores
• Key is a string, e.g. “s123456”, “user109029”, or
“19228872987494923”
• Value can be in different types: Double, Int, Array, List, etc.
• Usage cases: Frequent I/O operations in simple data model
– mobile apps, shopping carts, etc.
• Pros and Cons:
– Scalable, flexible, high performance for writing
– Not suitable for structured data and low query performance for
conditions
• Representative Products:
– Redis, Riak, Memcached, etc.
https://www.kdnuggets.com/2016/06/top-nosql-database-engines.html
Key: sn1 Value: “Mike, 24, Brisbane, …”
Key: sn2 Value: “Mary, 22, Sydney, …”
CRICOS code 00025BCRICOS code 00025B
Types of NoSQL Databases
Document stores
• Similar to key-value stores, but allows nested values associated with each
• Value is a document (nested values)
• Usage cases:
– Document-oriented data, e.g. Tweets, Facebook posts, Amazon
comments/feedback
– Semi-structured data, e.g. JSON files and XML files.
• Pros and Cons:
– Flexible data structure, simple, and multi-indexed capability.
– No unique query language (like calling a function in MongoDb)
• Representative products:
– MongoDB, CouchDB, etc.
CRICOS code 00025BCRICOS code 00025B
Types of NoSQL Databases
Column stores
• Data are stored by columns rather than rows.
CRICOS code 00025BCRICOS code 00025B
Types of NoSQL Databases
Column stores (e.g., HBase, Cassandra)
• Practical use of a column store versus a row store differs little
in the relational DBMS world.
• Row-based systems are designed to efficiently return data for
an entire row (a record).
– Example: return a record of user, including name, age,
gender, address, etc.
• Row-based systems are not efficient for column-wide
operations on the whole table
– Find all records of customer’s age between 20 and 30
across a 10M table.
– Scan the column of age row by row without skipping other
fields (name, gender, etc.)
– Big table will need multiple data block (machine) to fit in all
data (more disk operations).
• Columnar store can be more efficiently accessed for some
particular operations (e.g. data aggregation)
CRICOS code 00025BCRICOS code 00025B
Types of NoSQL Databases
Column stores
• With column storage, different columns or column families can be distributed
over multiple data nodes.
• offers very high performance and a highly scalable architecture.
• Usage cases:
– Big Data process, e.g. Google search (BigTable), Spotify (user
recommendation), Facebook (messages)
• Pros and Cons:
– High performance in terms of query, scalability, distribution.
– Incremental data loading and Queries against only a few rows
• Representative products:
– BigTable, Hbase, Cassandra, etc.
CRICOS code 00025BCRICOS code 00025B
Types of NoSQL Databases
Graph databases
• Nodes and relationships are the bases of graph databases.
– A node represents an entity, like a user, category, or a piece of data.
– A relationship represents how two nodes are associated, like friendship, works for, etc.
• The nature of connections in Graph eliminates the (time-consuming) search-match operation
found in relational databases.
• Usage cases:
– Network data, e.g. friendships in Tweets, Facebook
• Pros and Cons:
– Model complex relationship and support Graph-based algorithm.
– Complexity (smaller data scale)
• Representative products:
– Neo4j, OrientDB, InfoGrid, etc.
CRICOS code 00025BCRICOS code 00025B
• A database shard is a horizontal partition of data in a database.
• Each individual partition is referred to as a shard or database shard.
• Each shard is held on a separate database server instance, to spread load of a large volume
• Some data within a database remains present in all shards, but some appears only in a
single shard.
• Each shard (or server) acts as the single source for this subset of data.
Database Shards
P. 27https://en.wikipedia.org/wiki/Shard_(database_architecture)
R0: {id: 95, name: ‘aa’, tag:’older’}
R1: {id: 302, name: ‘bb’,}
R2: {id: 759, name: ‘aa’,}
R3: {id: 607, name: ‘dd’, age: 18}
R4: {id: 904, name: ‘ff’,}
R5: {id: 246, name: ‘gg’,}
R6: {id: 148, name: ‘ff’,}
R7: {id: 533, name: ‘kk’,}
CRICOS code 00025BCRICOS code 00025B
CAP Theorem
• The CAP theorem states that it is impossible for
a distributed data store to simultaneously provide more
than two out of the following three guarantees:
– Consistency: Every read receives the most recent
write or an error
– Availability: Every request receives a (non-error)
response – without the guarantee that it contains the
most recent write
– Partition tolerance: The system continues to operate
despite an arbitrary number of messages being
dropped (or delayed) by the network between nodes
https://en.wikipedia.org/wiki/Eric_Brewer_(scientist)
CRICOS code 00025BCRICOS code 00025B
CA (Consistency & Availability)
➢All clients always have the same view on the data
➢Each Client can always read and write
➢The system may not tolerance to failure and reconfiguration
AP (Availability & Partition Tolerance)
➢Each Client can always read and write
➢The system works well despite the physical partitions
➢Clients may have inconsistent views on the data
CP (Consistency & Partition Tolerance)
➢All clients always have the same view on the data
➢The system works well despite the physical partitions
➢Clients sometimes may not be able to access data
NoSQL Implementation Options
CRICOS code 00025BCRICOS code 00025B
Example of CAP
Proof in diagrams
http://www.julianbrowne.com/article/brewers-cap-theorem
CRICOS code 00025BCRICOS code 00025B
CAP for DBs
You must always give
something up: consistency,
availability or tolerance to
failure and reconfiguration
CRICOS code 00025BCRICOS code 00025B
BASE: ACID Alternative
• The opposite of ACID for transactions in relational databases is BASE:
– Basically Available, Soft-state, Eventually consistent
• Basically Available:
– one distributed system has failure parts but the total system is still working properly.
– E.g. online shopping on
• Soft-State: corresponds to hard-state, which guarantees consistency and durability in
RDBMS, and allows delays or outages (short period)
• Eventually consistent:
– Strong consistency: the data should be consistent after every transaction.
– Rather than requiring consistency after every transaction, it is enough for the distributed database
to eventually be in a consistent state.
CRICOS code 00025BCRICOS code 00025B
• Database Background
• Relational Data Bases
– Revisit Relational DBs
– ACID Properties
– Clustered RDBMs
• Non-relational Data Bases
– NoSQL concepts & CAP Theorem
– Cassandra
CRICOS code 00025BCRICOS code 00025B
Examples of NoSQL databases
CRICOS code 00025BCRICOS code 00025B
• A document-oriented NoSQL database:
– developed by C++ and cross-platform
– open-source, dfs-based, and horizontally scalable
– Schema-less: No Data Definition Language (DDL)
– So you can store hashes with any keys and values that you choose
Keys are a basic data type but in reality stored as strings
Document Identifiers (_id) will be created for each document, field name reserved by system
– Uses BSON format: based on JSON (JSON like, B stands for Binary)
• Supports APIs (drivers) in many computer languages (10+)
– JavaScript, Python, Ruby, Perl, Java, Scala, C#, C++, Haskell, Erlang, etc.
What is Mong
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com