CS代写 CRICOS code 00025BCRICOS code 00025B

CRICOS code 00025BCRICOS code 00025B

This course includes 13 lectures and 10 tutorial/practical sessions

Copyright By PowCoder代写 加微信 powcoder

Programming and Linux experiences required!

Recap – Course Structure

Introduction

topics&appl

VT: Docker I

VT: Docker II Spark I

DBs in Cloud

Networks &

Load Balancing

Security &

Lecture 11

Lecture 12

Lecture 10

VT: Docker III

Lecture 13

Concepts Orchestration Storage Computation Others

CRICOS code 00025BCRICOS code 00025B

• Database Background

• Relational Data Bases

– Revisit Relational DBs

– ACID Properties

– Clustered RDBMs

• Non-relational Data Bases

– NoSQL concepts & CAP Theorem

– Cassandra

CRICOS code 00025BCRICOS code 00025B

“A set of information held in a computer”

Oxford English Dictionary

“One or more large structured sets of persistent data, usually associated with software to
update and query the data”

Free On-Line Dictionary of Computing

“A collection of data arranged for ease and speed of search and retrieval”

Dictionary.com

A database is a shared, integrated computer structure that stores a collection of the following:

• End-user data—that is, raw facts of interest to the end user

• Metadata, or data about data, through which the end-user data is integrated and managed

What is a database?

1.Coronel, Carlos, and . Database systems: design, implementation, & management. Cengage Learning, 2016.

CRICOS code 00025BCRICOS code 00025B

Program data independence

• easily change the structure of database without modifying the application program.

Sharing of data, controlled redundancy

• data can be shared by authorized users.

• data is recorded in only one place in the database and it is not duplicated.

Concurrent, multi-user access to large volumes of data

• Many authorized users can simultaneously access the same piece of information.

• The remote users can also access the same data.

• Concurrent assess to data in file system leads to incorrect data.

Back-up and recovery for reliability

• Most of the DBMSs provide the ‘backup and recovery’ sub-systems that automatically create the
backup of data and restore data if required.

• In a computer file-based system, the back-up and recovery are often inefficient.

Why Database systems?

CRICOS code 00025BCRICOS code 00025B

Uniform, logical data model for representing data

• The logical data model represents the conceptual data model (including attributes, names,

relationships, and other metadata). Logical data model is often developed by UML.

Rich, standard language for querying data

• A number of extended SQL language, such as PLSQL (Oracle), PL/pgSQL (PostgreSQL),

Transact-SQL or T-SQL (Microsoft SQL Server)

Effective optimizations for efficient querying

• DBMS queries can be optimized to achieve efficiency

Ensure data integrity within single applications

• Integrity constraints or consistency rules can be applied to database so that the correct data

can be entered into database.

Why Database systems?

CRICOS code 00025BCRICOS code 00025B

A relational database is a collection of data items organized as a set of formally-described tables from

which data can be accessed or reassembled in many different ways without having to reorganize the

database tables.

In [1], relation algebra for databases was proposed and formed the foundation of modern relational

databases.

Five primitive operators in [1] are selection, projection, Cartesian product, set union, and set difference.

Relational database

The relational database was invented by

E. F. Codd at IBM in 1970.

[1] Codd, E. F. (1970). “Relational Completeness of Data Base Sublanguages”

Turing Award in 1981

CRICOS code 00025BCRICOS code 00025B

• The database management system (DBMS) is

the software that interacts with end users,

applications, and the database itself to

capture and analyze the data.

• The DBMS software additionally includes the

core facilities provided to administer the

• Often the term “database” is also used to

loosely refer to any of the DBMS, the

database system or an application associated

with the database.

Key concepts in Relational database

CRICOS code 00025BCRICOS code 00025B

Key concepts in Relational database

Some important database terms:

School of Information and Communication Technology

Entities / Tables – Entities represent items that we want

to store data about (e.g., a student)

Attributes / Fields / Columns headings – attributes

are the pieces of data that we want to store (e.g., the

students name)

Relationships – relationships are used to show how

entities within the database are related (e.g., a student

may be enrolled in a course, so keep student ID in

Enrolment table).

Records / Rows – a logically connected one or more

fields, e.g., a student record consisting of name, student
number & phone

Data / Data item – raw fact, e.g., student grade or

phone number.

Metadata – data about data that provides description of

data to enable program–data independence, e.g., type

of data (number or text)

Attributes

Relationship

CRICOS code 00025BCRICOS code 00025B

• The database schema of a database

system is its structure described in

a formal language supported by

the database management

system (DBMS).

• The term “schema” refers to the

organization of data as a blueprint of

how the database is constructed

(divided into database tables in the

case of relational databases).

What is a database schema?

https://stackoverflow.com/questions/45135485/creating-a-database-schema-

from-the-movie-database

https://en.wikipedia.org/wiki/Database_schema

CRICOS code 00025BCRICOS code 00025B

Data Definition Language (DDL) – Defines and modifies a schema e.g. CREATE / DROP / ALTER table;

does not manipulate data

Data Manipulation Language (DML) – Language used to retrieve (SELECT), add (INSERT),

modify (UPDATE) and DELETE data

Data Control Language (DCL) statements. Used for providing (GRANT) / withdrawing

(REVOKE) access privileges

Transaction Control Language (TCL) statements are used to manage the changes made by

DML statements. It allows statements to be grouped together into logical transactions.

Example: COMMIT, ROLLBACK, etc.

Types of SQL statements:

School of Information and Communication Technology

CRICOS code 00025BCRICOS code 00025B

Key concepts in Relational database

No duplicate tuples (rows)

• by definition sets do not contain duplicate elements

• hence tuples (rows) are unique

• Ensured by entity integrity (primary key)

Tuples are unordered within a relation (table)

• by definition sets are not ordered

• hence tuples can only be accessed by content

No ordering of attributes within a tuple

• by definition sets are not ordered

School of Information and Communication Technology

CRICOS code 00025BCRICOS code 00025B

Key concepts in Relational database

Entity-Relation Design (ERD):

• Entity: corresponds to a table in which we store data about a particular thing, e.g., Student

• Attribute: describes characteristics of an entity, e.g. attributes for the Employee entity are

employee number, first name, last name, job title etc.

• Relationship: illustrates an association (business rule) between two entities, e.g. a verb

School of Information and Communication Technology

Employee number

First name

Dept number

Department
Dept number

Dept location

Relationship

CRICOS code 00025BCRICOS code 00025B

Key concepts in Relational database

What is an actual business rule between Employee and Department entities?

• An employee works in a department and

• A department can employ many employees.

Cardinality/connectivity specifies maximum number of times an instance of an entity can be

related to instances of a related entity (1:1, 1:M, and M:N).

Modality/participation specifies minimum number of times an instance of an entity can be related

to instances of a related entity (0 or 1).

School of Information and Communication Technology

CRICOS code 00025BCRICOS code 00025B

Key concepts in Relational database

NORMALISATION – SIMPLY ‘COMMON SENSE’

• Converts a relation into relations of progressively
smaller number of attributes and tuples until an
optimum level of decomposition is reached – little or no
data redundancy exists

• Normalisation is a Relational Database Implementation
Model focused approach (it makes extensive use of
FK’s to connect relations)

• Each table represents a single subject

• No data item will be unnecessarily stored in more than
one table, i.e., No data redundancy

• All non-key attributes in a table are dependent on the
primary key

• Each table is void of insertion, update, deletion

• Objective of normalisation is to ensure that all tables
are in at least 3NF

School of Information and Communication Technology

Redundancy!

CRICOS code 00025BCRICOS code 00025B

Key concepts in Relational database

Transaction (ACID) Properties

• Atomicity – all parts of a transaction be completed successfully otherwise, the transaction is

aborted (never partially executed, Done or not done)

• Consistency – concurrent execution of transactions yields consistent results (consistent ->

consistent, inconsistent -> roll back -> consistent)

• Isolation – data used during one transaction cannot be used by a second until the first is

completed- Multi-user

• Durability – ensures that the result or effect of a committed transaction persists in case of a

system failure.

All transaction properties work together to make sure that a database maintains data integrity and

consistency for a single-user or a multi-user DBMS.

School of Information and Communication Technology

CRICOS code 00025BCRICOS code 00025B

Clustered RDBMs

• RDBMS is famous for ACID and suitable for transaction workloads, but

scalability of RDBMs is an issue.

• To horizontally scale up and support ACID transactions, traditional RDBMs

have been extended to maintain high-availability, high-redundancy in a

distributed computing environment.

• Products:

– MySQL NDB Cluster 8.0

– CockroachDB

– Amazon Aurora

• Pros: high availability (99.99%+), high throughput and low latency, etc.

• Cons: complex management/deployment/configuration, no foreign key, need

huge memory and disk storage.

School of Information and Communication Technology

CRICOS code 00025BCRICOS code 00025B

• Database Background

• Relational Data Bases

– Revisit Relational DBs

– ACID Properties

– Clustered RDBMs

• Non-relational Data Bases

– NoSQL concepts & CAP Theorem

– Cassandra

CRICOS code 00025BCRICOS code 00025B

What is NoSQL?

http://www.gradpost.ucsb.edu/events/events-article/2019/07/09/default-calendar/academic-job-market-writing-success-series-c.v.s

No Relational Query

A class of database management systems (DBMS)

NoSQL – “Not only SQL”

Does not use SQL as querying language

Distributed, fault-tolerant architecture

No fixed schema (formally described structure)

No joins (typical in databases operated with SQL)

It’s not a replacement for a RDBMS but compliments it

CRICOS code 00025BCRICOS code 00025B

Modern Web applications:

• Support large numbers of concurrent users (tens of thousands, perhaps millions)

– / Singles’ Day (11.11) / Boxing Day

• Deliver highly responsive experiences to a globally distributed base of users

– Need scalability (horizontally scalable)

• Be always available – no downtime

– Rare to see downtime/maintenance for YouTube or Amazon

• Handle semi- and unstructured data

– Twitter tweets / Youtube videos / Instagram photos

• Rapidly adapt to changing requirements with frequent updates and new features

– Cloud environment & cluster deployment

• The world has gone mobile, but still need data storage.

Why NoSQL?

CRICOS code 00025BCRICOS code 00025B

Types of NoSQL Databases

Key-value stores

• Key is a string, e.g. “s123456”, “user109029”, or

“19228872987494923”

• Value can be in different types: Double, Int, Array, List, etc.

• Usage cases: Frequent I/O operations in simple data model

– mobile apps, shopping carts, etc.

• Pros and Cons:

– Scalable, flexible, high performance for writing

– Not suitable for structured data and low query performance for

conditions

• Representative Products:

– Redis, Riak, Memcached, etc.

https://www.kdnuggets.com/2016/06/top-nosql-database-engines.html

Key: sn1 Value: “Mike, 24, Brisbane, …”

Key: sn2 Value: “Mary, 22, Sydney, …”

CRICOS code 00025BCRICOS code 00025B

Types of NoSQL Databases

Document stores

• Similar to key-value stores, but allows nested values associated with each

• Value is a document (nested values)

• Usage cases:

– Document-oriented data, e.g. Tweets, Facebook posts, Amazon

comments/feedback

– Semi-structured data, e.g. JSON files and XML files.

• Pros and Cons:

– Flexible data structure, simple, and multi-indexed capability.

– No unique query language (like calling a function in MongoDb)

• Representative products:

– MongoDB, CouchDB, etc.

CRICOS code 00025BCRICOS code 00025B

Types of NoSQL Databases

Column stores

• Data are stored by columns rather than rows.

CRICOS code 00025BCRICOS code 00025B

Types of NoSQL Databases
Column stores (e.g., HBase, Cassandra)

• Practical use of a column store versus a row store differs little

in the relational DBMS world.

• Row-based systems are designed to efficiently return data for

an entire row (a record).

– Example: return a record of user, including name, age,

gender, address, etc.

• Row-based systems are not efficient for column-wide

operations on the whole table

– Find all records of customer’s age between 20 and 30

across a 10M table.

– Scan the column of age row by row without skipping other

fields (name, gender, etc.)

– Big table will need multiple data block (machine) to fit in all

data (more disk operations).

• Columnar store can be more efficiently accessed for some

particular operations (e.g. data aggregation)

CRICOS code 00025BCRICOS code 00025B

Types of NoSQL Databases

Column stores

• With column storage, different columns or column families can be distributed

over multiple data nodes.

• offers very high performance and a highly scalable architecture.

• Usage cases:

– Big Data process, e.g. Google search (BigTable), Spotify (user

recommendation), Facebook (messages)

• Pros and Cons:

– High performance in terms of query, scalability, distribution.

– Incremental data loading and Queries against only a few rows

• Representative products:

– BigTable, Hbase, Cassandra, etc.

CRICOS code 00025BCRICOS code 00025B

Types of NoSQL Databases
Graph databases

• Nodes and relationships are the bases of graph databases.

– A node represents an entity, like a user, category, or a piece of data.

– A relationship represents how two nodes are associated, like friendship, works for, etc.

• The nature of connections in Graph eliminates the (time-consuming) search-match operation

found in relational databases.

• Usage cases:

– Network data, e.g. friendships in Tweets, Facebook

• Pros and Cons:

– Model complex relationship and support Graph-based algorithm.

– Complexity (smaller data scale)

• Representative products:

– Neo4j, OrientDB, InfoGrid, etc.

CRICOS code 00025BCRICOS code 00025B

• A database shard is a horizontal partition of data in a database.

• Each individual partition is referred to as a shard or database shard.

• Each shard is held on a separate database server instance, to spread load of a large volume

• Some data within a database remains present in all shards, but some appears only in a

single shard.

• Each shard (or server) acts as the single source for this subset of data.

Database Shards

P. 27https://en.wikipedia.org/wiki/Shard_(database_architecture)

R0: {id: 95, name: ‘aa’, tag:’older’}
R1: {id: 302, name: ‘bb’,}
R2: {id: 759, name: ‘aa’,}
R3: {id: 607, name: ‘dd’, age: 18}
R4: {id: 904, name: ‘ff’,}
R5: {id: 246, name: ‘gg’,}
R6: {id: 148, name: ‘ff’,}
R7: {id: 533, name: ‘kk’,}

CRICOS code 00025BCRICOS code 00025B

CAP Theorem

• The CAP theorem states that it is impossible for

a distributed data store to simultaneously provide more

than two out of the following three guarantees:

– Consistency: Every read receives the most recent

write or an error

– Availability: Every request receives a (non-error)

response – without the guarantee that it contains the

most recent write

– Partition tolerance: The system continues to operate

despite an arbitrary number of messages being

dropped (or delayed) by the network between nodes

https://en.wikipedia.org/wiki/Eric_Brewer_(scientist)

CRICOS code 00025BCRICOS code 00025B

CA (Consistency & Availability)
➢All clients always have the same view on the data

➢Each Client can always read and write

➢The system may not tolerance to failure and reconfiguration

AP (Availability & Partition Tolerance)
➢Each Client can always read and write

➢The system works well despite the physical partitions

➢Clients may have inconsistent views on the data

CP (Consistency & Partition Tolerance)
➢All clients always have the same view on the data

➢The system works well despite the physical partitions

➢Clients sometimes may not be able to access data

NoSQL Implementation Options

CRICOS code 00025BCRICOS code 00025B

Example of CAP

Proof in diagrams

http://www.julianbrowne.com/article/brewers-cap-theorem

CRICOS code 00025BCRICOS code 00025B

CAP for DBs

You must always give

something up: consistency,

availability or tolerance to

failure and reconfiguration

CRICOS code 00025BCRICOS code 00025B

BASE: ACID Alternative

• The opposite of ACID for transactions in relational databases is BASE:

– Basically Available, Soft-state, Eventually consistent

• Basically Available:

– one distributed system has failure parts but the total system is still working properly.

– E.g. online shopping on

• Soft-State: corresponds to hard-state, which guarantees consistency and durability in

RDBMS, and allows delays or outages (short period)

• Eventually consistent:

– Strong consistency: the data should be consistent after every transaction.

– Rather than requiring consistency after every transaction, it is enough for the distributed database

to eventually be in a consistent state.

CRICOS code 00025BCRICOS code 00025B

• Database Background

• Relational Data Bases

– Revisit Relational DBs

– ACID Properties

– Clustered RDBMs

• Non-relational Data Bases

– NoSQL concepts & CAP Theorem

– Cassandra

CRICOS code 00025BCRICOS code 00025B

Examples of NoSQL databases

CRICOS code 00025BCRICOS code 00025B

• A document-oriented NoSQL database:

– developed by C++ and cross-platform

– open-source, dfs-based, and horizontally scalable

– Schema-less: No Data Definition Language (DDL)

– So you can store hashes with any keys and values that you choose

 Keys are a basic data type but in reality stored as strings

 Document Identifiers (_id) will be created for each document, field name reserved by system

– Uses BSON format: based on JSON (JSON like, B stands for Binary)

• Supports APIs (drivers) in many computer languages (10+)

– JavaScript, Python, Ruby, Perl, Java, Scala, C#, C++, Haskell, Erlang, etc.

What is Mong

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com