COMP2420/COMP6420 INTRODUCTION TO DATA MANAGEMENT, ANALYSIS AND SECURITY

DATA TYPES AND DATA SYSTEMS

WEEK 6 – LECTURE 1 Monday 28 March 2022
of Computing
College of Engineering and Computer Science
Credit: (previous course convenor)

HOUSEKEEPING

Midsemester Exam
• Timetable released
• Thursday 21 April at 1pm (Canberra time)
• 15 minutes reading time and 90 minutes writing time
• Screen capture set-up required (test it well in advance, e.g. on the sample exam)
• Check the sample exam on the course site
• Save, commit and push regularly

Assignment 1
• If you have not forked the repo yet, urgently start now!
• Don’t wait until the last minute

Census date
• 31 March is the HECS census date. Last date to drop a course without penalty.

Upcoming public holidays
• Easter Monday 18 April
• ANZAC Day, Monday 25 April: no lectures. A make-up session will be arranged on the corresponding Tuesday at 2pm to record the lecture. We will run it live and students are welcome to join.

Learning Outcomes
01 Describe various data types and their differences
02 Describe what a database is and the various existing database models
03 Explain what data abstraction levels are
04 Recall what a database management system does and the different possible architectures
05 Explain what different database languages are and their use

Attribution: Slideshare
The data landscape
A schema defines how data is organised

STRUCTURED

Characteristics (structured data)
• Organized
• Conforms to a format
• Machine readable
• Easy to store/search/query/analyse

Examples (structured data)
• Spreadsheets (debatable, depends)
• Databases
• Census records
• Library catalogues

SEMI STRUCTURED

Characteristics (semi-structured data)
• No formal data model
• Has some organisational properties (uses metadata)
• Self-describing structure
• Easier to catalogue/search and analyze than unstructured

Examples (semi-structured data)
• XML and JSON documents
• HTML
• NoSQL databases
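
The contrast between structured and semi-structured data can be sketched with Python's standard library. This is a minimal illustration, not part of the original slides: the book records and field names are made up for the example. The CSV rows all conform to one fixed schema (the header row), while the JSON records are self-describing and do not have to share the same fields.

```python
import csv
import io
import json

# Structured: every row conforms to one fixed schema (the header row).
# The records below are illustrative only.
csv_text = "id,title,year\n1,Dune,1965\n2,Emma,1815\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: each record carries its own field names (self-describing),
# and records need not share the same fields or nesting.
json_text = """
[
  {"id": 1, "title": "Dune", "year": 1965, "tags": ["sci-fi"]},
  {"id": 2, "title": "Emma", "author": {"name": "Jane Austen"}}
]
"""
records = json.loads(json_text)

# All structured rows expose exactly the same columns...
structured_layouts = {tuple(row.keys()) for row in rows}
# ...while the semi-structured records expose different key sets.
semi_structured_layouts = {tuple(sorted(rec.keys())) for rec in records}

print("distinct CSV row layouts:", len(structured_layouts))
print("distinct JSON record layouts:", len(semi_structured_layouts))
```

Note that the CSV reader returns every value as a string; the schema fixes the columns, but interpreting `year` as a number is still left to the consumer.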

UNSTRUCTURED

Characteristics (unstructured data)
• No associated data model
• Usually some minimal structure (mostly free-form)
• Most data in real life is unstructured

Examples (unstructured data)
• Web content
• Social media data
• Satellite images
• Photographs

Quantitative vs Qualitative data
(revision)
• Quantitative
– Discrete
– Continuous
• Qualitative (categorical)
– Nominal
– Ordinal
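
The difference between nominal and ordinal data shows up in which comparisons make sense. The sketch below is one possible way to model this in Python; the `Fruit` and `Rating` types are hypothetical names chosen for illustration. A plain `Enum` supports only equality tests, which matches nominal data, while an `IntEnum` also supports ordering, which matches ordinal data.

```python
from enum import Enum, IntEnum

class Fruit(Enum):
    """Nominal: categories with no inherent order (hypothetical example)."""
    APPLE = "apple"
    ORANGE = "orange"
    GRAPE = "grape"

class Rating(IntEnum):
    """Ordinal: categories with a meaningful order (hypothetical example)."""
    POOR = 1
    OK = 2
    GOOD = 3

# Nominal values can be tested for equality and inequality...
same = Fruit.APPLE == Fruit.APPLE
different = Fruit.APPLE != Fruit.GRAPE

# ...but ordering them is not defined, so Python raises TypeError.
try:
    Fruit.APPLE < Fruit.GRAPE
    nominal_is_orderable = True
except TypeError:
    nominal_is_orderable = False

# Ordinal values can additionally be ranked.
best = max([Rating.OK, Rating.POOR, Rating.GOOD])
ok_beats_poor = Rating.OK > Rating.POOR

print(same, different, nominal_is_orderable, best.name, ok_beats_poor)
```

One caveat of this modelling choice: `IntEnum` also permits arithmetic such as `Rating.OK + Rating.GOOD`, which has no meaning for ordinal data, so the type only approximates the concept.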

A usually large collection of data organized especially for rapid search and retrieval (as by a computer) – Merriam-Webster dictionary

Attribution: Database configuration
Database Configuration

Database Models
• Defines the logical structure of a database and determines the manner in which data can be stored, organized and manipulated.
• There are many different types of database models.

Flat Model
Attribution: Flat

Hierarchical Model
Attribution: Hierarchical

Network Model
Attribution: Network

Relational Model
Attribution: Relational

Object-oriented Model
Attribution: Object-oriented

Graph Model
Attribution: Graph
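
To make the models above more concrete, the sketch below stores the same small data set in four different shapes using plain Python structures. It is an illustration under stated assumptions, not a description of how any real DBMS stores data: the people, departments and the `MENTORS` relationship are invented, and the network and object-oriented models are left out for brevity.

```python
# The same "person works in department" facts in four shapes (invented data).

# Flat model: a single table; the department name is repeated in every row.
flat = [
    ("Ada", "Research"),
    ("Grace", "Research"),
    ("Linus", "Systems"),
]

# Hierarchical model: a tree in which each child has exactly one parent.
hierarchical = {
    "Research": ["Ada", "Grace"],
    "Systems": ["Linus"],
}

# Relational model: separate tables linked through keys.
departments = {1: "Research", 2: "Systems"}            # dept_id -> name
employees = [("Ada", 1), ("Grace", 1), ("Linus", 2)]   # (name, dept_id)

# Graph model: nodes connected by labelled edges; extra relationship types
# (such as MENTORS) fit in without changing the structure.
edges = [
    ("Ada", "WORKS_IN", "Research"),
    ("Grace", "WORKS_IN", "Research"),
    ("Linus", "WORKS_IN", "Systems"),
    ("Ada", "MENTORS", "Linus"),
]

def research_from_flat():
    return sorted(name for name, dept in flat if dept == "Research")

def research_from_tree():
    return sorted(hierarchical["Research"])

def research_from_relations():
    # Look up the key first, then follow it into the other table (a join).
    dept_id = next(k for k, v in departments.items() if v == "Research")
    return sorted(name for name, d in employees if d == dept_id)

def research_from_graph():
    return sorted(s for s, label, t in edges
                  if label == "WORKS_IN" and t == "Research")

print(research_from_flat(), research_from_tree(),
      research_from_relations(), research_from_graph())
```

All four functions answer the same question ("who works in Research?"), but the way the data must be navigated differs with the model.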

Data Abstraction Levels
Attribution: Data Abstraction Levels
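
The figure for this slide is not reproduced here. Data abstraction is commonly described in three levels (physical, logical and view); that naming is an assumption on our part rather than something stated in the slide text. Under that assumption, the sketch below uses Python's built-in `sqlite3` module to show the idea: the table definition is the logical level, a view gives one group of users a restricted window onto it, and the physical storage is left entirely to SQLite. The `student` table and its columns are hypothetical.

```python
import sqlite3

# Physical level: how bytes are laid out is SQLite's concern, not ours.
conn = sqlite3.connect(":memory:")

# Logical level: what data is stored and how it is structured.
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, mark REAL)"
)
conn.executemany(
    "INSERT INTO student (id, name, mark) VALUES (?, ?, ?)",
    [(1, "Ada", 85.5), (2, "Grace", 91.0)],
)

# View level: a restricted window for one kind of user (marks are hidden).
conn.execute("CREATE VIEW student_directory AS SELECT id, name FROM student")

cursor = conn.execute("SELECT * FROM student_directory ORDER BY id")
view_columns = [col[0] for col in cursor.description]
directory = cursor.fetchall()

print(view_columns)
print(directory)
```

Users of `student_directory` never see the `mark` column, and nothing in the queries depends on how SQLite stores the rows.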

Database Management System (DBMS)
• Consists of interrelated data and software for analysing the data.
• Enables the definition, creation, updating, querying and administration of databases
• Allows for secure data access
• MySQL, Postgres, EnterpriseDB, MongoDB, Microsoft SQL Server, Oracle, SQLite and IBM DB2

Architecture – 2 tier
Attribution: 2 Tier

Architecture – 3 tier
Attribution: 3 Tier

Architecture – N tier
Attribution: N Tier

Database Languages
• Data-definition language
• Data-manipulation language

Data-definition language
• Specify a database schema
• Additional properties of data
• Data storage and access methods
• Consistency constraints on the data
– Domain constraints
– Referential integrity
– Assertions
– Authorization
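
As a rough illustration of data-definition statements, the sketch below uses Python's built-in `sqlite3` module to declare a schema with a domain constraint (`CHECK`) and referential integrity (`REFERENCES`). The `course` and `enrolment` tables, their columns and the sample values are hypothetical. Assertions and authorization are not shown, because SQLite does not provide `CREATE ASSERTION` or `GRANT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# DDL: specify the schema and the consistency constraints on the data.
conn.executescript("""
CREATE TABLE course (
    code  TEXT PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE enrolment (
    student_id  INTEGER NOT NULL,
    course_code TEXT NOT NULL REFERENCES course(code),  -- referential integrity
    mark        INTEGER CHECK (mark BETWEEN 0 AND 100)  -- domain constraint
);
""")

conn.execute("INSERT INTO course VALUES ('COMP2420', 'Example course')")
conn.execute("INSERT INTO enrolment VALUES (1, 'COMP2420', 75)")

def rejected(sql):
    """Return True if the DBMS refuses the statement as inconsistent."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# A mark outside 0..100 violates the domain constraint.
bad_mark = rejected("INSERT INTO enrolment VALUES (2, 'COMP2420', 140)")
# A course code that does not exist violates referential integrity.
bad_course = rejected("INSERT INTO enrolment VALUES (3, 'NOPE0000', 60)")

stored = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]
print(bad_mark, bad_course, stored)
```

The point of declaring constraints in the schema is that the DBMS, not each application, rejects inconsistent data.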

Data-manipulation language
• Used to access/manipulate data. Types of access are:
• Retrieve, Insert, Delete and Modify
Two basic types:
• Procedural
Need to specify what data is required and how to get that data
• Declarative
Just specify what data is needed. Not how to get that data

A query is used to retrieve information from a database. It is specified using a query language. The most widely used query language is SQL.
What type of language is SQL – Procedural or Declarative?
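
The procedural/declarative distinction can be sketched side by side. In the example below (Python with the built-in `sqlite3` module; the `product` table and its rows are invented), the loop spells out how to find the matching rows, while the SQL statement only states which rows are wanted and leaves the access strategy to the DBMS. SQL is generally classed as a declarative language for this reason. The last part runs the other three access types listed above: insert, modify and delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
products = [("pen", 2.5), ("notebook", 6.0), ("backpack", 45.0)]
conn.executemany("INSERT INTO product VALUES (?, ?)", products)  # Insert

# Procedural style: we say WHAT we want and HOW to get it
# (scan every row, test it, collect matches, sort).
procedural = []
for name, price in products:
    if price > 5:
        procedural.append(name)
procedural.sort()

# Declarative style: we only say WHAT we want; the DBMS decides how.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM product WHERE price > 5 ORDER BY name"  # Retrieve
    )
]

# The remaining access types, also expressed declaratively.
conn.execute("UPDATE product SET price = 3.0 WHERE name = 'pen'")  # Modify
conn.execute("DELETE FROM product WHERE name = 'backpack'")        # Delete
remaining = conn.execute(
    "SELECT name, price FROM product ORDER BY name"
).fetchall()

print(procedural, declarative, remaining)
```

Both approaches return the same names; the difference lies in who is responsible for the retrieval strategy.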

• Data Management intro part

Visualisation notes
01 What is visualisation?
02 Why does it matter in presenting data?
03 Data types and how they influence visualisation types
04 Bad plots

Attribution: Wikipedia
Internet Partial Map 2005

INTRODUCTION

What is visualization?
Technique to create images, diagrams or animations to communicate a message.
Communication with visual imagery has been used from the very beginning to convey both abstract and concrete ideas.
• Cave paintings
• Hieroglyphs
• Maps

Cave painting

Why does visualization matter?
• Large size of data makes it necessary to provide summaries
• People prefer to look at pictures rather than numbers
• Aids model construction, checking plausibility of model assumptions

Need for visualization
Communicate information
• Data presentation visualization
• Convincing other people it is true
Support reasoning about data
• Data exploration visualization
• Exploring what is true

Communicate information
Attribution: Tufte, “Beautiful evidence,” pg. 123 – ’s Napoleon map

Support reasoning about data
On January 28, 1986, the space shuttle Challenger exploded because two rubber O-rings leaked due to the very cold temperatures on launch day.
This potential problem was discussed the day before the launch:
Engineers opposed launching based on data from previous launches, and provided 13 charts to NASA to support their case.

• However, it is difficult to assess the relationship between temperature and O-ring damage based on these charts.
• (One) culprit: what Tufte refers to as “chartjunk”
• (Another) culprit: what Tufte refers to as “the cognitive style of PowerPoint”
• A visual display of the data from the investigation after the launch was provided. Its poor design and use of chartjunk make it difficult to assess the relationship between temperature and O-ring damage.

Attribution: Tufte, “Visual explanations”, pg 46

Support reasoning about data
Attribution: Tufte, “Visual explanations”, pg 45

DATA TYPES

Data types
Nominal: categorical data, no ordering
Example – Fruits {Apple, Oranges, Grapes}
Operations – =, !=
Ordinal: categorical data, ordered
Example – Ratings {Poor, Ok, Good}
Operations – =, !=, >, <, >=, <=
Interval: numerical data, zero has no meaning
Example – latitude and longitude
Operations – =, !=, >, <, >=, <=, +, -
Ratio: numerical data, zero has special meaning
Example – weight of person
Operations – =, !=, >, <, >=, <=, +, -, /, *

Titanic Dataset

VISUALIZATION TYPES

Basic plot types
We’ll now discuss some basic plot types
• 1D – bar chart, histogram
• 2D – scatter plot, line plot, box and whisker plot, heatmap
• 3D+ – scatter matrix, bubble chart
Which plot is appropriate depends on:
• univariate or bivariate data
• discrete/categorical or continuous
Most plots are easy to create (hard to make them aesthetically elegant)

Scatter plot

Bubble plot

Color scatter plot

3D scatter plot

Chart junk
Chartjunk: unnecessary graphics on visualizations. Doesn’t convey additional information but distracts from the point
The chart on the left is mostly chartjunk

To keep in mind
• Visualizations should enhance understanding, not create confusion
• Sticking to the basic plot types is beneficial and enhances understanding
• If a visualization doesn’t give you any new information, ask yourself - Do I need some other visualization
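
The earlier point that the appropriate plot depends on whether the data is univariate or bivariate, and discrete/categorical or continuous, can be sketched as a small lookup. The function name `suggest_plot` and the specific mapping are assumptions based on common practice, not rules stated in the slides. The plot names themselves come from the list of basic plot types.

```python
def suggest_plot(x_kind, y_kind=None):
    """Suggest a basic plot type (hypothetical helper, common-practice mapping).

    x_kind and y_kind are "categorical" or "continuous".
    Leave y_kind as None for univariate data.
    """
    kinds = {"categorical", "continuous"}
    if x_kind not in kinds or (y_kind is not None and y_kind not in kinds):
        raise ValueError("kinds must be 'categorical' or 'continuous'")

    # Univariate: one variable, so we show its distribution.
    if y_kind is None:
        return "bar chart" if x_kind == "categorical" else "histogram"

    # Bivariate: the choice depends on the combination of kinds.
    pair = {x_kind, y_kind}
    if pair == {"continuous"}:
        return "scatter plot"
    if pair == {"categorical"}:
        return "heatmap"
    # One categorical and one continuous variable.
    return "box and whisker plot"

print(suggest_plot("categorical"))
print(suggest_plot("continuous"))
print(suggest_plot("continuous", "continuous"))
print(suggest_plot("categorical", "continuous"))
print(suggest_plot("categorical", "categorical"))
```

A lookup like this is only a starting point. A line plot, for instance, is often preferable to a scatter plot when one continuous variable is time.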