DATA TYPES AND DATA SYSTEMS
COMP2420/COMP6420 INTRODUCTION TO DATA MANAGEMENT, ANALYSIS AND SECURITY
WEEK 6 – LECTURE 1 Monday 28 March 2022
of Computing
College of Engineering and Computer Science
Credit: (previous course convenor)
HOUSEKEEPING
Midsemester Exam
• Timetable released
• Thursday 21 April at 1pm (Canberra time)
• 15 minutes reading time and 90 minutes writing time
• Screen capture set-up required (test it well in advance e.g. on sample exam)
• Check sample exam on course site
• Save, commit, push regularly
Assignment 1
• If you have not forked the repo yet, urgently start now!
• Don’t wait until the last minute
Census date
• 31 March is the HECS census date. Last date to drop a course without penalty.
Upcoming public holidays
• Easter Monday 18 April
• ANZAC Day, Monday 25 April: there will be no lectures. A make-up session will be arranged on the corresponding Tuesday at 2pm to record a lecture; we will run it live and students are welcome to join in.
Learning Outcomes
01 Describe various data types and their differences
02 Describe what a database is and the various existing database models
03 Explain what data abstraction levels are
04 Recall what a database management system does and the different possible architectures
05 Explain what different database languages are and their use
Attribution: Slideshare
The data landscape
A schema defines how data is organised
STRUCTURED
Characteristics (structured data)
• Organized
• Conforms to a format
• Machine readable
• Easy to store/search/query/analyse
Examples (structured data)
• Spreadsheets (debatable, depends)
• Databases
• Census records
• Library catalogues
SEMI STRUCTURED
Characteristics (semi-structured data)
• No formal data model
• Has some organisational properties (uses metadata)
• Self-describing structure
• Easier to catalogue/search and analyze than unstructured
Examples (semi-structured data)
• XML and JSON documents
• HTML
• NoSQL databases
UNSTRUCTURED
Characteristics (unstructured data)
• No associated data model
• Usually some minimal structure (mostly free-form)
• Most data in real life is unstructured
Examples (unstructured data)
• Web content
• Social media data
• Satellite images
• Photographs
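To make the three categories concrete, here is a minimal Python sketch that holds the same made-up library loans in each form. The records, field names and the note text are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row conforms to the same schema.
csv_text = "loan_id,title,days_overdue\n1,Dune,3\n2,Emma,0\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys; records may differ in shape.
json_text = (
    '[{"loan_id": 1, "title": "Dune", "tags": ["sci-fi"]},'
    ' {"loan_id": 2, "title": "Emma"}]'
)
docs = json.loads(json_text)

# Unstructured: free-form text with no data model; it has to be searched or parsed.
note = "Dune came back three days late; Emma was returned on time."

# Querying gets harder as structure decreases.
overdue_titles = [r["title"] for r in rows if int(r["days_overdue"]) > 0]
keys_per_doc = [sorted(d.keys()) for d in docs]
mentions_dune = "Dune" in note

print(overdue_titles, keys_per_doc, mentions_dune)
```

The structured rows can be filtered by column, the JSON documents have to be inspected for which keys they carry, and the free text only supports a crude substring search.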
Quantitative vs Qualitative data
(revision)
• Quantitative
– Discrete
– Continuous
• Qualitative (categorical)
– Nominal
– Ordinal
A usually large collection of data organized especially for rapid search and retrieval (as by a computer) – Merriam-Webster dictionary
Attribution: Database configuration
Database Configuration
Database Models
• Defines the logical structure of a database and determines the manner in which data can be stored, organized and manipulated.
• There are many different types of database models.
Flat Model
Attribution: Flat
Hierarchical Model
Attribution: Hierarchical
Network Model
Attribution: Network
Relational Model
Attribution: Relational
Object-oriented Model
Attribution: Object-oriented
Graph Model
Attribution: Graph
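The model diagrams are not reproduced in these notes, so the following is a rough Python sketch of how the same small dataset might look under four of the models. The student names and IDs are hypothetical, and plain Python containers stand in for real database structures.

```python
# Hypothetical data: two students and the courses they take.

# Flat model: a single table, one row per enrolment, names repeated.
flat = [
    ("u1", "Ana", "COMP2420"),
    ("u1", "Ana", "COMP6420"),
    ("u2", "Ben", "COMP2420"),
]

# Hierarchical model: a tree; each child hangs under exactly one parent,
# so a course shared by two students is stored twice.
hierarchical = {"Ana": ["COMP2420", "COMP6420"], "Ben": ["COMP2420"]}

# Relational model: separate tables linked by keys.
students = {"u1": "Ana", "u2": "Ben"}
enrolments = [("u1", "COMP2420"), ("u1", "COMP6420"), ("u2", "COMP2420")]

# Graph model: nodes plus explicit, labelled edges.
edges = [
    ("Ana", "TAKES", "COMP2420"),
    ("Ana", "TAKES", "COMP6420"),
    ("Ben", "TAKES", "COMP2420"),
]


def students_in_relational(course):
    """Join the enrolment table to the student table by key."""
    return sorted(students[uid] for uid, code in enrolments if code == course)


def students_in_graph(course):
    """Follow TAKES edges that point at the course node."""
    return sorted(src for src, label, dst in edges if label == "TAKES" and dst == course)


print(students_in_relational("COMP2420"), students_in_graph("COMP2420"))
```

The same question gets the same answer in each representation; what changes is how the data is organised and how it has to be traversed.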
Data Abstraction Levels
Attribution: Data Abstraction Levels
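The figure for this slide is not reproduced here. Assuming it shows the commonly taught three levels (physical, logical and view), a minimal sketch with Python's built-in sqlite3 module might look like the following. The table, view and column names are hypothetical.

```python
import sqlite3

# Physical level: how bytes are laid out is left entirely to SQLite.
conn = sqlite3.connect(":memory:")

# Logical level: what data is stored and how it is related.
conn.execute("CREATE TABLE student (uid TEXT PRIMARY KEY, name TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("u1", "Ana", 81), ("u2", "Ben", 64)],
)

# View level: expose only part of the database to a group of users.
conn.execute("CREATE VIEW student_names AS SELECT uid, name FROM student")

cursor = conn.execute("SELECT * FROM student_names ORDER BY uid")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()

print(view_columns, view_rows)
```

Users of the view never see the mark column, and nobody querying the table needs to know how SQLite stores it.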
Database Management System (DBMS)
• Consists of interrelated data and software for analysing the data.
• Enables the definition, creation, updating, querying and administration of databases
• Allows for secure data access
• Examples: MySQL, Postgres, EnterpriseDB, MongoDB, Microsoft SQL Server, Oracle, SQLite and IBM DB2
Architecture – 2 tier
Attribution: 2 Tier
Architecture – 3 tier
Attribution: 3 Tier
Architecture – N tier
Attribution: N Tier
Database Languages
• Data-definition language
• Data-manipulation language
Data-definition language
• Specify a database schema
• Additional properties of data
• Data storage and access methods
• Consistency constraints on the data
– Domain constraints
– Referential integrity
– Assertions
– Authorization
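As an illustration of a schema with consistency constraints, here is a small sketch using SQL data-definition statements run through Python's sqlite3 module. The table and column names are hypothetical. SQLite is used only because it ships with Python; it does not support assertions or authorization statements, so only a domain constraint and referential integrity are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE enrolment (
        uid  TEXT NOT NULL,
        code TEXT NOT NULL REFERENCES course(code),   -- referential integrity
        mark INTEGER CHECK (mark BETWEEN 0 AND 100)   -- domain constraint
    )
    """
)

conn.execute("INSERT INTO course VALUES ('COMP2420')")
conn.execute("INSERT INTO enrolment VALUES ('u1', 'COMP2420', 75)")


def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


bad_mark = rejected("INSERT INTO enrolment VALUES ('u2', 'COMP2420', 140)")
bad_course = rejected("INSERT INTO enrolment VALUES ('u3', 'NOPE0000', 50)")
row_count = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]

print(bad_mark, bad_course, row_count)
```

Both invalid inserts are refused by the database itself, so the constraints hold no matter which application writes the data.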
Data-manipulation language
• Used to access/manipulate data. Types of access are:
• Retrieve, Insert, Delete and Modify
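The four types of access can be sketched with SQL statements run through Python's sqlite3 module. The book table and its contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, copies INTEGER)")

# Insert
conn.executemany(
    "INSERT INTO book (title, copies) VALUES (?, ?)",
    [("Dune", 2), ("Emma", 1)],
)

# Modify
conn.execute("UPDATE book SET copies = copies + 1 WHERE title = ?", ("Emma",))

# Delete
conn.execute("DELETE FROM book WHERE title = ?", ("Dune",))

# Retrieve
remaining = conn.execute("SELECT title, copies FROM book").fetchall()
print(remaining)
```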
Two basic types:
• Procedural
Need to specify what data is required and how to get that data
• Declarative
Just specify what data is needed, not how to get that data
A query is used to retrieve information from a database. It is specified using a query language. The most widely used query language is SQL.
What type of language is SQL – Procedural or Declarative?
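One way to think about the question is to answer the same request twice: once with a hand-written loop and once with an SQL query. The sketch below uses made-up marks and a hypothetical result table.

```python
import sqlite3

marks = [("Ana", 81), ("Ben", 64), ("Cy", 92)]

# Procedural style: spell out how to find the rows, step by step.
high_procedural = []
for name, mark in marks:
    if mark >= 80:
        high_procedural.append(name)
high_procedural.sort()

# Query style: state which rows are wanted and leave the "how" to the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (name TEXT, mark INTEGER)")
conn.executemany("INSERT INTO result VALUES (?, ?)", marks)
query = "SELECT name FROM result WHERE mark >= 80 ORDER BY name"
high_query = [row[0] for row in conn.execute(query)]

print(high_procedural, high_query)
```

The SQL text says nothing about loops, scan order or indexes; it only describes the result, which is what makes SQL (largely) declarative.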
• Data Management intro part
Visualisation notes
01 What is visualisation?
02 Why does it matter in presenting data?
03 Data types and how they influence visualisation types
04 Bad plots
Attribution: Wikipedia
Internet Partial Map 2005
INTRODUCTION
What is visualization?
Technique to create images, diagrams or animations to communicate a message.
Communication with visual imagery has been used from the very beginning to communicate both abstract and concrete ideas.
Examples: cave paintings, hieroglyphs, maps
Why does visualization matter?
• Large size of data makes it necessary to provide summaries
• People prefer to look at pictures rather than numbers
• Aids model construction, checking plausibility of model assumptions
Need for visualization
Communicate information
– Data presentation visualization
– Convincing other people it is true
Support reasoning about data
– Data exploration visualization
– Exploring what is true
Communicate information
Attribution: Tufte, “Beautiful evidence,” pg. 123 – ’s Napoleon map
Support reasoning about data
On January 28, 1986, the space shuttle Challenger exploded because two rubber O-rings leaked due to the very cold temperatures on launch day.
This potential problem was discussed the day before the launch:
Engineers opposed launching based on data from previous launches, and provided 13 charts to NASA to support their case.
• However, it is difficult to assess the relationship between temperature and O-ring damage based on these charts.
• (One) culprit: what Tufte refers to as “chartjunk”
• (Another) culprit: what Tufte refers to as “the cognitive style of PowerPoint”
• A visual display of the data from the investigation after the launch was provided. Its poor design and use of chartjunk make it difficult to assess the relationship between temperature and O-ring damage.
Attribution: Tufte, “Visual explanations”, pg 46
Support reasoning about data
Attribution: Tufte, “Visual explanations”, pg 45
DATA TYPES
Data types
Nominal: categorical data, no ordering
Example – Fruits {Apple, Oranges, Grapes}
Operations – =, !=
Ordinal: categorical data, ordered
Example – Ratings {Poor, Ok, Good}
Operations – =, !=, >, <, >=, <=
Interval: numerical data, zero is arbitrary (it does not mean "none")
Example – latitude and longitude
Operations – =, !=, >, <, >=, <=, +, -
Ratio: numerical data, zero has special meaning (absence of the quantity)
Example – weight of person
Operations – =, !=, >, <, >=, <=, +, -, /, *
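The sketch below tries out the operations listed for each data type. The fruit, rating and weight examples follow the slide; the temperature example for interval data is an added assumption, chosen because unit conversion makes the "no true zero" point easy to see.

```python
import math

# Nominal: only equality tests make sense.
same_fruit = "Apple" == "Grapes"

# Ordinal: order is defined, but distances between levels are not.
RATING_ORDER = ["Poor", "Ok", "Good"]


def rating_less_than(a, b):
    """Compare two ratings by their position in the agreed order."""
    return RATING_ORDER.index(a) < RATING_ORDER.index(b)


# Interval (assumed example: temperature): differences are meaningful,
# ratios are not, because zero degrees is an arbitrary reference point.
def c_to_f(celsius):
    return celsius * 9 / 5 + 32


ratio_celsius = 20 / 10                     # looks like "twice as hot"
ratio_fahrenheit = c_to_f(20) / c_to_f(10)  # same temperatures, different ratio

# Ratio: zero means "none", so ratios survive a change of units.
KG_TO_LB = 2.20462
ratio_kg = 80.0 / 40.0
ratio_lb = (80.0 * KG_TO_LB) / (40.0 * KG_TO_LB)

print(same_fruit, rating_less_than("Poor", "Good"))
print(ratio_celsius, ratio_fahrenheit, ratio_kg, ratio_lb)
```

The temperature ratio changes with the unit while the weight ratio does not, which is why / and * are listed only for ratio data.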
Titanic Dataset
VISUALIZATION TYPES
Basic plot types
We’ll now discuss some basic plot types
1D - bar chart, histogram
2D - scatter plot, line plot, box and whisker plot, heatmap
3D+ - scatter matrix, bubble chart
Which plot is appropriate depends on:
• univariate or bivariate data
• discrete/categorical or continuous
Most plots are easy to create (hard to make them aesthetically elegant)
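The choice described above can be written down as a small lookup. The helper below, `suggest_plot`, is hypothetical, and the mapping is a common rule of thumb rather than a rule stated in the lecture; it uses only plot types from the list above.

```python
def suggest_plot(*kinds):
    """Suggest a basic plot type for one or two variables.

    Each argument is "categorical" or "continuous". The mapping is a
    rule of thumb and is not the only reasonable choice.
    """
    table = {
        ("categorical",): "bar chart",
        ("continuous",): "histogram",
        ("categorical", "categorical"): "heatmap",
        ("categorical", "continuous"): "box and whisker plot",
        ("continuous", "continuous"): "scatter plot",
    }
    key = tuple(sorted(kinds))  # argument order should not matter
    if key not in table:
        raise ValueError(f"no suggestion for variable kinds {kinds!r}")
    return table[key]


print(suggest_plot("continuous"))
print(suggest_plot("continuous", "categorical"))
```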
Scatter plot
Bubble plot
Color scatter plot
3D scatter plot
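The example figures are not reproduced in these notes. As a stand-in, here is a minimal matplotlib sketch, assuming matplotlib is installed, that combines a scatter plot, a bubble plot (marker size) and a color scatter plot (marker color) in one chart. The data values and variable names are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up illustrative values, not taken from any real dataset.
age = [22, 38, 26, 35, 54, 14]
fare = [7.3, 71.3, 7.9, 53.1, 51.9, 30.1]
group_size = [1, 2, 1, 2, 1, 4]   # shown as bubble size
ticket_class = [3, 1, 3, 1, 1, 2]  # shown as color

fig, ax = plt.subplots()
points = ax.scatter(
    age,
    fare,
    s=[60 * n for n in group_size],  # third variable: marker area
    c=ticket_class,                  # fourth variable: marker color
    cmap="viridis",
    alpha=0.7,
)
ax.set_xlabel("Age (years)")
ax.set_ylabel("Fare")
fig.colorbar(points, ax=ax, label="Ticket class")

# fig.savefig("bubble.png")  # uncomment to write the figure to disk
```

Position alone gives a plain scatter plot; adding `s` turns it into a bubble plot and adding `c` turns it into a color scatter plot, so up to four variables fit on a 2D chart.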
Chart junk
Chartjunk: unnecessary graphics on visualizations. It conveys no additional information and distracts from the point.
The chart on the left is mostly chartjunk
To keep in mind
Visualizations should enhance understanding, not create confusion
Sticking to the basic plot types is beneficial and enhances understanding
If a visualization doesn’t give you any new information, ask yourself - Do I need some other visualiza