COMP9321:
Data services engineering
Term 1, 2021
Week 3: Data Visualisation (Principles and Basic Techniques)

Early Babylonian world map (600 BC)
The concentric circles represent the ocean, named “bitter water” or the “salt sea”.
An early interpretation of the layout of the world.

Network Attacks

Visualisation isn’t just about graphics
Highly competent visualisation “tricks” can affect what you see and what you pay attention to:
Learning Data Visualization by Bill Shanders, Lynda.com

When Steve Jobs says …
• Again … playing with human visual perception
• 21.2% versus 19.5% slices of the pie
https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=00018S
Apple SmartPhone Market Update, Macworld keynote (2008)

In this lecture …
Visualization is often considered a highly specialized area, and getting it right takes skilled knowledge from multiple disciplines (statistics, graphic design, human perception of visual elements, and sometimes human psychology).
In most cases, the topic would be a course by itself …
Of course, we cannot get to that level of detail here … But if you are doing some data analysis or writing a data-driven application, you will most likely run into some visualization tasks.
This lecture has two parts:
First: the basic principles of ‘good and competent’ visualization
Second: an introduction to the Matplotlib library, where we get to see the basic building blocks of visualization techniques, which are useful for further exploration of the area if you are interested …

Presentation as a service
Of course, there is another important point to make.
Once you know how to visualise data, what comes naturally after that is to offer that knowledge as a service (API)
It’s known as “Presentation” or “Visualisation” as a service.
“Visualisation on the Cloud”
• e.g., https://www.highcharts.com
• e.g., Google Maps (a good example of presentation as a service)
• On request, its response can contain either “presentation logic + data”, or already visualised data in graphics/HTML
So for this course context, the concept of “presentation as a service” is more interesting than the visualisation techniques themselves.
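
The two response styles mentioned above (“presentation logic + data” versus already visualised graphics) can be sketched without any web framework. This is only a minimal sketch: the function names `chart_as_data` and `chart_as_image`, the payload layout, and the sales figures are all hypothetical, and a real service would wrap these in HTTP endpoints.

```python
import io
import json

import matplotlib
matplotlib.use("Agg")  # headless backend: render without a display
import matplotlib.pyplot as plt

# Hypothetical data held by the service
SALES = {"Quarter1": 120, "Quarter2": 150, "Quarter3": 90, "Quarter4": 180}


def chart_as_data():
    """Option 1: return data plus presentation hints; the client draws the chart."""
    payload = {
        "chart": {"type": "bar", "title": "Sales by quarter"},
        "categories": list(SALES.keys()),
        "values": list(SALES.values()),
    }
    return json.dumps(payload)


def chart_as_image():
    """Option 2: return the already rendered chart as PNG bytes."""
    fig, ax = plt.subplots()
    ax.bar(list(SALES.keys()), list(SALES.values()))
    ax.set_title("Sales by quarter")
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    plt.close(fig)  # free the figure once it has been serialised
    return buffer.getvalue()


json_body = chart_as_data()   # would be served as application/json
png_body = chart_as_image()   # would be served as image/png
```

The first option keeps the service light and lets the client restyle the chart; the second guarantees every client sees the same picture.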

What is Data Visualization
Data visualization is the process of converting raw data into easily understood pictures of information that enable fast and effective decisions.
The study of visual representations of data to reinforce human cognition.
“Help people understand the structure, relationships, and meaning in data.”
Techniques: Charts, Graphs, Maps

What is Visualisation
Visualization transforms data into images that effectively and accurately represent information about the data.
https://www1.udel.edu/johnmack/frec682/cholera/ (John Snow Map of Cholera 1854)

What is visualisation
Three types of goals for visualization – … to explore
Nothing is known … Visualisation is used for data exploration/discovery … e.g., A Visual History of Nobel Prizes and Notable Laureates, 1901-2012 https://www.brainpickings.org/2012/11/29/giorgia-lupi-noble-prizes-visualization/
https://www.safaribooksonline.com/library/view/designing-data-visualizations/9781449314774/ch01.html

What is visualisation
Three types of goals for visualization – … to analyse
You have some hypotheses. Visualisation is used for verification or falsification.
You’d normally use what is classified as “data analysis and visualisation” tools (which offer a range of data source connectors, built-in quick visualisation graphs, etc.)
https://www.safaribooksonline.com/library/view/designing-data-visualizations/9781449314774/ch01.html

What is visualisation
Three types of goals for visualization – … to explain
You do know what the data contains … Visualisation is used for “effective” communication of “results” – Making them clear for the audience
https://www.safaribooksonline.com/library/view/designing-data-visualizations/9781449314774/ch01.html

https://www.safaribooksonline.com/library/view/designing-data-visualizations/9781449314774/ch01.html
Data Visualisation
Referring to any visual representation of data that is:
• algorithmically drawn (may have custom touches but is largely rendered with the help of computerized methods);
• easy to regenerate with different data (the same form may be repurposed to represent different datasets with similar dimensions or characteristics);
• often aesthetically simple (data is not decorated); and
• relatively data-rich (large volumes of data are welcome and viable, in contrast to infographics).

Anatomy of A Visualization
1. Title
2. X-Axis
3. Y-Axis
4. Series
5. Data Points
Figure: example chart titled “Last Period Returns” with two series (Sales, Returns) plotted for September to December; callouts 1 to 5 mark the parts listed above.
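
The five parts listed above map directly onto Matplotlib calls. A minimal sketch follows; the numbers are illustrative only (they are not the values from the slide), and the axis labels "Month" and "Amount" are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

months = ["September", "October", "November", "December"]
# Illustrative values only, not the figures from the slide
sales = [4.3, 2.5, 3.5, 4.5]
returns = [2.4, 4.4, 1.8, 2.8]

fig, ax = plt.subplots()
ax.set_title("Last Period Returns")   # 1. Title
ax.set_xlabel("Month")                # 2. X-axis (label is an assumption)
ax.set_ylabel("Amount")               # 3. Y-axis (label is an assumption)

# 4. Series: one plot call per series, named through `label`
ax.plot(months, sales, marker="o", label="Sales")
ax.plot(months, returns, marker="o", label="Returns")

# 5. Data points: label each Sales marker with its value.
# Categorical x values sit at positions 0, 1, 2, ...
for position, value in enumerate(sales):
    ax.annotate(str(value), (position, value),
                textcoords="offset points", xytext=(0, 6), ha="center")

ax.legend()
fig.canvas.draw()  # force a render; use fig.savefig(...) to write a file
```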

So what makes a good visualisation
Accuracy, Story, Knowledge: aim to create visualisations that are accurate, tell a good story, and provide real knowledge to the audience.
Learning Data Visualization by Bill Shanders, Lynda.com

https://en.wikipedia.org/wiki/Charles_Joseph_Minard (Charles Minard Map)
Learning Data Visualization by Bill Shanders, Lynda.com
So what makes a good visualisation
Accuracy, Story, Knowledge: aim to create visualisations that are accurate, tell a good story, and provide real knowledge to the audience.
The Minard Map: “The best statistical graphic ever drawn”

So what makes a good visualisation
Accuracy, Story, Knowledge: aim to create visualisations that are accurate, tell a good story, and provide real knowledge to the audience.
John Snow’s cholera map (1854): widely credited with founding the field of epidemiology
Learning Data Visualization by Bill Shanders, Lynda.com

Some of the basics … “Charts vs. Graphs”
Often used interchangeably, but they are different in terms of the “visualisation techniques involved”.
Figure: example visuals titled “Summary” (values 1 to 5 on a scale of 0 to 30) and “Sales” (Quarter1 to Quarter4 on a scale of 0 to 400000).
Data Visualization Best Practices: Killer Infographics by Amy Balliett

Some of the basics … “Charts vs. Graphs”
Both rely on an established, repeated pattern to show data
e.g., Bar: repeating equal-width rectangles along a scale of information
e.g., Comparison Chart: repeating ticks/crosses along a scale of information
Graphs: rely on the X axis, the Y axis, or both to make sense. At least one of these axes is numeric. A graph shows the correlation between the axes by plotting points along the grid
Charts: not restricted to X/Y axes, and not necessarily numerical
Which ones are charts? Which ones are graphs?
Data Visualization Best Practices: Killer Infographics by Amy Balliett

Some of the basics … Organising data
Imagine the merchandise sales figures for an artist:
T-shirts: 45, CD: 60, Vinyl: 25, Posters: 32, Keychains: 10
Figure: three “Product Sales” bar charts showing the same sales at different levels of grouping, from individual items (Small Keychain, Large Keychain, Posters, Vinyl, New CD, Old CD, Single CD, Black/White/Gray/Sleeveless Shirts) to categories (Keychains, Posters, Vinyl, CDs, Shirts).
Data Visualization Best Practices: Killer Infographics by Amy Balliett

Some of the basics … Organising data
When possible, always order the data … but perception can also be tricky …
Figure: two “Product Sales” bar charts (scale 0 to 70) with the same categories in opposite orders: CDs, Shirts, Posters, Vinyl, Keychains versus Keychains, Vinyl, Posters, Shirts, CDs.
Data Visualization Best Practices: Killer Infographics by Amy Balliett
Subjective interpretation: most people read left to right …
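
Ordering the data before plotting takes one `sorted` call. A minimal sketch using the merchandise figures from the earlier slide (T-shirts 45, CD 60, Vinyl 25, Posters 32, Keychains 10); the axis label "Units sold" is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Sales figures from the slide, in their original (unordered) form
sales = {"T-shirts": 45, "CD": 60, "Vinyl": 25, "Posters": 32, "Keychains": 10}

# Sort ascending by value. barh draws the first item at the bottom,
# so the best seller ends up at the top of the chart.
ordered = sorted(sales.items(), key=lambda item: item[1])
labels = [name for name, _ in ordered]
values = [units for _, units in ordered]

fig, ax = plt.subplots()
ax.barh(labels, values)
ax.set_title("Product Sales")
ax.set_xlabel("Units sold")
fig.canvas.draw()
```

Passing `reverse=True` to `sorted` flips the order, which is the kind of choice the perception note above is about.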

Some of the basics … Organising data
When possible, always order the data … but perception can also be tricky …
Data Visualization Best Practices: Killer Infographics by Amy Balliett
When a line follows a “timeline”, stick to the timeline …

Some of the basics … Colours important
Putting “Form (Prettiness)” before “Function”
Figure: “Summary” chart (scale 0 to 25).
Lots of colours can be confusing and distract from the information
Data Visualization Best Practices: Killer Infographics by Amy Balliett

http://www2.cs.uh.edu/~chengu/Teaching/Fall2017/Visualization_fall2017.html
Some of the basics … Colours important
Not just about limiting colours …
What’s wrong with this colour map?

http://www2.cs.uh.edu/~chengu/Teaching/Fall2017/Visualization_fall2017.html
Some of the basics … Colours important

Some of the basics … Colours important
Common advice on using colours:
Most people have strong associations with pre-established colour meanings. Don’t go against them.
http://www2.cs.uh.edu/~chengu/Teaching/Fall2017/Visualization_fall2017.html

Some of the basics … Colours important
Just to make things interesting …. Colour alone is not the whole picture.
http://www2.cs.uh.edu/~chengu/Teaching/Fall2017/Visualization_fall2017.html

Some of the basics … Colours important
What about correct contrast choice?
http://www2.cs.uh.edu/~chengu/Teaching/Fall2017/Visualization_fall2017.html

Some of the basics … Colours important
Colour blindness is more common than we think … (commonly cited figures are about 8% of men and 0.5% of women)
There are resources you could utilise:
e.g., http://colorbrewer2.org/
Tips on increasing accessibility of your visualisation: http://blog.usabilla.com/how-to-design-for-color-blindness/
Learning Data Visualization by Bill Shanders, Lynda.com
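
One way to act on the accessibility advice is to pick colours that remain distinguishable under common forms of colour blindness, and to back colour up with a second cue. A minimal sketch, with assumptions: the three hex colours come from the Okabe-Ito palette (the slides do not name a palette), and the series names and values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

months = ["September", "October", "November", "December"]
# Made-up series for illustration
series = {
    "Sales": [4.3, 2.5, 3.5, 4.5],
    "Returns": [2.4, 4.4, 1.8, 2.8],
    "Refunds": [1.0, 1.5, 1.2, 0.8],
}

# (colour, line style, marker): colour is never the only distinguishing cue.
# Colours are taken from the Okabe-Ito palette (an assumption, see above).
styles = [("#0072B2", "-", "o"), ("#D55E00", "--", "s"), ("#009E73", ":", "^")]

fig, ax = plt.subplots()
for (name, values), (colour, linestyle, marker) in zip(series.items(), styles):
    ax.plot(months, values, color=colour, linestyle=linestyle,
            marker=marker, label=name)
ax.legend()
fig.canvas.draw()
```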

https://www.livestories.com/blog/five-ways-to-fail-data-visualization
Some of the basics … Keep Scales consistent

http://callingbullshit.org/tools/tools_misleading_axes.html
Some of the basics … Keep scales consistent
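
In Matplotlib, one simple way to keep scales consistent across side-by-side panels is to share the y-axis and start it at zero. A minimal sketch with made-up values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Made-up values for two panels that readers will compare
groups = ["A", "B"]
panel_one = [48, 52]
panel_two = [20, 80]

# sharey=True forces both panels onto the same vertical scale
fig, (left, right) = plt.subplots(1, 2, sharey=True)
left.bar(groups, panel_one)
right.bar(groups, panel_two)

# Start the bar axis at zero so bar heights stay proportional to the values
left.set_ylim(bottom=0)
fig.canvas.draw()
```

Without `sharey=True`, each panel would pick its own scale and the 48 versus 52 difference could look as dramatic as 20 versus 80.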

Some of the basics … Legend and Sources
It is absolutely necessary to include the sources of your data and correct legends to interpret your visualisation.
Without the legends, your visualisation is basically a “pretty picture” with no meaning

Learning Data Visualization by Bill Shanders, Lynda.com
Preparing the data for visualisation
Data almost never comes in the exact form that you need it in
One of the common parts of the data visualisation tasks is to clean and convert the data
Some of the common data adjustments:
• Calculating indexes and ratios
• Calculating percentile
• Aggregating
• Regrouping
• Converting from Excel/CSV to JSON/XML/SQL
You may need simple tools, SQL scripts against a database, or a programming language for these:
• Excel (available!)
• Tableau (commercial)
• Direct data manipulation with SQL or programming language
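
As an example of the last option, a CSV-to-JSON conversion needs only the Python standard library. A minimal sketch: the CSV content and column names (`product`, `units`) are made up, and it is read from a string here so the example is self-contained; with a real file you would pass an open file object to `csv.DictReader` instead.

```python
import csv
import io
import json

# Made-up CSV content; in practice this would come from a file
csv_text = "product,units\nT-shirts,45\nCD,60\nVinyl,25\n"

# DictReader turns each row into a dict keyed by the header line
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV values are always strings, so convert numeric columns explicitly
for row in rows:
    row["units"] = int(row["units"])

json_text = json.dumps(rows, indent=2)
print(json_text)
```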

Preparing the data: Indexes and Ratios
Are we comparing apples with apples (or are they apples and oranges?)
• Indexes and ratios convert the data into a form that is easy to look at side by side, which is not necessarily easy to do in the original form
Figure: line charts of min_wage, gas, and bread over periods 1 to 34, followed by the derived ratios wages_to_gas (min_wage/gas) and wages_to_bread (min_wage/bread).
Learning Data Visualization by Bill Shanders, Lynda.com
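
The ratios are simple element-wise divisions. A minimal sketch in plain Python; the three periods of prices below are made up for illustration and are not the data behind the slide.

```python
# Made-up prices in dollars for three periods (not the slide's data)
min_wage = [5.0, 6.0, 7.0]
gas = [2.0, 3.0, 3.5]
bread = [1.0, 1.2, 1.4]

# Ratio = how many units of the good one hour of minimum wage buys.
# Rounded to 2 decimals to hide floating-point noise.
wages_to_gas = [round(wage / price, 2) for wage, price in zip(min_wage, gas)]
wages_to_bread = [round(wage / price, 2) for wage, price in zip(min_wage, bread)]

print(wages_to_gas)    # wages rose, but buying power for gas fell
print(wages_to_bread)  # buying power for bread stayed flat
```

The raw series all go up, which hides the story; the ratios put wages and prices on one comparable scale.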

Preparing the data: Calculating Percentile
Calculating percentiles makes it easier to compare numbers to each other as part of a whole: where you stand compared to the rest of the herd (relative standing)
percentile = 1 - (ranking / total number of counties)
e.g., with 191 counties: 1-(1/191), 1-(2/191), 1-(3/191), …
Learning Data Visualization by Bill Shanders, Lynda.com
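
The formula from the slide, 1 - (ranking / total), can be written as a small function. A minimal sketch: the total of 191 comes from the slide, while the function name `percentile` and the choice to print percentages are assumptions.

```python
TOTAL_COUNTIES = 191  # total taken from the slide's example


def percentile(rank, total=TOTAL_COUNTIES):
    """Relative standing for a 1-based rank: 1 - (rank / total)."""
    return 1 - rank / total


# The first three ranks, expressed as percentages with one decimal
top_three = [round(percentile(rank) * 100, 1) for rank in (1, 2, 3)]
print(top_three)

# With this formula the last-ranked county lands exactly on 0
print(percentile(TOTAL_COUNTIES))
```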

Preparing the data: Aggregating, Converting Data
Data aggregation is the process where raw data is gathered and expressed in a summary form for statistical analysis
For example, raw data can be aggregated over a given time period to provide statistics such as average, minimum, maximum, sum, and count.
A tool like Tableau or Excel offers a quick solution … Sometimes manual SQL scripts are necessary
Tableau Online Manual, Aggregating Data
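
The same kind of summary can also be produced in a few lines of code. A minimal sketch in plain Python that aggregates raw readings per month into count, sum, minimum, maximum, and average; the readings are made up, and a library such as pandas would do this more compactly.

```python
from collections import defaultdict
from statistics import mean

# Made-up raw data: (month, value) pairs
readings = [
    ("2021-01", 10), ("2021-01", 14),
    ("2021-02", 7), ("2021-02", 9), ("2021-02", 11),
]

# Step 1: group the raw values by time period
groups = defaultdict(list)
for month, value in readings:
    groups[month].append(value)

# Step 2: express each group in summary form
summary = {
    month: {
        "count": len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "average": mean(values),
    }
    for month, values in groups.items()
}

for month, stats in summary.items():
    print(month, stats)
```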

The right paradigm
One of the most difficult things to do is figuring out which charts/graphs to use in which situation. I’d say you need to have the basic competency here – and then be aware of good alternatives.
The good old BAR graphs:
• Highly effective in terms of ‘parsing the information’
• Experts say the human brain is wired to differentiate these rectangular shapes
• In fact, you should start by asking “why isn’t a bar graph enough here?”
• But when there are more variables, many “grouped” bars don’t look good
Figure: bar graph of Sales for Quarter1 to Quarter4 (scale 0 to 350000).
Learning Data Visualization by Bill Shanders, Lynda.com

The right paradigm
STACKED Bar graphs:
• Compare data within groups
• Whole bar represents the total value of that group, and each segment represents the value within the group
STACKED Percentage Bar graphs:
• Use when you want to compare the relative contribution of each category to the whole …
• Whole bar = 100%
• Showing relative strength of each category within the whole
Learning Data Visualization by Bill Shanders, Lynda.com
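
In Matplotlib, stacking is done with the `bottom` argument of `bar`, and the percentage variant only needs each segment divided by its group total first. A minimal sketch; the two categories ("Online", "In store") and their values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

quarters = ["Quarter1", "Quarter2", "Quarter3", "Quarter4"]
# Made-up values for two categories within each group
online = [30, 45, 20, 60]
in_store = [70, 55, 60, 90]

fig, (ax_abs, ax_pct) = plt.subplots(1, 2, figsize=(10, 4))

# Stacked bars: the second segment starts where the first one ends
ax_abs.bar(quarters, online, label="Online")
ax_abs.bar(quarters, in_store, bottom=online, label="In store")
ax_abs.set_title("Stacked (absolute values)")
ax_abs.legend()

# Stacked percentage bars: rescale each group so the whole bar is 100%
totals = [a + b for a, b in zip(online, in_store)]
online_pct = [100 * a / total for a, total in zip(online, totals)]
in_store_pct = [100 * b / total for b, total in zip(in_store, totals)]
ax_pct.bar(quarters, online_pct, label="Online")
ax_pct.bar(quarters, in_store_pct, bottom=online_pct, label="In store")
ax_pct.set_title("Stacked (percentage of whole)")
ax_pct.set_ylabel("%")

fig.canvas.draw()
```

The left panel answers "which quarter was biggest?", while the right panel answers "how did the mix change?", which is why the two forms are not interchangeable.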

http://extremepresentation.typepad.com/files/choosing-a-good-chart-09.pdf
The right paradigm

Introduction to Data Visualization, lectures by Peter Aldhous
The right paradigm

The right paradigm
Line graphs:
• To show values “over time” or a continuous interval
• Stories over “Timeline”
Learning Data Visualization by Bill Shanders, Lynda.com
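
A line graph over a timeline only reads correctly if the points are in chronological order, so it is worth sorting before plotting. A minimal sketch with made-up monthly observations that arrive out of order:

```python
from datetime import date

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Made-up observations, deliberately out of chronological order
observations = [
    (date(2021, 3, 1), 130),
    (date(2021, 1, 1), 100),
    (date(2021, 2, 1), 120),
    (date(2021, 4, 1), 125),
]

# Tuples sort by their first element, so this orders by date
observations.sort()
dates = [day for day, _ in observations]
values = [value for _, value in observations]

fig, ax = plt.subplots()
ax.plot(dates, values, marker="o")  # Matplotlib understands date objects
ax.set_title("Values over time")
fig.autofmt_xdate()                 # tilt the date labels so they do not overlap
fig.canvas.draw()
```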

The right paradigm
Scatter Plot graphs:
• To show two variables and their correlations (i.e., X axis vs. Y axis)
https://en.wikipedia.org/wiki/Scatter_plot
Learning Data Visualization by Bill Shanders, Lynda.com

The right paradigm
Scatter Plot graphs:
• To show two variables and their correlations (i.e., X axis vs. Y axis)
No correlations, but discernible patterns
Learning Data Visualization by Bill Shanders, Lynda.com
Size of the dots – third variable
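
In Matplotlib, the third variable is passed through the `s` (marker size) argument of `scatter`. A minimal sketch; the three variables and their values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

# Made-up data: two plotted variables plus a third shown as dot size
advertising = [1, 2, 3, 4, 5]          # x axis
revenue = [12, 18, 21, 30, 33]         # y axis
store_size = [40, 90, 60, 200, 120]    # third variable, in points^2

fig, ax = plt.subplots()
points = ax.scatter(advertising, revenue, s=store_size, alpha=0.6)
ax.set_xlabel("Advertising")
ax.set_ylabel("Revenue")
ax.set_title("Dot size encodes a third variable")
fig.canvas.draw()
```

The semi-transparent `alpha` keeps overlapping dots visible, which matters once dot sizes vary.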

Useful Read
• Book: The Functional Art by Alberto Cairo (Chapters 1, 2, and 3)
• http://jonathansoma.com/lede/algorithms-2017/classes/fuzziness-matplotlib/how-pandas-uses-matplotlib-plus-figures-axes-and-subplots/
• https://pythonspot.com/visualize-data-with-pandas/