extra-visualisation
Introduction¶
These exercises are intended to demonstrate some of the visualisation options matplotlib is capable of, and to give you an opportunity to practice visualisations outside of the lab.
In these exercises you will:
learn how to visualize a set of data using a Python library called matplotlib.
explore different forms of visualization, such as bar charts, histograms, scatter plots, and line plots.
customize the visualization output, for example by modifying axis properties or adding labels.
By the end of the worksheet you will be able to transform a set of data into an appropriate visualization form.
Why Visualization?¶
The power of ‘preattentive perception’ is the foundation of visualization. People see some things preattentively, without the need for focused attention. These visual properties can be distinguished in less than 200 milliseconds (eye movements themselves take about 200 ms) [Healey, 2005]. The next example clarifies what ‘preattentive perception’ means.
The following example uses maximum temperature data. The CSV-formatted data contains the average maximum temperature recorded for all major Australian cities during the period March 2007 to February 2008 (obtained from the Australian Government’s Bureau of Meteorology). The data is presented below in two forms: text in a table and a multi-line plot.
city/month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Melbourne 41.2 35.5 37.4 29.3 23.9 16.8 18.2 25.7 22.3 33.5 36.9 41.1
Brisbane 31.3 40.2 37.9 29 30 26.7 26.7 28.8 31.2 34.1 31.1 31.2
Darwin 34 34 33.2 34.5 34.8 33.9 32 34.3 36.1 35.4 37 35.5
Perth 41.9 41.5 42.4 36 26.9 24.5 23.8 24.3 27.6 30.7 39.8 44.2
Adelaide 42.1 38.1 39.7 33.5 26.3 16.5 21.4 30.4 30.2 34.9 37.1 42.2
Canberra 35.8 29.6 35.1 26.5 22.4 15.3 15.7 21.9 22.1 30.8 33.4 35
Hobart 35.5 34.1 30.7 26 20.9 15.1 17.5 21.7 20.9 24.2 30.1 33.4
Sydney 30.6 29 35.1 27.1 28.6 20.7 23.4 27.7 28.6 34.8 26.4 30.2
If you have to find out from the table which Australian city has the highest temperature, you have to look through the data carefully. Your eyes need to scan the table, moving across all the cells and comparing values, before you can finally answer the question.
On the other hand, using the visualization of the same data (see the figure above), you can easily notice that the light blue line reaches the highest temperature of the year. Thus, you can conclude that Perth, the city represented by that line, is the hottest city of the year, without really bothering about the rest of the data. You can also notice almost instantly that Darwin’s temperature is the most stable compared to the other cities. This quick observation is hardly possible by just looking at the raw textual data.
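A plot like the one described can be produced with a few lines of matplotlib. The sketch below is an illustration only, not the code behind the original figure: it retypes the table by hand, draws one line per city, and computes the two observations instead of reading them off the plot. The names `temps`, `hottest` and `most_stable` are invented for this sketch, and the line colours will be matplotlib's defaults rather than those of the original figure.

```python
import calendar
import matplotlib.pyplot as plt

# Values copied from the table above, one list per city (Jan..Dec).
temps = {
    "Melbourne": [41.2, 35.5, 37.4, 29.3, 23.9, 16.8, 18.2, 25.7, 22.3, 33.5, 36.9, 41.1],
    "Brisbane": [31.3, 40.2, 37.9, 29, 30, 26.7, 26.7, 28.8, 31.2, 34.1, 31.1, 31.2],
    "Darwin": [34, 34, 33.2, 34.5, 34.8, 33.9, 32, 34.3, 36.1, 35.4, 37, 35.5],
    "Perth": [41.9, 41.5, 42.4, 36, 26.9, 24.5, 23.8, 24.3, 27.6, 30.7, 39.8, 44.2],
    "Adelaide": [42.1, 38.1, 39.7, 33.5, 26.3, 16.5, 21.4, 30.4, 30.2, 34.9, 37.1, 42.2],
    "Canberra": [35.8, 29.6, 35.1, 26.5, 22.4, 15.3, 15.7, 21.9, 22.1, 30.8, 33.4, 35],
    "Hobart": [35.5, 34.1, 30.7, 26, 20.9, 15.1, 17.5, 21.7, 20.9, 24.2, 30.1, 33.4],
    "Sydney": [30.6, 29, 35.1, 27.1, 28.6, 20.7, 23.4, 27.7, 28.6, 34.8, 26.4, 30.2],
}
months = list(calendar.month_abbr)[1:]  # skip the empty first element

fig, ax = plt.subplots()
for city, values in temps.items():
    ax.plot(range(12), values, label=city)  # one line (and one colour) per city
ax.set_xticks(list(range(12)))
ax.set_xticklabels(months)
ax.set_ylabel("maximum temperature")
ax.legend()

# The same two observations, computed rather than perceived.
hottest = max(temps, key=lambda city: max(temps[city]))
most_stable = min(temps, key=lambda city: max(temps[city]) - min(temps[city]))
print(hottest, most_stable)
```

The computation needs a loop over every value, which is exactly the scanning your eyes have to do with the table; the plot lets you skip it.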
Question 1 (warmup)¶
Find the city with the lowest maximum temperature. First, try to do that with the table. Then, try to do the same using the multi-line plot in the figure above. Next, find the city with the most stable temperature. Do you think visualization is helpful in drawing your conclusion?
Elements of Visualization¶
All forms of visualization are built with some basic visual elements such as:
Location (x,y coordinate on the screen)
Brightness
Color (Hue)
Pattern/Texture
Shape
Line
Text
Visualization, in principle, transforms the numerical and symbolic data into these basic visual elements. In the previous example, the cities are translated into the colors of the lines and the temperature data is used to plot the location of the lines.
From those simple elements, some popular types of visualization can be built such as:
graphs, representation of a set of relationships between various entities, like family trees, network diagrams, grammar trees
maps, representation of a particular space (and its properties), like geographical map, brain activity map
charts, representation of numerical data either from a given set of real-world data or generated by mathematical functions
In this worksheet, you will mostly learn to generate various types of charts: line plot (line chart), bar chart, pie chart and histogram.
Visualization with Python¶
matplotlib is a Python 2D plotting library that enables you to produce figures and charts, either on screen or in an image file. You can use matplotlib’s interactive environment to display figures on your screen if you have Python and matplotlib installed on your own computer.
The following example demonstrates a simple plot of the Fibonacci sequence.
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([1,1,2,3,5,8,13]) # a plot of fibonacci sequence
matplotlib allows you to produce line plots, histograms, bar charts, pie charts, error bar charts, and scatter plots. These types of graphics will become clearer as you go through the examples. matplotlib also provides flexibility in customizing those graphics. It permits you to modify the line styles, font properties, axes properties, and many other properties.
The Structure of matplotlib¶
matplotlib is conceptually divided into three parts. The first part, the matplotlib API, is the library that does the hard work of managing and creating figures, text, lines, plots and so on. You access this library by issuing the following command:
>>> import matplotlib
The device-dependent backend is the second part. It is the drawing engine responsible for rendering the visual representation to a file or a display device. Examples of backends: ‘PS’ creates PostScript files (suitable for hardcopy printout), ‘SVG’ creates Scalable Vector Graphics (SVG) files, and the Tkinter-based ‘TkAgg’ provides an interactive interface to the visualization. For example, one can use the Agg backend to produce a PNG file:
>>> matplotlib.use('Agg')
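As a rough sketch of the backend idea, the snippet below selects Agg and writes a PNG into a temporary directory, with no window involved. The file name `fibonacci.png` and the use of `tempfile` are choices made for this illustration. In a notebook that relies on `%matplotlib inline`, you would probably leave out the `matplotlib.use` line, since it switches away from the inline display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # choose the backend before drawing anything
import matplotlib.pyplot as plt

plt.plot([1, 1, 2, 3, 5, 8, 13])

# Hypothetical output location, used only for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), "fibonacci.png")
plt.savefig(out_path)  # Agg renders the figure straight into the file
plt.close()

backend = matplotlib.get_backend()
print(backend, os.path.getsize(out_path) > 0)
```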
The pyplot interface is the last part of the matplotlib package. The pyplot module provides a set of functions that use the underlying matplotlib library. High-level visualization functions like plot, boxplot, and bar are available through the pyplot interface. To import these functions, issue the following command:
>>> import matplotlib.pyplot as plt
The Powerful Plot¶
plot is probably the most important function in matplotlib. It draws lines and/or markers using the points, or x,y pairs, supplied as arguments. Both x and y are generally lists or arrays of values. For example, the following commands plot a simple quadratic function y = x * x.
>>> x = [1, 2, 3, 4]
>>> y = [1, 4, 9, 16]
>>> plt.plot(x, y)
A single list argument to the plot command, like plot(y), is treated as a list of y-values; matplotlib automatically generates the x-values for you. This is shown in the next example, which plots the monthly averages of maximum temperature in Melbourne (see the table at the beginning of this worksheet).
In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import calendar
# Melbourne maximum temperature, Jan to Dec (recorded Mar 2007 – Feb 2008)
t = [41.2, 35.5, 37.4, 29.3, 23.9, 16.8, 18.2, 25.7, 22.3, 33.5, 36.9, 41.1]
s = pd.Series(t)
# label the ticks 'Jan' – 'Dec';
# calendar.month_abbr has an empty string as its first element, so skip it
plt.xticks(range(12), list(calendar.month_abbr)[1:])
plt.plot(s)
#plt.show()
#plt.clf()
#plt.plot(s*2)
xticks() is used to annotate the ticks on the x-axis. You need to supply xticks() with a list of x-values and a list of text labels that go with those values.
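To see both ideas in isolation, here is a minimal sketch with made-up data: only y-values are passed to plot(), the generated x-values are read back from the returned line, and xticks() then attaches a label to each of them. The labels and the variable names `auto_x` and `labels` are assumptions of this sketch.

```python
import matplotlib.pyplot as plt

plt.figure()
(line,) = plt.plot([10, 20, 30])     # only y-values are given
auto_x = list(line.get_xdata())      # the x-values matplotlib filled in

# One label per x-value, in the same order.
plt.xticks([0, 1, 2], ["low", "mid", "high"])
labels = [tick.get_text() for tick in plt.gca().get_xticklabels()]
print(auto_x, labels)
```

The generated x-values are simply the positions 0, 1, 2, which is why range(12) is the right first argument to xticks() for twelve monthly values.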
Note: In interactive shell mode, every time you issue a plot() command, the output is added to the results of the earlier plot() calls. The clf() function can be called to clear the canvas.
The plot function is also commonly used to plot mathematical and scientific formulae, as shown below.
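The accumulating behaviour can be checked directly. The short sketch below (variable names are illustrative) counts the lines on the current axes after two plot() calls and then counts the axes left after clf().

```python
import matplotlib.pyplot as plt

plt.figure()
plt.plot([1, 2, 3])
plt.plot([3, 2, 1])                    # drawn on the same axes as the first line
lines_before = len(plt.gca().lines)    # both lines are still there

plt.clf()                              # wipe the current figure
axes_after = len(plt.gcf().axes)       # the axes, and the lines with them, are gone
print(lines_before, axes_after)
```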
In [7]:
import numpy as np
np.arange(0.0, 5.0, 0.1)
Out[7]:
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2,
1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9])
In [8]:
%matplotlib inline
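from pylab import *  # brings cos, pi, log, arange and plot into the namespace
def f(t):
    return cos(2*pi*t) * log(1+t)
precision = 0.1  # step between consecutive values of t
t = arange(0.0, 5.0, precision)
plot(t, f(t), 'm')  # 'm' is the colour magenta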
Question 2¶
Modify the example on Melbourne’s maximum temperature to display Sydney’s maximum temperature from April 2007 to November 2007. Have your code load the temperature data from this file.
In [9]:
####### `Question 2 answer:`
Question 3¶
In the mathematical formula example, change the definition of f(t) to sin(2*pi*t)*exp(-t), and play around with the variable precision, too. Observe the impact of the changes on the result.
In [6]:
###Answer to question 3
Plot line properties¶
You can supply an optional argument to customize the color and the linestyle of the plot output. For example, to plot with red circles, you would issue plot([1,2,3,4],'ro'), where 'r' represents the color red and 'o' refers to the circle-shaped marker. plot([1,2,3,4],'bs:') draws a blue dotted line with square markers; the line color, the marker type, and the linestyle are given by 'b', 's', and ':' respectively. Select your preferred color from the set of matplotlib colors. The choices for linestyle and marker can be seen in the table below.
Line properties
Property Values
alpha The alpha transparency on 0-1 scale
antialiased True or False – use antialiased rendering
color a matplotlib color arg
label a string optionally used for legend
linestyle One of -- : -. -
linewidth a float, the line width in points
marker One of + , o . s v x > < ^
markeredgewidth The line width around the marker symbol
markeredgecolor The edge color if a marker is used
markerfacecolor The face color if a marker is used
markersize The size of the marker in points
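The format strings 'ro' and 'bs:' can be checked against the properties in the table by asking the returned line objects what they ended up with. This is a small sketch with arbitrary data; the variable names are made up for the illustration.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(red_dots,) = ax.plot([1, 2, 3, 4], "ro")        # red circles, no connecting line
(blue_squares,) = ax.plot([4, 3, 2, 1], "bs:")   # blue, square markers, dotted line

# Read the properties back from the second line.
style = (
    blue_squares.get_color(),
    blue_squares.get_marker(),
    blue_squares.get_linestyle(),
)
print(style, red_dots.get_marker())
```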
There are other methods to set plot line properties. First, you can use the keyword arguments listed in the table above. For example:
>>> plt.plot(x, y, linewidth=3.0)
The second method uses the setp() command. As shown below, setp() allows you to modify multiple properties of a collection of lines.
>>> lines = plt.plot(x1, y1, x2, y2)
>>> plt.setp(lines, color='b', linewidth=4.0)
plot() returns a list of lines, e.g. line1, line2 = plt.plot(x1, y1, x2, y2). The third method uses the various set functions on the lines returned by a plot command. The list of the set functions is available here.
>>> line1, line2 = plt.plot(x1, y1, x2, y2)
>>> line1.set_antialiased(False) # turn off antialiasing on the first line
The use of those three methods is demonstrated in the following example:
In [10]:
%matplotlib inline
import matplotlib.pyplot as plt
from numpy import arange, sin, exp
t = arange(0.0, 2.05, 0.05)
# method 1: keyword arguments
plt.plot(t, sin(t*10), 'k-', linewidth=3.0)
# plot a solid black line with thickness=3
# method 2: setp command
lines = plt.plot(t, [4 for i in t], t, 4*t)
plt.setp(lines, 'color', 'r')
plt.setp(lines, 'linestyle', ':')
# plot two red dotted lines
# method 3: set functions on the returned lines
line1, line2 = plt.plot(t, t**2, t, exp(t))
line1.set_marker('s')
line1.set_color('g')
line2.set_marker('^')
line2.set_color('b')
# plot two lines
# first line is drawn with green square marker
# second line is drawn with blue triangle marker
Note: As you can see in the result of the example above, the output of multiple plot commands accumulates. You can use clf() if you would like to start with a blank slate.
Question 4¶
Modify the example on Melbourne maximum temperature in the previous section to produce a plot with magenta colored triangle marker. Increase the thickness of the plot line, too.
In [8]:
###Question 4 answer
Adding Text to the Charts¶
You can add labels to the x and y axes of the plot using xlabel() and ylabel(). As you have seen and used earlier, xticks() is used to put text on the x-axis ticks. xticks() needs two arguments: a list of x-values and a list of text labels that go with those values. The same rules apply to yticks().
Question 5¶
Add the following lines of code to the example on Melbourne maximum temperature, replacing the existing `xticks()` command with the one supplied. Run the code to see the effect.
>>> plt.xticks(range(12), list(calendar.month_abbr)[1:], rotation=40)
>>> plt.ylabel("temperature in Celsius", fontsize=14)
>>> plt.xlabel("months", fontsize=14)
>>> plt.title("Melbourne maximum temperature (Mar 07 – Feb 08)", fontsize=18)
In [9]:
###Question 5 answer
As is apparent in the code above, you can supply additional arguments to change the properties of the text. The options for these properties are listed in the table below:
Property Values
alpha The alpha transparency on 0-1 scale
color a matplotlib color argument
fontstyle italic | normal | oblique
fontname Sans | Helvetica | Courier | Times | Others
fontsize a scalar, e.g. 10
fontweight normal | bold | light | 4
horizontalalignment left | center | right
rotation horizontal | vertical | an angle in degrees
verticalalignment bottom | center | top
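The text properties in the table are passed as keyword arguments to the text functions. The sketch below applies a few of them to a plot of the Fibonacci sequence and reads them back from the returned text objects; the wording of the labels and the chosen values are arbitrary.

```python
import matplotlib.pyplot as plt

plt.figure()
plt.plot([1, 1, 2, 3, 5, 8, 13])

# Each call returns the text object it created, so the properties can be inspected.
title = plt.title("Fibonacci sequence", fontsize=18, fontweight="bold")
xlabel = plt.xlabel("index", fontsize=14, color="g")
ylabel = plt.ylabel("value", fontsize=14, rotation="horizontal",
                    horizontalalignment="right")
print(title.get_fontsize(), xlabel.get_color(), ylabel.get_rotation())
```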
You can also attach a legend to the output using the legend() command to describe each line produced by plot(). You need to supply a list of text labels describing the respective lines. See the example in the scatter plot section below.
>>> plt.legend(all_species, loc='lower right')
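Since `all_species` belongs to a different example, here is a self-contained sketch of the same call using the first four monthly values of two cities from the temperature table. The list of labels is matched to the lines in the order they were plotted.

```python
import matplotlib.pyplot as plt

melbourne = [41.2, 35.5, 37.4, 29.3]   # Jan to Apr, from the temperature table
darwin = [34, 34, 33.2, 34.5]

plt.figure()
plt.plot(melbourne, "b-")
plt.plot(darwin, "r--")

# One description per line, in plotting order.
legend = plt.legend(["Melbourne", "Darwin"], loc="lower right")
entries = [text.get_text() for text in legend.get_texts()]
print(entries)
```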
Customizing your graphic¶
You can customize your plot result further by using the following commands:
xlim() to set the range of the x axis, for example xlim(0,10)
ylim() to set the range of the y axis, for example ylim(-1,1)
axis() to do both at the same time; the two commands above are equivalent to axis([0,10,-1,1])
grid(True) to turn on the grid, or grid(False) to do otherwise
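The commands in this list can be tried on any plot. The following sketch uses a sine curve as stand-in data, sets the ranges once with xlim() and ylim() and once with axis(), and compares the results.

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 10.0, 0.1)

plt.figure()
plt.plot(t, np.sin(t))
plt.xlim(0, 10)
plt.ylim(-1, 1)
plt.grid(True)
separate = (plt.xlim(), plt.ylim())    # called without arguments, they return the limits

plt.figure()
plt.plot(t, np.sin(t))
plt.axis([0, 10, -1, 1])               # the same ranges in a single call
combined = (plt.xlim(), plt.ylim())

print(separate == combined)
```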
Other Charts¶
You’ve seen a number of chart types in this week’s lab; however, matplotlib allows you to create many others. Histograms and pie charts can be particularly useful.
Histogram¶
A histogram displays the distribution of a set of samples (typically a large set of data, like the pixel values of a digital image or the ages of a population). The following example creates a histogram of ages from a small number of samples (assume these are the ages of your classmates).
In [15]:
%matplotlib inline
import matplotlib.pyplot as plt
ages = [17,18,18,19,21,19,19,21,20,23,19,22,20,21,19,19,14,23,16,17]
plt.hist(ages, bins=10)
plt.gca().yaxis.grid(True, which='major')
plt.show()
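hist() also returns the numbers behind the picture: the count in each bin, the bin edges, and the drawn bars. The sketch below reuses the ages from the example to show how the samples are divided up; the variable names are chosen for this illustration.

```python
import matplotlib.pyplot as plt

ages = [17, 18, 18, 19, 21, 19, 19, 21, 20, 23, 19, 22, 20, 21, 19, 19, 14, 23, 16, 17]

plt.figure()
counts, edges, patches = plt.hist(ages, bins=10)

n_edges = len(edges)            # 10 bins are delimited by 11 edges
n_counted = int(counts.sum())   # every sample lands in exactly one bin
print(n_edges, n_counted, edges[0], edges[-1])
```

The first and last edges are the smallest and largest sample, so changing bins changes only how that range is sliced.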
Pie chart¶
A pie chart can be used to display proportions of a certain measurement. It is useful for comparing the size of a particular slice with the whole pie. The following pie chart illustrates the top eight carbon dioxide emitters. The carbon dioxide emissions data is taken from the Human Development Report 2007/2008.
In [16]:
%matplotlib inline
import matplotlib.pyplot as plt
world = ["USA", "China", "Russia", "India", "Japan", "Germany", "Canada", "UK", "Others"]
co2 = [20.90, 17.30, 5.30, 4.60, 4.30, 2.80, 2.20, 2.00, 40.60]
colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k', 'w', '#cccccc']
plt.pie(co2, explode=None, labels=world, colors=colors)
plt.axis('equal')
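As a possible variation on the pie chart, the sketch below uses two further arguments of pie(): autopct, which prints each slice's share on the wedge, and explode, which pulls selected wedges away from the centre. Exploding the USA slice is an arbitrary choice for the illustration, and the default colours are used instead of the list above.

```python
import matplotlib.pyplot as plt

world = ["USA", "China", "Russia", "India", "Japan", "Germany", "Canada", "UK", "Others"]
co2 = [20.90, 17.30, 5.30, 4.60, 4.30, 2.80, 2.20, 2.00, 40.60]

# Offset only the USA wedge; all other wedges stay in place.
explode = [0.1 if name == "USA" else 0 for name in world]

plt.figure()
wedges, labels, percents = plt.pie(co2, explode=explode, labels=world,
                                   autopct="%1.1f%%")
plt.axis("equal")   # keep the pie circular

total = sum(co2)    # the values are shares of the whole
print(round(total, 1), percents[0].get_text())
```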