ETX2250/ETF5922: Data Visualization and Analytics
Advanced visualization
Lecturer:
Department of Econometrics and Business Statistics
Week 5
Visualising many variables
We can visualise more than the variables mapped to spatial position, using colour, size, text labels and facets.
An example
Mpg data
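The variable cty measures the fuel efficiency (miles per gallon) of different cars in city driving, while displ measures engine displacement (the size of the engine, in litres).
These are negatively correlated: cars with bigger engines tend to be less fuel efficient.
We can also see how the categorical (non-metric) variable drv, the type of drive train, interacts with these variables using the col (colour) aesthetic.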
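The negative relationship can also be checked numerically. A minimal sketch using base R's cor(), assuming only that ggplot2 is installed (it ships the mpg data); the name r_displ_cty is just for illustration:

```r
library(ggplot2)  # provides the mpg dataset

# Pearson correlation between engine displacement and city fuel efficiency
r_displ_cty <- cor(mpg$displ, mpg$cty)
print(r_displ_cty)  # a negative number: bigger engines, fewer miles per gallon
```

A single correlation summarises the whole dataset; the coloured plot that follows shows whether the relationship looks different for each drive type.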
Using color
library(ggplot2)
ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = drv)) +
  geom_point()
Aes v geom
Note that, unlike in previous lectures, colour is being used here to display information about a variable in the dataset.
Therefore, instead of being specified in the geom, colour has to be specified inside the aes function. Remember that aes maps data to something we can perceive.
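To make the distinction concrete, here is a small sketch contrasting the two: mapping colour to a variable inside aes(), versus setting one fixed colour in the geom. The object names p_mapped and p_set are illustrative only.

```r
library(ggplot2)

# Mapping: colour depends on the variable drv, so it goes inside aes()
p_mapped <- ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = drv)) +
  geom_point()

# Setting: every point gets the same fixed colour, so it goes in the geom
p_set <- ggplot(data = mpg, mapping = aes(x = displ, y = cty)) +
  geom_point(colour = "blue")
```

Printing p_mapped gives one colour per drive type with a legend; printing p_set draws every point in blue with no legend, because no variable is mapped to colour.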
Text labels
Another option is to plot text rather than points.
This is in fact a different geom, called geom_text.
A variable can be mapped to the text that appears; the aesthetic for this is label.
With text
ggplot(data = mpg, mapping = aes(x = displ, y = cty, label = drv)) +
  geom_text()
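With many cars sharing the same displ and cty values, text labels pile on top of each other. One possible remedy, sketched here, is geom_text's check_overlap argument, which skips any label that would overlap text already drawn in the same layer (so some observations are no longer labelled).

```r
library(ggplot2)

# Same text plot, but labels that would overlap earlier labels are not drawn
p_text <- ggplot(data = mpg, mapping = aes(x = displ, y = cty, label = drv)) +
  geom_text(check_overlap = TRUE)
```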
The bubble chart
To add a fourth variable we can manipulate the size of the points. This is known as a bubble chart.
The aesthetic in question is size
The following plot maps the number of cylinders to the size of points.
Bubble plot
ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = drv, size = cyl)) +
  geom_point()
All about colourmaps
Color scales
Suppose we are mapping metric or ordinal data to a colormap. The colormap should be sequential, perceptually uniform, readable when printed in black and white, accessible to colourblind people, and colourful and pretty.
The viridis colormap was developed with these criteria in mind.
Jet v Viridis
A popular palette is jet.
A better palette (by the above criteria) is viridis
Problems with jet
Values close to one another should be shown in similar colours.
On jet, however, the colour changes dramatically over some small ranges of values.
Also, colourblind people (about 8% of men) can have difficulty with the red colours in jet. For more on this, see this talk by the creators of viridis.
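One way to see the "sequential" and "works in black and white" criteria numerically is to convert each palette colour to an approximate grey level. The sketch below uses the standard Rec. 601 luma weights as a rough stand-in for perceived lightness (an assumption; perceptual colour spaces are more accurate), the jet-like palette from the example on R's ?colorRampPalette help page, and viridisLite, which should already be installed as a dependency of ggplot2.

```r
library(viridisLite)  # viridis(); installed alongside ggplot2

# A jet-like palette, as in the example on the ?colorRampPalette help page
jet_colors <- colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan",
                                 "#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000"))

# Approximate grey level (0 to 255) of each colour using Rec. 601 luma weights
luma <- function(cols) {
  rgb <- col2rgb(cols)
  0.299 * rgb["red", ] + 0.587 * rgb["green", ] + 0.114 * rgb["blue", ]
}

jet_luma     <- luma(jet_colors(9))
viridis_luma <- luma(viridis(9))

# Does the grey level increase steadily from one end of the palette to the other?
all(diff(viridis_luma) > 0)  # TRUE: viridis gets lighter throughout
all(diff(jet_luma) > 0)      # FALSE: jet gets lighter, then darker again
```

Because jet's grey level rises and then falls, two very different values (for example, the lowest and the highest) can print as similar shades of grey, whereas viridis preserves the ordering.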
Jet Colormap
knitr::include_graphics('images/lecture-05/mona-lisa-rainbow.png', dpi = 100)
Viridis colormap
knitr::include_graphics('images/lecture-05/mona-lisa-gradient.png', dpi = 100)
In ggplot2
Ordered factors, such as cut in the diamonds data, now use viridis by default.
ggplot(diamonds, aes(y = price, x = carat, col = cut)) +
  geom_point(size = 0.2)
Continuous color
ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = hwy)) +
  geom_point()
Continuous color
To use viridis for a continuous variable, simply add scale_color_viridis_c(). Scale is another element of the grammar of graphics.
ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = hwy)) +
  geom_point() +
  scale_color_viridis_c()
Viridis
Variations on Viridis
ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = hwy)) +
  geom_point() +
  scale_color_viridis_c(option = 'C')  # option 'C' selects the "plasma" variant
Caution
There are some situations where viridis may not be ideal: nominal variables and divergent scales.
Divergent scales can be used when there is a natural midpoint for the data (usually zero).
For example, when plotting budget or trade balances using colour, red can be used to show a deficit and blue to show a surplus.
Divergent Scale
ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = hwy)) +
  geom_point() +
  scale_color_distiller(type = 'div')
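The example above applies a diverging palette to hwy, which has no natural midpoint. A sketch closer to the budget example follows, assuming a hypothetical derived variable hwy_diff (highway mpg minus its sample mean) so that zero is a meaningful centre; scale_color_gradient2() lets us pin the middle colour to that value.

```r
library(ggplot2)

# Hypothetical derived variable: highway mpg relative to the sample mean,
# so zero is a natural midpoint (negative = below average, positive = above)
mpg_centred <- mpg
mpg_centred$hwy_diff <- mpg_centred$hwy - mean(mpg_centred$hwy)

p_div <- ggplot(data = mpg_centred,
                mapping = aes(x = displ, y = cty, col = hwy_diff)) +
  geom_point() +
  # red below the midpoint, white at it, blue above it
  scale_color_gradient2(low = "red", mid = "white", high = "blue",
                        midpoint = 0)
```

Points near the average are drawn close to white, which can look faint against the default grey panel; a light grey mid colour is a reasonable alternative.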
Sometimes we cannot display everything on a single plot.
In this case, facetting can be used to construct multiple plots.
For the next example we look at the Palmer Penguins dataset. You can use the tidytuesdayR package to load this dataset:
library(tidytuesdayR)
penguins <- tt_load(2020, week = 31)$penguins
##
## Downloading file 1 of 2: `penguins.csv`
## Downloading file 2 of 2: `penguins_raw.csv`
Code for facetting
ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_wrap(~species)
Note the tilde (~) in ~species: it creates a one-sided formula that names the variable to facet by.
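The formula is one of two ways of naming the facetting variable; ggplot2 also accepts vars(). A short sketch on the mpg data (used here so it runs without downloading the penguins data):

```r
library(ggplot2)

# The one-sided formula ~drv names the variable to facet by...
p_formula <- ggplot(data = mpg, mapping = aes(x = displ, y = cty)) +
  geom_point() +
  facet_wrap(~drv)

# ...and vars(drv) is an equivalent way of naming it
p_vars <- ggplot(data = mpg, mapping = aes(x = displ, y = cty)) +
  geom_point() +
  facet_wrap(vars(drv))
```

Both produce one panel per drive type; vars() can be handier when writing functions that build plots.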
Palmer Penguins
Scales
Sometimes, if the ranges of the y variable are very different across facets, we can't see the variation within the facets whose values span only a small range. The scales argument of the facet_wrap function allows each plot to have its own scale.
Use this with caution! This is NOT appropriate in this case. Can you see why?
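Before freeing the scales, it can help to compare the range of the y variable in each panel. A minimal sketch on the mpg data (the object names are illustrative); the scales argument accepts "fixed" (the default), "free_x", "free_y" or "free":

```r
library(ggplot2)

# Range of city mpg within each drive type, i.e. within each facet
cty_ranges <- tapply(mpg$cty, mpg$drv, range)
print(cty_ranges)

# Each panel gets its own y axis; comparing heights across panels is now harder
p_free <- ggplot(data = mpg, mapping = aes(x = displ, y = cty)) +
  geom_point() +
  facet_wrap(~drv, scales = "free_y")
```

If the ranges overlap substantially, a shared (fixed) scale usually makes comparisons between panels easier.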
Free scales
ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_wrap(~species, scales = 'free_y')
Palmer Penguins
Change number of columns
The number of rows or columns can be changed with the nrow or ncol arguments.
ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_wrap(~species, scales = 'free_y', ncol = 1)
Changing number of columns
Facet grid
We can also facet so that the rows correspond to one categorical variable and the columns to another.
ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_grid(island ~ species)
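The two-sided formula reads "rows ~ columns". The same layout can be written with named rows and cols arguments, which some find easier to read. A sketch on the mpg data, equivalent to facet_grid(drv ~ cyl):

```r
library(ggplot2)

# Rows correspond to drive type, columns to number of cylinders
p_grid <- ggplot(data = mpg, mapping = aes(x = displ, y = cty)) +
  geom_point() +
  facet_grid(rows = vars(drv), cols = vars(cyl))
```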
Facet grid
Your Turn
Plot a scatterplot with:
Bill length on the x axis
Bill depth on the y axis
Facets by year on the rows
Facets by island in the columns
Colour by species
Solution
ggplot(data = penguins,
       mapping = aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point() +
  facet_grid(year ~ island)
Solution
Higher Dimensions
Pairs plot
A pairs plot gives an array of plots:
On the diagonal there are kernel density plots (numeric variables) or bar plots (categorical variables).
Below the diagonal are scatterplots or facetted histograms.
Above the diagonal are correlations or boxplots.
This can be implemented using the ggpairs function in the GGally package.
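With many variables the full matrix quickly becomes crowded. A sketch, assuming GGally is installed, that restricts ggpairs to a few columns of mpg and colours every panel by a categorical variable (p_pairs is an illustrative name):

```r
library(ggplot2)  # provides the mpg dataset and aes()
library(GGally)

# Only four columns, with every panel coloured by drive type
p_pairs <- ggpairs(as.data.frame(mpg),
                   columns = c("displ", "cty", "hwy", "drv"),
                   mapping = aes(colour = drv))
```

The same idea applies to the penguins data, for example selecting the four body measurements and colouring by species.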
Palmer data
library(GGally)
ggpairs(penguins)
Correlation plot
ggcorr(penguins)
Parallel Coordinates
A parallel coordinates plot gives each variable its own vertical axis: the variables are placed side by side along the x axis, and their values are plotted along the y axis.
Values corresponding to the same observation are joined up by lines. These plots can often look messy, but sometimes provide insight.
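Some of the mess can be reduced by choosing the axes, colouring the lines by group and making them semi-transparent. A sketch on the mpg data, assuming GGally is installed; scale = "std" (the default) standardises each variable by subtracting its mean and dividing by its standard deviation so the axes are comparable:

```r
library(ggplot2)  # provides the mpg dataset
library(GGally)

# One vertical axis per chosen variable; each car is one line across them
p_par <- ggparcoord(as.data.frame(mpg),
                    columns = c("displ", "cty", "hwy"),
                    groupColumn = "drv",   # colour lines by drive type
                    scale = "std",         # standardise each variable
                    alphaLines = 0.3)      # semi-transparent lines
```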
Parallel Coordinates
ggparcoord(penguins)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.