ETX2250/ETF5922: Data Visualization and Analytics
Advanced visualization
Lecturer:
Department of Econometrics and Business Statistics 
 Week 5

Visualising many variables
We can do more than visualise variables spatially, using:
- Colour
- Size
- Label
- Facets

Mpg data
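The variable cty measures the fuel efficiency of different cars in city driving (in miles per gallon), while displ measures the size of the engine (displacement, in litres).
These are negatively correlated: cars with bigger engines tend to travel fewer miles per gallon.
We can also see how the non-metric variable drv (the type of drive train) interacts with these variables using the col (colour) aesthetic.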
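The negative correlation can be checked numerically. This is a minimal sketch that assumes only that ggplot2 is installed, since the mpg data ships with it; the object name r is our own.

```r
library(ggplot2)  # provides the mpg data set

# Pearson correlation between engine size (litres) and city fuel efficiency (mpg)
r <- cor(mpg$displ, mpg$cty)
r

# A negative value means bigger engines tend to go fewer miles per gallon
r < 0
```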

Using color
library(ggplot2)
ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = drv)) +
  geom_point()

Aes v geom
Note that, unlike in previous lectures, colour is used here to display information about a variable in the dataset.
Therefore, instead of specifying colour in the geom, it has to be specified inside the aes function. Remember that the aes function maps data to something we can perceive.
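To make the distinction concrete, the sketch below builds three versions of the same scatterplot and inspects the colours ggplot2 computes for each with ggplot_build(). The object names (p_fixed, p_mapped, p_mistake) are illustrative choices, not part of the lecture code.

```r
library(ggplot2)

# Fixed colour: an argument of the geom, so every point gets the same colour
p_fixed <- ggplot(data = mpg, mapping = aes(x = displ, y = cty)) +
  geom_point(colour = "blue")

# Mapped colour: inside aes(), so the colour depends on the value of drv
p_mapped <- ggplot(data = mpg, mapping = aes(x = displ, y = cty, colour = drv)) +
  geom_point()

# Common mistake: a constant inside aes() is treated as a one-level variable
p_mistake <- ggplot(data = mpg, mapping = aes(x = displ, y = cty, colour = "blue")) +
  geom_point()

# Colours ggplot2 actually assigned in each plot
unique(ggplot_build(p_fixed)$data[[1]]$colour)    # just "blue"
unique(ggplot_build(p_mapped)$data[[1]]$colour)   # one colour per level of drv (3)
unique(ggplot_build(p_mistake)$data[[1]]$colour)  # the default colour for one level, not blue
```

In the third plot the points are not blue: "blue" is treated as data, so the points get the default colour for a single category and a legend appears with one entry labelled blue.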

Text labels
Another option is to plot text rather than points.
This is in fact a different geom, called geom_text.
A variable can be mapped to the actual text that appears; the aesthetic is label.
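One practical issue with text is overplotting, since labels take up more room than points. A minimal sketch, assuming the same mpg data: geom_text() has a check_overlap argument that skips labels overlapping ones already drawn. That filtering happens when the plot is drawn, so the built layer data still contains every row.

```r
library(ggplot2)

# Label each car with its drive train; check_overlap = TRUE drops
# labels that would overlap earlier ones when the plot is drawn
p_text <- ggplot(data = mpg, mapping = aes(x = displ, y = cty, label = drv)) +
  geom_text(check_overlap = TRUE)

# The built layer data carries one label per row of mpg
built <- ggplot_build(p_text)$data[[1]]
table(built$label)
```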

With text
ggplot(data = mpg, mapping = aes(x = displ, y = cty, label = drv)) +
  geom_text()

The bubble chart
To add a fourth variable, we can manipulate the size of the points. This is known as a bubble chart.
The aesthetic in question is size.
The following plot maps the number of cylinders to the size of the points.
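ggplot2's default size scale maps values to point area but gives the smallest value a non-zero size, so bubble areas are not strictly proportional to the data. A sketch of one alternative, assuming we want area proportional to cyl, uses scale_size_area(); max_size = 6 and the transparency (alpha = 0.6, so overlapping bubbles stay visible) are illustrative choices.

```r
library(ggplot2)

# Bubble chart where point area is proportional to cyl:
# scale_size_area() maps 0 to zero area and the largest value to max_size
p_bubble <- ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = drv, size = cyl)) +
  geom_point(alpha = 0.6) +
  scale_size_area(max_size = 6)

# Distinct point sizes computed for the four cylinder counts (4, 5, 6, 8)
built <- ggplot_build(p_bubble)$data[[1]]
sort(unique(built$size))
```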

Bubble plot
ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = drv, size = cyl)) +
  geom_point()

All about colourmaps

Color scales
Suppose we are mapping metric or ordinal data to a colormap. The colormap should be:
- sequential
- perceptually uniform
- legible when printed in black and white
- accessible to colorblind people
- colorful and pretty.
The viridis colormap was developed with these criteria in mind.
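The "sequential" and "black and white" criteria can be checked by computing the lightness (CIE L*) of each colour, which is roughly the grey level it prints as. This sketch uses only base R and two stated approximations: jet is rebuilt from its usual nine anchor colours with colorRampPalette(), and viridis is represented by base R's HCL approximation hcl.colors(n, "viridis") rather than the exact colormap. The helper lightness() is our own.

```r
# Lightness (CIE L*) of each colour: roughly the grey level it prints as
lightness <- function(cols) {
  rgb01 <- t(grDevices::col2rgb(cols)) / 255   # one row per colour, values in [0, 1]
  lab <- grDevices::convertColor(rgb01, from = "sRGB", to = "Lab")
  lab[, 1]                                      # first column is L*
}

n <- 9

# Approximation of jet from its usual anchor colours (an assumption)
jet <- grDevices::colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan", "#7FFF7F",
                                     "yellow", "#FF7F00", "red", "#7F0000"))(n)

# Base R's HCL approximation of viridis (R >= 3.6.0), not the exact colormap
vir <- grDevices::hcl.colors(n, palette = "viridis")

round(lightness(jet))
round(lightness(vir))

# A sequential colormap should never get darker again as values increase
!is.unsorted(lightness(vir))
!is.unsorted(lightness(jet))
```

For viridis the lightness rises steadily from dark to light, while for jet it climbs to a peak around cyan and yellow and then falls again, so very different values can print as the same shade of grey.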

Jet v Viridis
A popular palette is jet.
A better palette (by the above criteria) is viridis.

Problems with jet
Colors that are close to one another on the scale should look similar.
On jet, in some parts the color changes dramatically over a small range of values.
Also, colorblind people (about 8% of men) can have difficulty with the red colors in jet. For more on this, see this talk by the creators of viridis.

Jet Colormap
knitr::include_graphics('images/lecture-05/mona-lisa-rainbow.png', dpi = 100)

Viridis colormap
knitr::include_graphics('images/lecture-05/mona-lisa-gradient.png', dpi = 100)

In ggplot2
Ordered factors now use viridis by default.
ggplot(diamonds, aes(y = price, x = carat, col = cut)) +
  geom_point(size = 0.2)
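To check that cut is an ordered factor and that ggplot2 really does pick viridis for it, this sketch compares the colours in the built plot with the first five viridis colours from scales::viridis_pal(), the palette function behind ggplot2's discrete viridis scale. It assumes ggplot2 and its dependency scales are installed.

```r
library(ggplot2)

# cut is stored as an ordered factor, so ggplot2 uses its ordinal colour scale
is.ordered(diamonds$cut)

p_cut <- ggplot(diamonds, aes(y = price, x = carat, col = cut)) +
  geom_point(size = 0.2)

# Colours actually assigned to the five levels of cut
used <- unique(ggplot_build(p_cut)$data[[1]]$colour)

# Same set as the first five colours of the default viridis palette?
setequal(used, scales::viridis_pal()(5))
```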

Continuous color
ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = hwy)) +
  geom_point()

Continuous color
To use viridis for a continuous variable, simply add scale_color_viridis_c(). Scale is another element of the grammar of graphics.
ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = hwy)) +
  geom_point() +
  scale_color_viridis_c()

Variations on Viridis
ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = hwy)) +
  geom_point() +
  scale_color_viridis_c(option = "C")
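The option argument selects a member of the viridis family. A sketch listing five colours from each, assuming the scales package (installed alongside ggplot2); the letter-to-name mapping in the comment follows the viridisLite documentation.

```r
# Letters accepted by option and the colour map each one selects
viridis_options <- c(A = "magma", B = "inferno", C = "plasma",
                     D = "viridis", E = "cividis")

# Five colours from each map, using the palette function from scales
palettes <- lapply(names(viridis_options),
                   function(opt) scales::viridis_pal(option = opt)(5))
names(palettes) <- viridis_options
palettes
```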

Caution
There are some situations where viridis may not be ideal:
- nominal variables, since a sequential colormap implies an ordering that the categories do not have
- divergent scales.
Divergent scales can be used when there is a natural middle point for the data (usually zero).
For example, when plotting budget or trade balances using color, red can be used to show deficit and blue can be used to show surplus.
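Divergent scales work best when the data have a genuine middle point. hwy does not, so this sketch centres it on its mean to create one; the name hwy_dev, the mean-centring and the grey midpoint colour are our own illustrative choices. scale_colour_gradient2() is a ggplot2 diverging scale that passes through mid at midpoint.

```r
library(ggplot2)

# Centre hwy on its mean so that zero is a meaningful middle point:
# negative = below-average highway efficiency, positive = above average
mpg_dev <- transform(mpg, hwy_dev = hwy - mean(hwy))

# Red below the midpoint, blue above, light grey at the midpoint itself
p_div <- ggplot(data = mpg_dev, mapping = aes(x = displ, y = cty, col = hwy_dev)) +
  geom_point() +
  scale_colour_gradient2(low = "red", mid = "grey85", high = "blue", midpoint = 0)

p_div  # draw the plot
```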

Divergent Scale
ggplot(data = mpg, mapping = aes(x = displ, y = cty, col = hwy)) +
  geom_point() +
  scale_color_distiller(type = "div")

Sometimes we cannot display everything on a single plot.
In this case, facetting can be used to construct multiple plots.
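Here is a sketch of both facetting functions on the mpg data used earlier, assuming only ggplot2. It counts the panels ggplot2 creates: facet_wrap() makes one per level of drv, while facet_grid() makes every combination of drv (rows) and cyl (columns), including empty ones.

```r
library(ggplot2)

# facet_wrap(): one panel per level of a single variable
p_wrap <- ggplot(data = mpg, mapping = aes(x = displ, y = cty)) +
  geom_point() +
  facet_wrap(~drv)

# facet_grid(): rows from one variable, columns from another
p_grid <- ggplot(data = mpg, mapping = aes(x = displ, y = cty)) +
  geom_point() +
  facet_grid(drv ~ cyl)

# Number of panels in each layout
nrow(ggplot_build(p_wrap)$layout$layout)  # 3 drive types
nrow(ggplot_build(p_grid)$layout$layout)  # 3 drive types x 4 cylinder counts = 12
```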
For the next example we look at the Palmer Penguins dataset. You can use the tidytuesdayR package to load this dataset:
library(tidytuesdayR)
penguins <- tt_load(2020, week = 31)$penguins

Code for facetting
ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_wrap(~species)
Note the tilde (~) in ~species.

Scales
Sometimes, if the scales on the y axis are very different, we can't see differences between the facets. The scales option in the facet_wrap function allows each plot to have its own scale. Use this with caution! This is NOT appropriate in this case. Can you see why?

Free scales
ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_wrap(~species, scales = 'free_y')

Change number of columns
The number of rows or columns can be changed with the nrow or ncol arguments.
ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_wrap(~species, scales = 'free_y', ncol = 1)

Facet grid
We can also facet so that the rows correspond to one categorical variable and the columns to another.
ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  facet_grid(island ~ species)

Your Turn
Plot a scatterplot with:
- bill length on the x axis
- bill depth on the y axis
- facets by year in the rows
- facets by island in the columns
- colour by species.

Solution
ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point() +
  facet_grid(year ~ island)

Higher Dimensions

Pairs plot
A pairs plot gives an array of plots:
- on the diagonal there are kernel densities or barplots
- below the diagonal there are scatterplots or facetted histograms
- above the diagonal there are correlations or boxplots.
This can be implemented using the ggpairs function in the GGally package.
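A sketch of a more readable pairs plot, assuming GGally is installed and reusing the mpg data to avoid a download: choosing a few numeric columns with the columns argument and colouring by a categorical variable through mapping keeps the matrix small. The column choice is illustrative.

```r
library(ggplot2)  # for the mpg data and aes()
library(GGally)   # for ggpairs()

# Three numeric columns, coloured by drive train: densities on the diagonal,
# scatterplots below it and correlations above it (ggpairs defaults)
p_pairs <- ggpairs(mpg, columns = c("displ", "cty", "hwy"),
                   mapping = aes(colour = drv))
p_pairs
```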
Palmer data
library(GGally)
ggpairs(penguins)

Correlation plot
ggcorr(penguins)

Parallel Coordinates
A parallel coordinates plot shows the values of all variables along the y axis, while the variables themselves appear along the x axis. Values corresponding to the same observation are joined up by lines. These plots can often look messy but sometimes provide insight.
ggparcoord(penguins)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.