---
title: "Assignment 3 Notes"
output:
  html_document:
    toc: yes
---

```{r global_options, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE)
```


## General Issues

* Make sure your file names and file references use identical
  spelling, including upper/lower case. Your code will fail on a
  case-sensitive file system if you don't.

* Make sure to commit your work to your local repository and push your
  commits to GitHub. We can only see what is on GitHub, not what is on
  your computer. You can check what we see by going to the GitHub web
  interface.

* Include your name and the date in the header of your `.Rmd` file
  using `author:` and `date:` tags. You can use an inline chunk to
  have the date computed when the document is rendered. Your header
  should look something like this:

```{r, include = FALSE}
rinline <- function(code)
    sprintf("`r %s`", code)
```

```
---
title: "Assignment 3"
author: "Fred Frog"
date: "`r rinline("Sys.Date()")`"
output: html_document
---
```

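
The point above about matching upper/lower case can be checked before pushing. The following is only a minimal sketch: `exists_exact_case` is a hypothetical helper, not part of any package, and it compares only the final path component against the directory listing, so a mis-cased directory name would not be caught.

```r
## Hypothetical helper: TRUE only if the file name at the end of `path`
## appears with exactly this spelling and case in its directory listing.
exists_exact_case <- function(path) {
    basename(path) %in% list.files(dirname(path), all.files = TRUE)
}

## Example use with a made-up file name:
## exists_exact_case("data/Survey.csv")
```

Unlike `file.exists`, which can return `TRUE` for a mis-cased name on a case-insensitive file system, this comparison uses the names as they are actually stored.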

## General Comments

* Your HTML file should be a report of your findings.

    * Any graph you show should be discussed in your narrative.

    * Any code you show should be discussed in your narrative.

    * If you do not need to discuss a piece of code in the narrative,
      use `echo = FALSE` to avoid showing it.

* Your `.Rmd` file, and possibly supporting `.R` files, contain the
  code for your analysis.

    * If you need to update your code, or if a collaborator needs to
      update your code, that work will be done in your `.Rmd` file.

    * You should make sure the code in your `.Rmd` file is readable.

    * Following the [coding standards](coding.html) helps with this.

    * Please indent by 4 spaces for each level. I find this the most
      readable option.

* If you read a data file in your code make sure that you read it in a
  way that will work for someone else using your repository. If you
  want to read from a local file:

    * Make sure it is available locally either by downloading it as
      needed or including it in your repository.

    * Read the file with a relative path name, assuming your working
      directory will be the directory containing your `.Rmd` file.

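
The "download as needed, then read with a relative path" advice above could be sketched as follows. The helper name `read_data`, the file name, and the URL are all made up for illustration; substitute your own.

```r
## Hypothetical helper: download `url` to the relative path `path` only
## if the file is not already there, then read it as CSV.
read_data <- function(path, url) {
    if (! file.exists(path)) {
        dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
        download.file(url, path, mode = "wb")
    }
    read.csv(path)
}

## Example use (made-up path and URL); the path is relative to the
## directory containing the .Rmd file:
## d <- read_data("data/mydata.csv", "https://example.com/mydata.csv")
```

Because the path is relative and the download happens only when the file is missing, the same code should work for anyone who clones the repository and knits from that directory.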

## 1. Life Expectancy Distribution by Continent

The subset of the data for years since 1990 can be extracted using the
`filter` function from `dplyr`:

```{r, message = FALSE}
library(gapminder)
library(dplyr)
gap1990 <- filter(gapminder, year >= 1990)
```

A faceted display using `ggplot` and `facet_wrap`:

```{r}
library(ggplot2)
ggplot(gap1990, aes(x = lifeExp)) + geom_density() + facet_wrap(~ continent)
```


## 2. Boxplots of Life Expectancy by Continent

Boxplots for the same data:

```{r}
ggplot(gap1990) + geom_boxplot(aes(x = continent, y = lifeExp))
```


## 3. Ridgeline Plots of Life Expectancy

```{r, include = FALSE}
library(ggridges)
```

Density ridges for the 12 years show that overall life expectancy
distributions have shifted upwards.

```{r}
ggplot(gapminder) +
    geom_density_ridges(aes(x = lifeExp, y = year, group = year))
```

The distribution shape has changed from skewed right in 1952 to skewed
left in 2007. Adding lines at the medians emphasises this shift:

```{r}
ggplot(gapminder) +
    geom_density_ridges(aes(x = lifeExp, y = year, group = year),
                        quantile_lines = TRUE, quantiles = 2)
```

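
The shift from right skew to left skew noted above could also be checked numerically. This is a rough sketch using the moment coefficient of skewness, written out by hand; `skewness` here is a small helper defined for illustration, not a function from the packages used in these notes, and the two test vectors are made up.

```r
## Moment coefficient of skewness: positive for a longer right tail,
## negative for a longer left tail.
skewness <- function(x) {
    d <- x - mean(x)
    mean(d^3) / mean(d^2)^1.5
}

skewness(c(1, 2, 2, 3, 10))   # positive: long right tail
skewness(c(1, 8, 9, 9, 10))   # negative: long left tail

## With the gapminder data loaded, one value per year:
## tapply(gapminder$lifeExp, gapminder$year, skewness)
```

A sign change in the per-year values between 1952 and 2007 would agree with what the ridgeline plot suggests.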
Separating the distributions by continent shows some striking differences:

```{r}
ggplot(mutate(gapminder, continent = reorder(continent, lifeExp))) +
    geom_density_ridges(aes(x = lifeExp, y = year,
                            group = interaction(year, continent),
                            fill = continent), scale = 1.3, alpha = 0.8)
```

Life expectancy is highest among European countries, with a steady
increase over the years and consistently low variability among
countries. Variability in life expectancy among the Americas has
decreased and overall levels have increased, but remain below those
for Europe. Life expectancy among countries in Asia has improved
overall, but variability among the countries remains substantially
higher than among European countries. Variability among African
countries has increased, with some at life expectancy levels
comparable to the Americas but the bulk remaining quite a bit lower.

## 4. Find a Better Visualization

The original:

![](img/abcnewstrumptransition.png)

Some issues:

* The white bars are supposed to represent the numbers, but are not
  using a zero base line: the bar for Obama's 79% should be nearly
  twice as long as the bar for Trump's 40%.

* The blue and red bars are distracting at best, misleading at
  worst. They could represent the complementary proportion, but the
  lengths are wrong relative to the white bars and to each other.

* The placement of the GMA logo adds to the confusion.

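
The base line point above comes down to simple arithmetic. In this sketch the two values are taken from the graphic; the non-zero base lines are hypothetical and chosen only to show how the ratio of bar lengths changes, since the base line actually used in the graphic is not known.

```r
obama <- 79
trump <- 40

## Ratio of bar lengths when both bars start at `baseline`.
bar_ratio <- function(baseline) (obama - baseline) / (trump - baseline)

bar_ratio(0)     # 1.975: the true ratio, nearly twice as long
bar_ratio(30)    # 4.9: a base line above zero exaggerates the gap
bar_ratio(-40)   # 1.4875: padding both bars by 40 understates it
```

Only a zero base line makes the bar lengths proportional to the values they represent.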
A simple bar chart with a zero base line:

```{r}
d <- data.frame(pres = c("Obama", "Carter", "Clinton",
                         "G.W. Bush", "Reagan", "G.H.W Bush", "Trump"),
                appr = c(79, 78, 68, 65, 58, 56, 40),
                party = c("D", "D", "D", "R", "R", "R", "R"),
                year = c(2009, 1977, 1993, 2001, 1981, 1989, 2017))
d <- mutate(d, pres = reorder(pres, appr))
p <- ggplot(d, aes(x = pres, y = appr, fill = party)) +
    geom_col() + coord_flip()
p
```

In recent years it has become common to represent Democrats as blue,
Republicans as red.
The default colors are close to red and blue, but their use is
opposite to current convention.
This can be changed using `scale_fill_manual`:

```{r}
p + scale_fill_manual(values = c(R = "red", D = "blue"))
```

Pure colors are very intense when used in larger areas.
Pure warm colors, like red, are more intense than pure cool colors,
like blue.
We can reduce the saturation and the value in the HSV color
representation to obtain less intense colors; this is commonly used in
red state/blue state maps:

```{r}
myred <- hsv(0, 0.6, 0.8)
myblue <- hsv(2 / 3, 0.6, 0.8)
p + scale_fill_manual(values = c(R = myred, D = myblue))
```

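
To see what lowering saturation and value does to a color, the RGB components can be compared directly. This is a small sketch using base R's `hsv`, `col2rgb` and `rgb2hsv`; the names `pure_red` and `soft_red` are introduced here only for illustration.

```r
pure_red <- hsv(0, 1, 1)        # full saturation and value
soft_red <- hsv(0, 0.6, 0.8)    # same hue, less saturated, darker

## RGB components on a 0-255 scale, one column per color
col2rgb(c(pure = pure_red, soft = soft_red))

## Converting back recovers approximately h = 0, s = 0.6, v = 0.8
rgb2hsv(col2rgb(soft_red))
```

Lowering the value reduces the red component below 255, and lowering the saturation mixes in some green and blue, which is what makes the color less intense.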
Some enhancements:

```{r}
p + scale_fill_manual(values = c(R = myred, D = myblue)) + theme_void() +
    geom_text(aes(y = 3, label = pres),
              size = 8, hjust = "left", color = "white") +
    geom_text(aes(y = appr - 3, label = appr),
              size = 8, hjust = "right", color = "white")
```

Some notes:

* A dot chart is a reasonable alternative in this case.

* Horizontal bar charts are the norm in these settings since they
  allow horizontal labels of reasonable size.

* Party is a nominal or categorical attribute, not a numeric attribute.