---
title: "Assignment 2 Notes"
output:
  html_document:
    toc: yes
---

```{r global_options, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE)
```
## Files and Folders
It is essential that you name your folders and files exactly as
specified. We run checks like
```shell
cd HW2
Rscript -e 'rmarkdown::render("hw2.Rmd")'
```
from the top of a clone of your repository. If the folders
and files are not named exactly as specified these checks will fail
and your work will not be graded.
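A quick local check along these lines can catch naming mistakes before you push. This is only a sketch. It assumes the `HW2` folder and `hw2.Rmd` file names from the command above, and `check_hw_files` is a hypothetical helper, not part of the grading scripts. It compares against `list.files` rather than calling `file.exists`, so the check stays case-sensitive even on macOS and Windows, whose file systems usually ignore case.

```r
# Hypothetical helper: check that the folder and file names match exactly.
# list.files() returns names as stored on disk, so "hw2" does not match "HW2"
# even on case-insensitive file systems where file.exists() would succeed.
check_hw_files <- function(root = ".", dir = "HW2", rmd = "hw2.Rmd") {
    has_dir <- dir %in% list.files(root)
    has_rmd <- has_dir && rmd %in% list.files(file.path(root, dir))
    c(folder = has_dir, file = has_rmd)
}

# Run from the top of a clone of your repository:
check_hw_files()
```

Both entries should be `TRUE` before you submit.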
## Rmarkdown Usage and Coding Style

Make sure you are using Rmarkdown properly, with explanatory text
surrounding short code chunks. In particular, you should not have just
one big code chunk.

Your rendered HTML page should be a report with text supporting
numerical and graphical results. Code only needs to be visible if
explaining how to do something is one of your goals, as it is in the
class notes.
Your Rmarkdown code and your R code should be readable, and the R code
should follow the coding standards. This makes maintaining your code
and document easier.
## Name and Date

Make sure your Rmarkdown file header contains a `name:` field with
your name. A `date:` field with an appropriate date is also helpful.
Your header should look something like this:

```
---
title: "HW1"
output: html_document
name: "Your Name"
date: "February 1, 2019"
---
```
You can also use one of these as the `date:` line to produce the current date
when the document is knit:

```{r, include = FALSE}
rinline <- function(code)
    sprintf("`r %s`", code)
```

```
date: "`r rinline("Sys.Date()")`"
date: "`r rinline("format(Sys.Date(), '%B %e, %Y')")`"
```
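To see what the two `date:` lines produce, the same formatting can be tried on a fixed date. This sketch switches to the `"C"` time locale so that month names come out in English. Note that `%e` pads single-digit days with a leading space; in the rendered HTML that extra space collapses.

```r
# Use the "C" time locale so %B gives English month names.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

d <- as.Date("2019-02-01")            # stands in for Sys.Date()
iso_date <- as.character(d)           # style of the first date line
long_date <- format(d, "%B %e, %Y")   # style of the second date line

print(iso_date)   # "2019-02-01"
print(long_date)  # "February  1, 2019" (%e puts a space before the 1)

# Restore the previous time locale.
invisible(Sys.setlocale("LC_TIME", old_locale))
```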
## Handling Data Files

If your Rmarkdown document makes use of an external data file, you need
to make sure it can be accessed when someone who receives a copy of
your repository renders your file. There are several options:

- Include the file in your repository. You can then reference it from
  your code with a path relative to the location of your Rmd file.
  This is reasonable for small data sets, and it freezes the data at
  its current state.

- Access the file using a URL. This will load the file over the network
  each time you render your document. You need a network connection,
  and you may cause unnecessary traffic if the file is large.

- Check if you have the file locally, and download it if you do
  not. This is often a good option, and there are several examples of
  how to do this in the notes.
Relying on retrieving a file from the network means the file may change
or be removed. In some cases this will be what you want; in others it
may not be.
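The third option can be written as a small helper. This is a sketch, not necessarily the version used in the notes. `get_data_file` is a hypothetical name, and the file is saved in the current working directory unless `destfile` says otherwise.

```r
# Hypothetical helper: download `url` only if no local copy exists yet,
# then return the local file name for use with read.csv() and friends.
# mode = "wb" avoids line-ending corruption of binary files on Windows.
get_data_file <- function(url, destfile = basename(url)) {
    if (!file.exists(destfile))
        download.file(url, destfile, mode = "wb", quiet = TRUE)
    destfile
}
```

In a chunk you would then write something like `welfare <- read.csv(get_data_file(url))`. The network is only used the first time the document is rendered.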
## 3. Find a Better Visualization

The use of a non-zero baseline in the visualization

<!-- http://www.statisticshowto.com/misleading-graphs/ -->
![Bar chart with a non-zero baseline](img/usatodaywelfare.png)

is misleading since the viewer's attention is drawn to the length of
the bars, which suggest a much larger relative change than is actually
present in the data. Using a zero baseline accurately reflects the
relative changes:
```{r, echo = FALSE}
url <- "http://homepage.divms.uiowa.edu/~luke/classes/STAT4580/hw2welfare.csv"
welfare <- read.csv(url)
library(ggplot2)
ggplot(welfare, aes(x = quarter, y = onAssistance)) +
    geom_bar(stat = "identity", width = 0.2)
```
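The distortion can be quantified by comparing the ratio of bar lengths with the ratio of the values themselves. The numbers below are hypothetical and were chosen only to make the arithmetic easy to follow.

```r
# Ratio of bar lengths when bars for values a and b start at `baseline`.
# With baseline = 0 this is the true ratio b / a.
bar_length_ratio <- function(a, b, baseline = 0) {
    (b - baseline) / (a - baseline)
}

# Hypothetical values: b is 10% larger than a.
bar_length_ratio(100, 110)                 # 1.1, the true ratio
bar_length_ratio(100, 110, baseline = 90)  # 2, the second bar looks twice as long
```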
You could also use a dot plot or a line plot. For a dot plot, starting
the value axis at zero is not as important, but doing so still makes
sense because the comparison of primary interest is the relative
change (these are ratio data).
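A dot plot version might look like the following sketch. It assumes the `welfare` data frame read above, with columns `quarter` and `onAssistance`, and `welfare_dot_plot` is a hypothetical helper name. `expand_limits(x = 0)` keeps zero on the value axis, which matches the point that the comparison of interest is a ratio.

```r
library(ggplot2)

# Sketch of a dot plot alternative; the value axis is forced to include 0.
welfare_dot_plot <- function(df) {
    ggplot(df, aes(x = onAssistance, y = quarter)) +
        geom_point() +
        expand_limits(x = 0)
}
```

Calling `welfare_dot_plot(welfare)` in a chunk draws the plot.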
Whether the numbers themselves make sense is also worth considering:
the values seem very high.
## 4. Average Life Expectancies

```{r, include = FALSE}
library(gapminder)
library(dplyr)
```

The subset of the data for years since 1990 can be extracted with the
base function `subset` or with the `filter` function from `dplyr`:

```{r}
gap1990 <- filter(gapminder, year >= 1990)
```
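The base R `subset` version selects the same rows. The check below confirms this. It is a sketch that assumes the `gapminder` and `dplyr` packages are installed. Both results are converted to plain data frames so that only their contents are compared.

```r
library(gapminder)
library(dplyr)

# Two ways to keep the rows for years since 1990.
gap_base <- subset(gapminder, year >= 1990)
gap_dplyr <- filter(gapminder, year >= 1990)

# Compare as plain data frames so only the contents are checked.
all.equal(as.data.frame(gap_base), as.data.frame(gap_dplyr))
```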
The `dplyr` functions `group_by` and `summarize` can then be used to
compute the average life expectancies for each continent:

```{r}
s <- summarize(group_by(gap1990, continent),
               avglifeExp = mean(lifeExp))
```
One way to display this nicely in an Rmarkdown document is to use
`kable` from the `knitr` package:

```{r}
knitr::kable(s, digits = 2)
```
The [`kableExtra`
package](https://cloud.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html)
allows table formatting to be customized in many ways. There are a
number of other packages for making nice-looking tables.
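As one example, `kable_styling` from `kableExtra` can add striped rows and keep the table from stretching across the page. This sketch assumes `kableExtra` is installed. It recomputes `s` so that it runs on its own, and the two options shown are just a small sample of what the package offers.

```r
library(gapminder)
library(dplyr)
library(knitr)
library(kableExtra)

s <- summarize(group_by(filter(gapminder, year >= 1990), continent),
               avglifeExp = mean(lifeExp))

# HTML table with striped rows that does not stretch to full width.
tab <- kable_styling(kable(s, digits = 2, format = "html"),
                     bootstrap_options = "striped",
                     full_width = FALSE)
```

In an Rmarkdown chunk, making the `kable_styling(...)` call the last expression displays the styled table.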
A dot plot and a bar chart:

```{r, message = FALSE}
library(gridExtra)
p1 <- ggplot(s) + geom_point(aes(x = avglifeExp, y = continent))
p2 <- ggplot(s) +
    geom_bar(aes(x = continent, y = avglifeExp),
             stat = "identity", width = 0.2) +
    coord_flip()
grid.arrange(p1, p2, nrow = 1)
```
The bar chart emphasizes the ratio comparisons: average life
expectancy for Africa is about 2/3 of the value for Oceania; the
relative differences among Asia, Europe, the Americas, and Oceania are
much smaller.
Without Africa and Oceania the relative differences are small and a
common way to express comparisons is to say that average life
expectancy in Europe is about 4 years higher than in the Americas.
This comparison is made easier by a dot chart.
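Both comparisons can be read directly from the summary table. This sketch recomputes the continent averages so that it runs on its own, assuming the `gapminder` and `dplyr` packages are installed.

```r
library(gapminder)
library(dplyr)

s <- summarize(group_by(filter(gapminder, year >= 1990), continent),
               avglifeExp = mean(lifeExp))
avg <- setNames(s$avglifeExp, as.character(s$continent))

# Ratio comparison: Africa relative to Oceania (roughly 2/3)
africa_ratio <- avg[["Africa"]] / avg[["Oceania"]]

# Difference comparison: Europe minus the Americas, in years (roughly 4)
europe_gap <- avg[["Europe"]] - avg[["Americas"]]

c(ratio = africa_ratio, difference = europe_gap)
```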