2/8/2018 a5 – Data Report
a5 – Data Report
Due: Friday by 11:59pm | Points: 100 | Submitting: a text entry box
In this assignment, you’ll write an R script that generates a report describing an online data set. This will require you to engage deeply with a web API and its problem domain, and to create Markdown documents using knitr.
In particular, your report will address the following hypothetical request for information:
What kind of data is available about the laws that the United States Congress is creating? How can I find out more about these laws? And what are the people I voted for even doing?
Your report will address these questions by drawing on the ProPublica Congress API; your report will demonstrate how that data source specifically can be used to address the request (though you may need to refer to other external resources as well).
Unlike previous assignments, this assignment is largely open-ended. It will require more critical thinking, as well as (likely) some outside research to understand the problem domain. This assignment requires you to reflect and write about your problem-solving process, so be prepared to explain your work!
Objectives
By completing this assignment you will practice and master the following skills:
Requesting data from a publicly available web API
Reading API documentation to learn how to access specific data
Researching and learning about a specific problem domain
Rendering R Markdown files using knitr
Wrangling and manipulating data using multiple script files
Setup
Follow the link below to create your private code repo for this assignment. You will need to accept this assignment to create your code repo. Do not fork this repository!
https://classroom.github.com/a/7wlsvCMR
This repo will have the name info201a-w18/a5-data-report-yourusername, and you can view it online at https://github.com/info201a-w18/a5-data-report-yourusername (replacing yourusername with your GitHub user name).
https://canvas.uw.edu/courses/1128814/assignments/3973813
After you’ve accepted the assignment, clone the repo to your local machine so you can edit (create) the files. Make sure you don’t clone it inside another repo!
Unlike previous assignments, the repo contains no starter code; you will need to generate the script files yourself (see below for details).
Remember that it is always a good idea to add and commit (and even push) your changes whenever you finish a section of an assignment or project!
The API
Your report will need to utilize data drawn from the ProPublica Congress API. Full details about this API can be found on its website at https://projects.propublica.org/api-docs/congress-api/; note that there are different pages documenting the different endpoints that you will need to query for your report.
Since you will be requesting from multiple endpoints, you should save the base URI as a single variable, and then paste the relevant endpoint onto that variable for each query. This avoids code duplication and redundancy.
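A minimal sketch of this pattern, assuming the base URI is https://api.propublica.org/congress/v1/ and that endpoint paths look like the examples below (confirm both in the API documentation). The helper name build.uri() is hypothetical.

```r
# Base URI shared by every request (assumed; confirm it in the API documentation).
base.uri <- "https://api.propublica.org/congress/v1/"

# Paste an endpoint path onto the base URI to get the full resource URI.
build.uri <- function(endpoint) {
  paste0(base.uri, endpoint)
}

# Example endpoint paths (assumed formats; the 115th Congress sat in 2017-2018).
congress <- 115
recent.votes.uri <- build.uri("senate/votes/recent.json")
recent.bills.uri <- build.uri(paste0(congress, "/house/bills/introduced.json"))
```

Changing congress (or the chamber in a path) in one place then updates every request built from it.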
Requesting data from this API requires a free API key (so that the hosts can regulate who is using the service). You will need to sign up for a key at https://www.propublica.org/datastore/api/propublica-congress-api.
As detailed in the course book, you should assign this key to a variable in a separate “key script” file (e.g., apikeys.R), which you can then source() into your “main” program script in order to make that variable available.
Also be sure to add the filename for your key script as an entry in the provided .gitignore file, so that you don’t commit your API key by accident. You can open the .gitignore file in Atom or even RStudio (it’s hidden, so you can open the entire folder or use the command line), add your key script’s filename as another line at the bottom, and then save the file. When you type git status, your key script should not be listed as a file to be committed!
Do not commit your key script!
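A sketch of the key script setup. apikeys.R would contain only the assignment shown in the first comment, and .gitignore would gain a line reading apikeys.R. The load.api.key() helper is a hypothetical convenience (a plain source("apikeys.R") works too); it only adds a readable error when the key script is missing, for example on a fresh clone.

```r
# Contents of apikeys.R (listed in .gitignore, never committed):
#   propublica.key <- "your-key-here"

# Source the key script into its own environment and return the key.
load.api.key <- function(key.file = "apikeys.R") {
  if (!file.exists(key.file)) {
    stop("Key script '", key.file, "' not found. Create it locally ",
         "(and list it in .gitignore) before running this script.")
  }
  key.env <- new.env()
  source(key.file, local = key.env)
  get("propublica.key", envir = key.env)
}

# In the main script:
# propublica.key <- load.api.key()
```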
While most APIs will have you specify an API key as a query parameter, ProPublica instead requires you to include that key as part of the header of your request (think: you write it on the envelope, but it’s not in the mailing address). You can add the key to the request’s headers with the add_headers() function, whose result is passed as an argument to the httr GET() function:
library(httr)
response <- GET(resource.uri, add_headers('X-API-Key' = propublica.key))
You can inspect the response to confirm that your request was received and authorized.
And as always, you will need to extract the body (content) of the request’s response as “text”, then use the jsonlite library to convert that text from JSON into R data. Be sure to check the schema of the data that is returned to you!
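Putting these steps together, a sketch of a reusable request helper. It assumes the base URI https://api.propublica.org/congress/v1/, that each response wraps its data in a results element, and that propublica.key was loaded from your key script; the function names are hypothetical. Parsing is split into its own function so it can be checked without a network connection.

```r
# Assumed base URI (confirm in the API documentation).
base.uri <- "https://api.propublica.org/congress/v1/"

# Convert a JSON response body into R data and return its "results" element.
# flatten = TRUE turns nested objects into columns such as
# "democratic.majority_position" instead of data frames inside columns.
parse.propublica <- function(body.text) {
  parsed <- jsonlite::fromJSON(body.text, flatten = TRUE)
  parsed$results
}

# Send an authorized GET request to one endpoint and return the parsed results.
get.propublica <- function(endpoint, key, query = list()) {
  response <- httr::GET(paste0(base.uri, endpoint),
                        httr::add_headers("X-API-Key" = key),
                        query = query)
  # Stop with an informative error on 4xx/5xx responses (e.g., a bad key).
  httr::stop_for_status(response)
  body.text <- httr::content(response, as = "text", encoding = "UTF-8")
  parse.propublica(body.text)
}

# Example (requires a network connection and a valid key):
# recent.votes <- get.propublica("senate/votes/recent.json", propublica.key)
```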
Creating The Report
You should author your report as an R Markdown file using knitr . You will need to create a file called index.Rmd inside your repo (you can do this through the RStudio Wizard). This file will contain your report, including both text in Markdown and R code that will be executed to generate the data shown in the report.
Your report should include metadata specifying an appropriate title (“Legislative Data” is fine), your name as the author, and the date the report was generated.
Include a brief paragraph (in Markdown) introducing your report, and then the individual sections described below. You should also include a “references” section at the bottom with links to any resources you utilized.
Your report should use text blocks appropriately to organize it. For example, each section should be labeled with a heading (second-level, since they aren’t document titles). Judicious use of lists will also be useful.
It’s also possible to include footnotes if you wish to use those for citations.
Throughout your report, you should use blocks of R code to generate, access, and display specific data. Note that these blocks should almost certainly be executed without displaying their code (e.g., with the chunk option echo = FALSE). You can also use functions such as kable() to easily include tables of data.
Also use inline R code to include variables in your text, so that the text updates automatically when the data changes.
PRO TIP: It’s a wonderful idea to do all of your R-based data access and manipulation in a separate .R file, which you can then source() into the .Rmd file. This makes it easier to test your work since you don’t need to re-knit the document each time, as well as keeping the data processing and your written content separate. You can assign the relevant data to specific variables, or even define a function that you can call from the .Rmd in order to generate and format the needed data.
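A sketch of that split, assuming a helper script named analysis.R (a hypothetical name). The .Rmd loads it in a setup chunk with include = FALSE, which runs the code but shows neither the code nor its output, and then refers to its variables and functions from inline code.

```r
# analysis.R: data access and formatting for index.Rmd.
# index.Rmd could load it with a setup chunk whose header is
#   {r setup, include = FALSE}
# and whose body is the single line source("analysis.R").
# Prose can then use inline R code such as `r report.date`.

# Date the report was generated; recomputed each time the document is knit.
report.date <- format(Sys.Date(), "%B %d, %Y")

# Render a data frame as a Markdown table with human-readable column names.
as.report.table <- function(df, column.names = names(df)) {
  knitr::kable(df, col.names = column.names)
}
```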
You can use the built-in Knit button in RStudio to render your .Rmd file into a .html file which you can open with a web browser. Simply click the Knit button at the top of RStudio, and your index.html file should be saved in the same directory as your .Rmd file. You will likely want to do this repeatedly as you work through the assignment to make sure everything works!
Section 1: Legislative Data Overview
The first section of your report should describe what kinds of data are available about congressional laws (bills) and procedures (votes). The purpose of this section is to present components that could be included in a data analysis. Your explanation should be targeted at non-programmers.
Some of these requirements are intentionally unspecific (e.g., about “which bills”). That is because these choices are up to you! Just be sure to document your choices so it is clear what data you are presenting (e.g., “this data is for the Senate”).
This section should include an example of the data for the most recent bills (e.g., presented as a table). Your example shouldn’t include all of the data available for those bills, just the most important values. For example, you should include:
A unique identifier for the bill, to distinguish it from others. Your report should also explain how these identifiers are produced and how they may be used.
The name of the bill, of course.
The legislator(s) involved with the bill. Note that information like political party and home state is very important for identifying legislators.
The current status of the bill. How has the legislature interacted with it? Be sure and explain any values you include in your table.
Where to find more information about the bill (such as its complete text). This should be a link included in your table. Also explain where the links go to.
Your report should include a (Markdown) list of column explanations, so that the reader knows how to interpret the table.
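A sketch of turning recent-bills data into a compact table. The column names (bill_id, short_title, sponsor_name, sponsor_party, sponsor_state, latest_major_action_date, latest_major_action, congressdotgov_url) are assumptions about the recent-bills response, so check the schema you actually receive. Links are written as Markdown so they stay clickable when the table is passed to kable().

```r
# Build Markdown links of the form [text](url).
markdown.link <- function(text, url) {
  paste0("[", text, "](", url, ")")
}

# Keep the most important values for the n most recent bills
# (assumes the API returns bills newest first).
summarize.bills <- function(bills, n = 5) {
  recent <- head(bills, n)
  data.frame(
    ID = recent$bill_id,
    Title = recent$short_title,
    Sponsor = paste0(recent$sponsor_name, " (", recent$sponsor_party, "-",
                     recent$sponsor_state, ")"),
    Latest.Action = paste0(recent$latest_major_action_date, ": ",
                           recent$latest_major_action),
    More.Info = markdown.link("Congress.gov", recent$congressdotgov_url),
    stringsAsFactors = FALSE
  )
}
```

The Sponsor column follows the common “Name (Party-State)” convention, which covers the party and home state information the table needs.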
This section should also include an example of the data available for the 5 most recent votes (again presented as a table). As with the bills, your example should include only the most prominent values. For example, you should include:
A unique identifier for the vote, to distinguish it from others. Your report should also explain how these identifiers are produced and how they may be used.
What the vote was about (e.g., was it on a bill or something else?)
What kind of vote was it? Were there any special rules about what was needed to “win”? You should also provide a listing and explanation of all of the different kinds of votes that were made recently, even if not in the 5 most recent.
What were the results of the vote? Describe the default (and thus most important) information available about who voted how.
Where to find more information about the vote (such as a complete breakdown). This should again be a link included in your table. Also explain where the links go to.
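A similar sketch for votes. It assumes the recent-votes response has columns named congress, chamber, session, roll_call, question, description, vote_type, result and url, and builds an identifier from congress, chamber, session and roll-call number, on the assumption that roll-call numbers restart each session. vote.type.counts() tallies every vote type in the data, not only the five shown in the table.

```r
# Combine the fields that together pin down a single roll-call vote.
vote.id <- function(congress, chamber, session, roll.call) {
  paste(congress, chamber, session, roll.call, sep = "-")
}

# Most important values for the n most recent votes.
summarize.votes <- function(votes, n = 5) {
  recent <- head(votes, n)
  data.frame(
    ID = vote.id(recent$congress, recent$chamber, recent$session, recent$roll_call),
    Question = recent$question,
    Description = recent$description,
    Type = recent$vote_type,
    Result = recent$result,
    More.Info = paste0("[Roll call details](", recent$url, ")"),
    stringsAsFactors = FALSE
  )
}

# Count how often each kind of vote appears across all recent votes.
vote.type.counts <- function(votes) {
  as.data.frame(table(Type = votes$vote_type), stringsAsFactors = FALSE)
}
```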
Again, the purpose here is to give an overview and explanation of what information is available through the API. Be sure and include references and citations for any resources you consult when writing your explanations.
Section 2: Specific Legislation
In this section, you should provide a “worked example” of locating and analyzing a specific piece of legislation (a single bill) to demonstrate how the API can be used for exploratory data analysis.
Throughout this section, you should use Markdown text to explain your process. This is not about the code you write (which should not be included), but about the steps you’ve taken to identify and understand this particular piece of data. Think of it as a “journal” of your analysis. It should be read as a “walkthrough” of your work. Be sure to include any questions you had (and how you found answers), as well as choices you made.
First, you should use the ProPublica API to find bills related to a specific topic of interest to you. The exact choice of topic is up to you. “data” is a fine default, but other current issues (e.g., “immigration” , “climate” , etc.) are also acceptable.
Your report should include a list (or table) of the names and unique identifiers of 3 recent bills on this topic, as a demonstration of your searching.
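One way to run the search with a query parameter instead of pasting the topic into the URL by hand. It assumes a bills/search.json endpoint that takes a query parameter and returns bills with bill_id and short_title columns; search.bills() and top.bills() are hypothetical names, and top.bills() only needs a data frame, so it can be tried on made-up data.

```r
# Assumed base URI and search endpoint (check the API documentation).
base.uri <- "https://api.propublica.org/congress/v1/"
topic <- "data"  # change this to search for a different topic

# Ask the API for bills matching a keyword; httr encodes the query string.
# Returns the raw response; extract and parse its content as usual.
search.bills <- function(topic, key) {
  httr::GET(paste0(base.uri, "bills/search.json"),
            httr::add_headers("X-API-Key" = key),
            query = list(query = topic))
}

# Names and unique identifiers of the first n bills in a search result.
top.bills <- function(bills, n = 3) {
  head(bills[, c("short_title", "bill_id")], n)
}
```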
Next, you should pick one of the bills on your topic found in your search. You can pick whichever bill you want, but for more interesting results you might choose one that has been voted on (and possibly even passed into law!).
Your report should include a text description of this chosen bill (in a paragraph or two). Your text should include at least the following information:
What bill are you describing? You should include any relevant details (e.g., sponsors, year) using inline R code in case they change later. Be sure and include a link to the bill’s full text.
What is this bill about? Provide your own description of the bill’s purpose and context. This will almost certainly require you to do some outside research to understand the full effects of the bill, the factors that led to it, its (expected) impacts, etc. Remember to cite/footnote your sources!
Also include a list or table of recent “related” bills. This will help your report’s content stay “fresh” even as your specific bill becomes out-dated.
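Bill identifiers appear to combine a slug (bill type and number, such as hr1234) with the congress number, as in hr1234-115. Assuming the single-bill and related-bills endpoints take the forms {congress}/bills/{slug}.json and {congress}/bills/{slug}/related.json, a sketch of deriving both from one variable, so switching to another bill means changing only bill.id. The bill ID shown is a hypothetical example.

```r
bill.id <- "hr1234-115"  # hypothetical example; replace with the bill you chose

# Split "hr1234-115" into its slug ("hr1234") and congress ("115").
parse.bill.id <- function(id) {
  parts <- strsplit(id, "-", fixed = TRUE)[[1]]
  list(slug = parts[1], congress = parts[2])
}

# Endpoint paths (relative to the base URI) for a bill's details and related bills.
bill.endpoints <- function(id) {
  pieces <- parse.bill.id(id)
  prefix <- paste0(pieces$congress, "/bills/", pieces$slug)
  c(details = paste0(prefix, ".json"),
    related = paste0(prefix, "/related.json"))
}

bill.endpoints(bill.id)
```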
This section will be graded on how well you’ve explained and demonstrated your process for learning and explaining the problem domain that this bill exists within.
Section 3: Representative Actions
In this final section, you will include information about the actions taken by a specific legislator (in either the House or the Senate) who represents you or the place where you live—so you know what they are up to and how to contact them and complain.
First, you should identify (name) your current representatives. You can look up this information using any source you wish (including ProPublica’s API)—just include where you found the information. You will then need to pick one of your representatives to present details on.
Your script should contain a variable (e.g., representative.id ) that uniquely identifies your representative. That way you can easily change the information in the report just by altering a single variable (if, for example, you wanted to talk about a different person instead).
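A sketch of keeping the legislator in one variable and deriving every request from it. The member ID is a placeholder, and the endpoint paths (including the cosponsored-bills path and the state/district lookup) are assumptions to confirm against the members documentation.

```r
# Placeholder: replace with your representative's ProPublica member ID.
representative.id <- "A000000"

# Endpoint paths (relative to the base URI) for one legislator.
member.endpoints <- function(member.id) {
  c(profile = paste0("members/", member.id, ".json"),
    sponsored = paste0("members/", member.id, "/bills/introduced.json"),
    cosponsored = paste0("members/", member.id, "/bills/cosponsored.json"),
    votes = paste0("members/", member.id, "/votes.json"))
}

# Current members for a state; House lookups also take a district number.
current.members.endpoint <- function(chamber, state, district = NULL) {
  paste(c("members", chamber, state, district, "current.json"), collapse = "/")
}
```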
Your report should then present the following information (dynamically drawn from ProPublica’s API):
Their contact information (phone number and Twitter handle at least) so you can hold them accountable.
A list or table of bills they have sponsored or cosponsored so you know what they have been trying to accomplish.
The percentage of recent votes in which they voted with the majority of their party (Republican or Democrat).
(This is the most challenging data work in the assignment; you will need to do some dplyr processing. A hint: note that you can query two endpoints and then join the data frames together to consider information from both. Do not try to use a loop to send multiple HTTP requests!)
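A sketch of the join described in the hint, under several assumptions: the member’s votes have congress, session, roll_call and position columns; the recent votes for the same chamber were parsed with flatten = TRUE, so each party’s majority position is a plain column such as democratic.majority_position; and only Yes/No positions count (absences and “Present” are dropped). Joining on congress, session and roll call compares every shared vote at once, with no loop of per-vote requests.

```r
library(dplyr)

# Percentage of shared roll calls where the member voted like their party's majority.
party.alignment <- function(member.votes, recent.votes, party.column) {
  compared <- member.votes %>%
    select(congress, session, roll_call, position) %>%
    # Key types must match across endpoints; convert with as.integer() if needed.
    inner_join(recent.votes, by = c("congress", "session", "roll_call")) %>%
    mutate(party.position = .data[[party.column]]) %>%
    filter(position %in% c("Yes", "No"), party.position %in% c("Yes", "No"))
  if (nrow(compared) == 0) {
    return(NA_real_)
  }
  100 * mean(compared$position == compared$party.position)
}
```

Passing the party column name as an argument means the same function works for either party, depending on the legislator you pick.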
Note that all of this information should be written in your report in “prose” (e.g., as full sentences explaining the information to the reader). Do not simply include tables of data drawn from the API. Think about the user experience of reading the report.
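Inline R code works well for this. As a sketch, small formatting functions (hypothetical names) can turn API values into full sentences that index.Rmd calls from inline code; the arguments would come from the member profile and the party-alignment calculation, and the example values here are made up.

```r
# Sentence with the legislator's contact details.
describe.contact <- function(name, party, state, phone, twitter) {
  sprintf("%s (%s-%s) can be reached by phone at %s or on Twitter at @%s.",
          name, party, state, phone, twitter)
}

# Sentence reporting how often the legislator voted with their party.
describe.alignment <- function(name, party.name, percent, num.votes) {
  sprintf("%s voted with the majority of the %s Party on %.1f%% of the %d recent votes compared.",
          name, party.name, percent, num.votes)
}

describe.contact("Jane Doe", "D", "WA", "202-555-0100", "janedoe")
```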
Submit Your Solution
In order to submit this assignment:
1. Confirm that you’ve successfully completed the assignment (e.g., that your code is able to generate a report, and re-running the code will produce updated information).
2. Add and commit the final version of your work (including the generated index.html file), and push your code to your GitHub repository.
3. Make sure that you’ve filled out the Submission form for this assignment, answering the questions in detail.
4. Submit the URL of your GitHub Repository as your assignment submission on Canvas (this page, at the top).
Please proofread your report! Make sure there aren’t any half-finished sentences or egregious typos, and that overall it is cleanly formatted and readable. It should be in better condition than these assignment write-ups!
a5 – Data Report Rubric
Criteria | Pts
Creating the Report: You have created a report using R Markdown. Your report is well-structured with appropriate meta-data and references. The report uses R code to dynamically specify its content. You have correctly managed any API keys. You have knit the report into HTML. | 15.0 pts
Legislative Data Overview: Your report details the types of data available about congressional laws and procedures. You have included tables showing recent bills and votes, and provided clear explanations for the meaning and purpose of API-provided data values. | 27.0 pts
Specific Legislation: Your report includes details about a specific piece of legislation, including the search that led to it, the specific details of that bill (dynamically drawn from the API), a prose description of its context, and a list of related bills. | 23.0 pts
Representative Actions: Your report includes information about a specific legislator, including their contact information, sponsored legislation, and alignment with political party. | 25.0 pts
Code Clarity and README: Code is well written and documented. You have effectively utilized and structured your API query requests (including the use of query parameters). You have effectively used variables to abstract your data processing and enable reporting on alternate legislation or legislators. Code blocks, white space, and variable names are properly used so that code is clear and readable. You have included sufficient comments to explain your analysis. You have filled out the Submission questionnaire. | 10.0 pts
Total Points: 100.0