STAT240 D100 Spring 2021 SFU
Midterm
This midterm exam consists of 3 problems. All aspects of the midterm exam must be handed in through Crowdmark. This midterm exam is open book and take home, and it is due March 5th at 6 PM PST. You may access any texts, notes or lectures while completing this midterm exam. You may access resources on the internet provided that you don’t communicate any aspect of this midterm exam, distribute the midterm exam material in any way, or confer with other students or third parties regarding the midterm exam, as formalized below.
Honour Code
In taking this midterm exam you are required to affirm your willingness to abide by the course policies. By entering your name below, you affirm that you are abiding by the following honour code: I understand that the following activities are prohibited and will be considered cheating. I agree that I will not participate in any of the following activities:
• Looking at or copying from another student’s midterm exam or materials while writing the midterm exam.
• Conferring with other students or other parties regarding the midterm exam.
• Having someone else take the midterm exam in your place.
• Distributing the midterm exam materials in any way, or discussing midterm exam materials with anyone in any form or media.
• Misrepresenting whether the midterm exam was completed within the time limit.
The above honour code is an undertaking for students to abide by both individually and collectively. You must uphold both the spirit and the letter of this honour code. Please sign this honour code and upload it to Crowdmark.
Signature:
Full Name (printed):
Student Number:
Question 1: Essay
Has data science changed the world? Write a roughly 800 word (before references) argumentative essay exploring this question. Your essay should have an introduction, a body and a conclusion. Your essay should have a title and a thesis statement. (A thesis statement is one or two sentences that provide a concise and focused summary of the argument that will be made in the body.) Your thesis statement should appear in the introduction in italics. The essay should be double spaced, but all other aspects of the typesetting are up to you (inclusion of section markers, capitalization of title, style for references …). Please include your name under the title. Provide references for any material you draw on in your argument, or for any facts you report. (You may reference blog posts, but you may not reference wikis.) The marking scheme is as follows: 40% quality of argumentation, 40% quality of research and originality, 10% flow and quality of writing. Upload a pdf version of your essay to Crowdmark.
(10 points)
Question 2: Databases
Consider the database in the file stat240.sqlite provided in this midterm archive. This database contains a table named citiesA containing the area of cities and a table named citiesP containing the population of cities. This question has 4 parts, all of which must be completed. For each part, provide code and output in a single pdf file through Crowdmark. Provide axis labels and titles for all of your plots.
Question 2, Part I
Connect to the database in the file stat240.sqlite and output the names of the tables in the database. For each table, output the names of the columns of the table and the data types of the columns and the number of entries in the table.
(3 points)
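As a rough illustration of the kind of introspection Part I describes, the sketch below uses the DBI and RSQLite packages on a small in-memory database. The choice of packages is an assumption, since the exam does not name one, and the table toy with its columns is a hypothetical stand-in rather than the contents of stat240.sqlite.

```r
library(DBI)

# Toy in-memory database (hypothetical; NOT the contents of stat240.sqlite).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "toy", data.frame(Name = c("A", "B"), Area = c(1.5, 2.5)))

# Names of all tables in the database.
tables <- dbListTables(con)
print(tables)

for (tbl in tables) {
  # PRAGMA table_info returns one row per column, including its name and type.
  info <- dbGetQuery(con, sprintf("PRAGMA table_info(%s)", tbl))
  # Number of entries (rows) in the table.
  n <- dbGetQuery(con, sprintf("SELECT COUNT(*) AS n FROM %s", tbl))$n
  cat("Table", tbl, "has", n, "rows\n")
  print(info[, c("name", "type")])
}

dbDisconnect(con)
```

Pointing dbConnect at a file path instead of ":memory:" would open an on-disk SQLite database in the same way.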
Question 2, Part II
Use SQL to extract the unique combinations of province and type from the citiesP table, and provide the number of such unique combinations.
(2 points)
Question 2, Part III
Use SQL to obtain the number of municipalities within each province in the database. Restrict to location names that are present in both the citiesA and citiesP tables. Provide a plot of the resulting number of municipalities in each province.
(5 points)
Question 2, Part IV
For each location in the citiesP table, the columns Order2011 and Order2016 represent the popularity of the destination with tourists in the years 2011 and 2016 respectively (the tourist rank orders). Extract the tourist rank order for 2011 (Order2011) and for 2016 (Order2016) for each location in the citiesP table. Provide a scatter plot of the 2011 values against the 2016 values (i.e., a plot with one point per location and the 2011 values on the y-axis and the 2016 values on the x-axis).
(5 points)
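The axis convention described in Part IV can be pictured with a minimal base R sketch. The two vectors below are hypothetical toy ranks, not values from the citiesP table; the point is only that the first argument to plot goes on the x-axis and the second on the y-axis.

```r
# Hypothetical rank orders for five locations (NOT taken from stat240.sqlite).
order2011 <- c(1, 2, 3, 4, 5)
order2016 <- c(2, 1, 3, 5, 4)

# plot(x, y): 2016 values on the x-axis, 2011 values on the y-axis,
# with axis labels and a title supplied explicitly.
plot(order2016, order2011,
     xlab = "Order2016 (rank)",
     ylab = "Order2011 (rank)",
     main = "2011 rank against 2016 rank (toy data)")
```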
Question 3: Project
Consider the Best New Albums section of the popular music site Pitchfork. Write an R function to retrieve the text of a review of an album, and also the rating that the reviewer gives to the album (on the scale 0.0 to 10.0). Provide a function named pitchfork which takes a single parameter: a URL specified by a string. When this function is passed the URL to a review of an album in the Best New Albums section of Pitchfork, the pitchfork function should return a list with two elements: 1) an element named text with value given by the body of the review as a character string (with all HTML removed), 2) an element named rating with value given by the decimal value of the review, as a numeric. You can find URLs for these reviews by navigating to https://pitchfork.com/reviews/best/albums/ and right clicking on an album and then selecting something like Copy Link. Example usage follows. In this listing, the ellipsis indicates cropped R output.
> url = 'https://pitchfork.com/reviews/albums/jeff-parker-suite-for-max-brown/'
> review = pitchfork(url)
> review$text
[1] "Jeff Parker always writes parts that sound
unassuming at first listen and unavoidable by the
fifth.
…
And together, with the help of some friends, they
build a nest from which to watch the world."
> review$rating
[1] 8.4
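To make the required return shape concrete, here is a minimal sketch that builds a list with elements text and rating from an inline HTML string, using the rvest package (an assumption; the exam does not prescribe a package). The helper parse_review, the markup, and the selectors span.score and div.review p are all invented for illustration and are not taken from Pitchfork's actual pages.

```r
library(rvest)

# Hypothetical markup: tag and class names are invented for illustration
# and are NOT Pitchfork's real page structure.
html <- '<html><body>
  <span class="score">8.4</span>
  <div class="review"><p>First paragraph.</p><p>Second <em>paragraph</em>.</p></div>
</body></html>'

# Hypothetical helper: takes a parsed page, returns list(text, rating).
parse_review <- function(page) {
  # Rating: text of the score element, converted to a numeric.
  rating <- as.numeric(html_text2(html_element(page, "span.score")))
  # Body: text of each paragraph with markup stripped, joined by newlines.
  paragraphs <- html_text2(html_elements(page, "div.review p"))
  list(text = paste(paragraphs, collapse = "\n"), rating = rating)
}

review <- parse_review(read_html(html))
str(review)
```

A pitchfork(url) function along these lines would presumably pass read_html(url) to such a helper, with selectors chosen after inspecting the live page.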
Question 3: Part I
Provide a pdf containing all of the code that you wrote for this project. If your code spans multiple files, put all of the code into a single pdf and use comments to indicate the beginning of each file and its file name. You may use code from outside sources in your project provided that it is released under an open license and provided that you clearly indicate the code with comments.
(10 points)
Question 3: Part II
Write a short report (in prose) arguing that the code you’ve provided works as required and provide a pdf of the report. Include output of R sessions in labelled figures. Provide some example calls of your implementation of the pitchfork function.
(10 points)