Statistics 260: Introductory R for Data Science – Lecture 1: Introduction to R and Getting Started
David Stenning
Admin
Introducing R
R for data science
Course objective
Getting started
Admin
Course Information
Lectures (STAT 260):
- Tu 12:30 PM – 2:20 PM (Vancouver Time). I will give a 10-minute
  break from roughly 1:20 PM to 1:30 PM.
Labs (STAT 261):
- There are 7 STAT 261 Lab sections, listed as D100–D700.
- To control class size and create the best learning environment
  possible given the circumstances, you may not attend sections
  of STAT 261 for which you are not registered.
- You should have received a Canvas announcement with Zoom
  links for your STAT 261 Lab and TA office hours.
Instructor office hours:
- Fr 3:00 PM – 4:00 PM (Vancouver Time)
- By appointment when possible
Admin
- I will use a blend of synchronous and asynchronous methods to
  deliver course material. Students are responsible for all material
  covered in synchronous lectures, pre-recorded videos, assigned
  readings, exercises, and labs, among others.
- Additional information regarding the method of teaching and
  assessment is given on the “Admin page” on Canvas:
  https://canvas.sfu.ca/courses/66321/pages/admin.
- Please also see the “Admin page” for STAT 261, the laboratory
  component of this course:
  https://canvas.sfu.ca/courses/66322/pages/admin.
- I will do my best to answer any inquiries outside of class time
  or office hours within one business day. Please feel free to send
  a follow-up message after two business days if you are still
  expecting a reply.
Admin
- Note that you must be enrolled in both STAT 260 (lectures)
  and STAT 261 (lab section).
- Because STAT 260 and STAT 261 have different course
  numbers, you will get a (potentially different) grade for each.
  However, you will not learn new content in the STAT 261 Labs;
  they reinforce the concepts from STAT 260.
- The Lab Exercises tend to be more involved and challenging
  than the Lecture Exercises.
- This also means that the Final Exam for STAT 260 may involve
  content from the STAT 261 Labs, in the form of exam
  questions similar to Lab Exercises.
Grading
- Both STAT 260 and STAT 261 have (different) weekly quizzes.
- That is, you will have two quizzes per week: one towards the
  end of the STAT 260 Lecture and one towards the end of the
  STAT 261 Lab section in which you are enrolled.
- The lowest quiz score from STAT 260 and the lowest quiz score
  from STAT 261 will be dropped (not the two lowest overall).
  Further, the first quiz in each course, to be taken next week,
  will be for practice and will not count towards your final grade.
- Note that STAT 260 is worth 2 credits and STAT 261 is worth
  1 credit. Their final grades will be determined as follows:
  - STAT 260 (2 credits): 50% weekly quizzes and 50% final exam
  - STAT 261 (1 credit): 100% weekly quizzes
Environment and academic integrity
- I take academic integrity very seriously and have a
  zero-tolerance policy on cheating. You may refer to SFU’s
  Academic Integrity website for more information:
  http://www.sfu.ca/students/academicintegrity.html
- In addition, the usual codes of conduct apply, such as avoiding
  disruptions during lectures and labs and, in general, respecting
  others. This includes Zoom chats, discussion forums (e.g., in
  Canvas), and other interactions with your fellow students.
- You may (and are encouraged to!) discuss lecture or lab
  exercises with one another prior to lecture/lab, but you may not
  work together or share any materials during quizzes or exams.
- If you have doubts about whether certain activities go against
  academic integrity, please ask me!
Introducing R
What is R?
- R is an open-source environment for statistical computing and
  graphics.
- An implementation of the S language from Bell Labs
  (https://en.wikipedia.org/wiki/S_(programming_language))
- Started in the mid-1990s by Ross Ihaka and Robert Gentleman
  at the University of Auckland
- Now maintained by a team of experts called the R Core Team
- A “packages” system allows any user to bundle R code, data,
  and examples together.
- R and R packages are distributed through the Comprehensive R
  Archive Network (CRAN).
- SFU has a CRAN mirror at http://cran.stat.sfu.ca
What does “environment” mean?
- R is a fully functioning programming environment with all the
  usual constructs, such as
  - conditionals (if-then-else),
  - loops,
  - user-defined functions.
- In addition, there are built-in facilities for
  - data input, storage, manipulation, and output,
  - optimization, matrix computation, etc.,
  - random number generation,
  - data analysis and graphics.
- “Base” R is good, but it is the package system that makes R
  especially useful for data science.
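
The constructs and built-in facilities listed above can be sketched in a few lines of base R. This is only an illustration: the function name `describe_sign`, the variable names, and the seed value are made up for this example.

```r
# A user-defined function that uses a conditional (if / else).
describe_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

# A for loop that accumulates a running total of 1, 2, ..., 5.
total <- 0
for (i in 1:5) {
  total <- total + i
}

# Built-in facilities: random number generation and matrix computation.
set.seed(260)                 # arbitrary seed so the draws are reproducible
draws <- rnorm(100)           # 100 draws from a standard normal distribution
m <- matrix(1:4, nrow = 2)    # 2 x 2 matrix, filled column by column
m_inv <- solve(m)             # matrix inverse

print(describe_sign(-3))      # "negative"
print(total)                  # 15
print(mean(draws))            # sample mean of the random draws
print(m %*% m_inv)            # approximately the 2 x 2 identity matrix
```

Everything here comes with a standard R installation; no extra packages are needed.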
R packages
- The R package system is the key to the success of R.
- It has allowed statisticians and other data scientists to
  implement and distribute their work to be used by others.
- The R package system enforces some rules about how packages
  are structured, but differences in programming styles of
  package authors mean different interfaces.
- Users need to be aware of different data structures for input and
  output, and of different styles for graphics.
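
As a rough illustration of "different data structures for output", the sketch below calls two functions and inspects what each returns. To avoid any installation step it uses `lm()` and `t.test()`, which both ship with R in the stats package, together with the built-in `cars` data set. The choice of functions and the value `mu = 15` are assumptions made for this example; contributed packages from CRAN can differ from each other even more.

```r
# Contributed packages are installed once from CRAN and then loaded, e.g.
# install.packages("tidyverse")   # one-time download (not run here)
# library(tidyverse)              # load in each new session

# Two functions that ship with R, applied to the built-in `cars` data.
fit  <- lm(dist ~ speed, data = cars)   # linear regression
test <- t.test(cars$speed, mu = 15)     # one-sample t-test

# Each returns a different kind of object ...
print(class(fit))    # "lm"
print(class(test))   # "htest"

# ... so results are extracted in different ways.
print(coef(fit))       # accessor function for the regression coefficients
print(test$p.value)    # named list element for the p-value
```

The point is only that you should check the documentation (for example `?lm` or `?t.test`) to see what a function expects as input and what it returns.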
R for data science
R vs. Python
[Three figure-only slides; the images are not recoverable.]
R and Data Visualization
COVID-19 data visualization from the New York Times, Sept 11,
2021
R and Data Visualization
Data visualization by Dylan Anderson, Policy in Numbers
(https://www.policyinnumbers.com/blog/2021/03/08/visualizing-the-proportion-of-women-in-governments-around-the-world/)
R in Industry
Some sponsors of the useR! 2016 conference held at Stanford
University, Stanford, CA.
- R is also used by Facebook, Ford Motor Company, John Deere,
  Mozilla, the NY Times, Twitter, Trulia, and ANZ Bank, among
  others (from: https://data-flair.training/blogs/r-applications/)
Course objective
Course objective
- Learn how to use R for common tasks in data science, with a
  focus on tools from the “tidyverse”.
  - https://www.tidyverse.org/
Getting started
Getting started with R and RStudio
- A brief set of instructions is on the Canvas page
  https://canvas.sfu.ca/courses/66321/pages/getting-started-with-r-and-rstudio
- Please try to get R and RStudio installed as soon as possible
  (i.e., in the remaining lecture time after I finish this lecture).
- Those still having difficulty can ask for help in their STAT 261
  labs this week.
R reference cards and further help
- RStudio has created several useful reference cards, called cheat
  sheets, and has collected cheat sheets from other R users.
- See https://www.rstudio.com/resources/cheatsheets/
- For getting started with RStudio, you might find the following
  helpful:
  https://github.com/rstudio/cheatsheets/raw/master/rstudio-ide.pdf
- For specific queries, you may find Stack Overflow helpful
  (someone may have had the same question!):
  https://stackoverflow.com/
- Of course, Google is also helpful and a good place to start if
  you get stuck. (Try adding “R” after your query.)
Starting R
- Start R by starting RStudio.
- The “Console” window is where you can type your commands.
- However, it is good practice to open an R script, type your
  commands in the script, and then submit the commands to the
  R console.
  - Session -> Set Working Directory to set the working
    directory
  - File -> New File -> R Script to open a new R script
  - Type your commands into the script
  - Put your cursor on the line you want to submit and press
    Ctrl+Enter (Cmd+Enter on macOS)
- Save your script for later use.
- More on the RStudio interface at
  https://support.rstudio.com/hc/en-us/sections/200107586-Using-RStudio
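
A first script might look something like the sketch below. The file name `first_script.R` and the variable names are hypothetical, chosen only for illustration. Type the lines into a new R script and submit them one at a time to see each result in the Console.

```r
# first_script.R  (hypothetical file name)

# Check which folder R is currently working in.
getwd()

# Use R as a calculator and store results with the assignment operator <-.
x <- 2 + 3
y <- sqrt(16)

# Submitting an object's name on its own line shows its value in the Console.
x    # 5
y    # 4

# Combine values into a vector and summarise it.
scores <- c(7, 9, 8)
mean(scores)    # 8
```

Saving the script means the same commands can be re-run later, which is harder to do if they were typed directly into the Console.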
Reading
- Chapter 1 of the text:
  https://r4ds.had.co.nz/introduction.html