CS计算机代考程序代写 data structure data science database Lecture 1: Introduction to Base R – STAT GU4206/GR5206 Statistical Computing & Introduction to Data Science

Lecture 1: Introduction to Base R – STAT GU4206/GR5206 Statistical Computing & Introduction to Data Science

Lecture 1: Introduction to Base R
STAT GU4206/GR5206 Statistical Computing & Introduction to Data

Science

Gabriel Young
Columbia University

September 10, 2021

Gabriel Young Lecture 1: Intro to R September 10, 2021 1 / 128

Some notes on the course

Prerequisite

You should have taken or be taking the following courses:

• GR5204 Statistical Inference (or GU4204)

• GR5205 Linear Regression Models (or GU4205)

• Ideally do not take both UN2102 and GU4206.

Warning

If you haven’t taken (aren’t taking) these courses, you’ll have to do extra
work at times to catch up.

Some other prerequisites

This course assumes basic knowledge of Linear Algebra, Multivariate
Calculus, and Probability. We will review these topics in class but can’t

cover everything! If you haven’t taken a Linear Algebra course, for
example, expect to spend some extra time catching up.

Gabriel Young Lecture 1: Intro to R September 10, 2021 2 / 128

Honor Code

”Columbia’s intellectual community relies on academic integrity and
responsibility as the cornerstone of its work. Graduate students are

expected to exhibit the highest level of personal and academic honesty as
they engage in scholarly discourse and research. In practical terms, you
must be responsible for the full and accurate attribution of the

ideas of others in all of your research papers and projects; you must
be honest when taking your examinations; you must always submit
your own work and not that of another student, scholar, or internet

source. Graduate students are responsible for knowing and correctly
utilizing referencing and bibliographical guidelines.”

• Resources at http://gsas.columbia.edu/academic-integrity.

• Failure to observe these rules of conduct will have serious academic
consequences, up to and including dismissal from the university.

Gabriel Young Lecture 1: Intro to R September 10, 2021 3 / 128

http://gsas.columbia.edu/academic-integrity

A word of thanks

This course was developed for Columbia students with much guidance from
the following course. Its web page is also a good resource for students.

Cosma Shalizi and Andrew Thomas (2014), “Statistical Computing
36-350: Beginning to Advanced Techniques in R”.

A special thanks for guidance and advice from the following individuals
and many others. John Cunningham, Yang Feng, Tian Zheng, Jennifer
Hoeting, Cynthia Rush, Linxi Liu.

Gabriel Young Lecture 1: Intro to R September 10, 2021 4 / 128

http://www.stat.cmu.edu/~cshalizi/statcomp/14/
http://www.stat.cmu.edu/~cshalizi/statcomp/14/
http://stat.columbia.edu/~cunningham/
http://www.stat.columbia.edu/~yangfeng/
http://www.stat.columbia.edu/~tzheng/
http://www.columbia.edu/~cgr2130/

Department Directory

Statistical computing

It’s essential for modern statisticians to be fluent in statistical computing
(statistical programming).

At the end of this course, you will have:

• The ability to read and write code for statistical data analysis.

• An understanding of programming topics such as functions, objects,
data structures, debugging.

• Apply the R programming topics to common statistical tasks, i.e.,
graphics, regression, testing,…

Gabriel Young Lecture 1: Intro to R September 10, 2021 5 / 128

Statistical computing & data science

What’s the difference between data science and statistics?

“A data scientist is just a sexier word for statistician.”
Nate Silver (outdated)

“A data scientist is a better computer scientist than a statistician and is a
better statistician than a computer scientist.”

Unknown (still accurate)

What does a data scientist do?
• There is not one correct answer.

• Transform data into valuable information!

• A data scientist spends a significant portion of time processing data
and less time modeling data.

Gabriel Young Lecture 1: Intro to R September 10, 2021 6 / 128

Working with data in R 1

Steps

1. Import data from various sources: the web, a database, a stored file.

2. Clean and format the data. Usually this means rows are observations
and columns are variables.

3. Analyze the data using visualizations, modelling, or other methods.

4. Communicate your results.

In this class, we learn tools and strategies for completing each step.

1Slide developed from G. Grolemund and H. Wickham.
Gabriel Young Lecture 1: Intro to R September 10, 2021 7 / 128

Functional programming 2

Function Definition

A function is a machine which turns input objects (arguments) into an
output object (return value), according to a definite rule.

• Programming is writing functions to transform inputs into outputs
easily and correctly.

• Good programming takes big transformations and breaks them down
into smaller and smaller ones until you come to tasks which the
built-in functions can do.

2Slide developed from C.R. Shalizi and A.C. Thomas (2014).
Gabriel Young Lecture 1: Intro to R September 10, 2021 8 / 128

Section I

What is R?

Gabriel Young Lecture 1: Intro to R September 10, 2021 9 / 128

What is R?

R is an open-source statistical programming software used by industry
professionals and academics alike.

This means that R is supported by a community of users.

Will use R extensively in this class

• Download R at: https://www.r-project.org

• Download RStudio at: https://www.rstudio.com

You must have R downloaded by next lecture!

Gabriel Young Lecture 1: Intro to R September 10, 2021 10 / 128

https://www.r-project.org
https://www.rstudio.com

Using R and RStudio

• The editor allows you to type and save code that you may want to
reuse later.

• Basic interaction with R happens in the console. The is where you
type R code.

Figure 1: Image of RStudio from G. Grolemund and H. Wickham

Gabriel Young Lecture 1: Intro to R September 10, 2021 11 / 128

Using R and RStudio

If you’d like more information on the other functions and features of
RStudio check out the short video tutorial on the Canvas page.

Figure 2: Image of RStudio from G. Grolemund and H. Wickham

Gabriel Young Lecture 1: Intro to R September 10, 2021 12 / 128

Using R and RStudio

Code example.

Gabriel Young Lecture 1: Intro to R September 10, 2021 13 / 128

R Markdown

For your homeworks and lab reports you will be using R Markdown which
allows you to put your code, its outputs, and your thoughts all in one
document.

Gabriel Young Lecture 1: Intro to R September 10, 2021 14 / 128

Steps to create an R Markdown document

To create a new R Markdown document open RStudio and go to File ->
New File -> R Markdown.

Figure 3: Image from https://www.r-bloggers.com

Gabriel Young Lecture 1: Intro to R September 10, 2021 15 / 128

https://www.r-bloggers.com

Steps to create an R Markdown document

Enter the title of your document and the author and hit OK.

Figure 4: Image from https://www.r-bloggers.com

Gabriel Young Lecture 1: Intro to R September 10, 2021 16 / 128

https://www.r-bloggers.com

Steps to create an R Markdown document

Clicking the Knit HTML button in the editor window generates the
document.

Editing the Markdown Document

• Writing equations uses LaTex code. So, for example, $x^2$ produces
x2.

• Insert R code directly into the document using the following format:

“`{r}
x <- rnorm(100) ``` • If you need help, go to Help -> Markdown Quick Reference.

• You’ll get practice with this in lab.

Gabriel Young Lecture 1: Intro to R September 10, 2021 17 / 128

Steps to Create an R Markdown document

Code example.

Gabriel Young Lecture 1: Intro to R September 10, 2021 18 / 128

A quick example…

Type the following into your console:

> # Create a vector in R named x

> x <- c(5, 29, 13, 87) > x

[1] 5 29 13 87

Two important ideas:

1. Commenting

2. Assignment

Gabriel Young Lecture 1: Intro to R September 10, 2021 19 / 128

A quick example…

Type the following into your console:

> # Create a vector in R

> x <- c(5, 29, 13, 87) > x

[1] 5 29 13 87

Two important ideas:

1. Commenting
• Anything after the # isn’t evaluated by R.
• Used to leave notes for humans reading your code.
• Very important in our class. Comment your code!

2. Assignment

Gabriel Young Lecture 1: Intro to R September 10, 2021 20 / 128

A quick example…

Type the following into your console:

> # Create a vector in R

> x <- c(5, 29, 13, 87) > x

[1] 5 29 13 87

Two important ideas:

1. Commenting

2. Assignment
• The <- symbol means assign x the value c(5, 29, 13, 87). • Could use = instead of <- but this is discouraged. • All assignments take the same form: object_name <- value. • c() means “concatenate”. • Type x into the console to print its assignment. Gabriel Young Lecture 1: Intro to R September 10, 2021 21 / 128 A quick example... Type the following into your console: > # Create a vector in R names “x”

> x <- c(5, 29, 13, 87) > x

[1] 5 29 13 87

Note

The [1] tells us that 5 is the first element of the vector.

> # Create a vector in R names “x”

> x <- 1:50 > x

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36

[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50

Gabriel Young Lecture 1: Intro to R September 10, 2021 22 / 128

Section II

Base R versus Tidyverse

Gabriel Young Lecture 1: Intro to R September 10, 2021 23 / 128

Base R

What is Base R?

”The package named base is in a way the core of R and contains the basic
functions of the language, particularly, for reading and manipulating data.”

R for Beginners, Emmanuel Paradis

Base R utility

Base R includes all default code for performing common data
manipulation and statistical tasks.

You might recognize some Base R functions:

• mean(), median(), lm(), summary(), sort()

• data.frame(), read.csv(), cbind(), grep(), regexpr()

• Many many more…

If you don’t recognize any Base R functions, don’t worry!

Gabriel Young Lecture 1: Intro to R September 10, 2021 24 / 128

Base R

Base R package

• Base R is technically a package.

• What is a R package?

• We will see shortly…

Common criticisms of Base R
• The code doesn’t flow as well as other languages.

• Function names and arguments are often inconsistent and potentially
confusing.

• Base R functions sometimes don’t return type-stable objects.

• Base R functions are not refined to run as fast as possible.

• Other complaints exist…

Code examples will be introduced eventually.

Gabriel Young Lecture 1: Intro to R September 10, 2021 25 / 128

The tidyverse

The tidyverse solution to Base R

• The tidyverse collection of packages often perform the same tasks
as Base R.

• The package magrittr allows for code to be written as a pipe, which
helps with the flow of the code.

• The tidyverse collection of packages has more descriptive function
names and consistent inputs.

• The tidyverse collection of packages are type-stable (whatever that
means).

• The tidyverse collection of packages are often faster than common
Base R functions.

• Other solutions exist…

Code examples and comparisons will be introduced eventually.

Gabriel Young Lecture 1: Intro to R September 10, 2021 26 / 128

The tidyverse

Primary tidyverse packages

• dplyr: data manipulation

• ggplot2: creating advanced graphics

• readdr: reading in rectangular data.

• tibble: A tibble, or tbl_df, is a modern reimagining of the
data.frame.

• tidyr: creating tidy data.

• purrr: enhancing R’s functional programming (FP).

The above statement was taken directly from the tidyverse website:
https://www.tidyverse.org

No need to worry about this slide.. yet!

tidyverse code examples will be introduced eventually.

Gabriel Young Lecture 1: Intro to R September 10, 2021 27 / 128

Base R versus tidyverse

Why ever use Base R?

1. It gets the job done!

2. To become an expert R programmer, you have to know Base R.

3. Some Base R functions are very common and useful, e.g., mean()

What should you learn first? Base R or Tidyverse?

1. Some experts believe you should learn Base R first.

2. Some experts believe you should learn tidyverse first.

3. Lately, more people are shifting to tidyverse.

What should you learn first? Base R or Tidyverse?

In this class, we will start with Base R and eventually learn tidyverse in
parallel.

Gabriel Young Lecture 1: Intro to R September 10, 2021 28 / 128

Section III

Base R

Warm-Up, Remove Variables, Shortcuts

Gabriel Young Lecture 1: Intro to R September 10, 2021 29 / 128

Another quick example… warm up

Type the following into your console:

> # Create a vector in R named x

> x <- rnorm(100,mean=10,sd=3) > length(x)

[1] 100

> head(x,20)

[1] 4.242711 6.869580 4.838829 10.335513 1.344058

[6] 9.164582 7.481033 13.033107 10.514919 9.239825

[11] 11.190750 7.129802 9.332573 12.373758 13.959686

[16] 11.502805 15.500928 9.528124 9.404475 9.717127

> hist(x)

Describe what the above code is doing.

Gabriel Young Lecture 1: Intro to R September 10, 2021 30 / 128

Another quick example… warm up

Histogram of x

x

F
re

q
u

e
n

cy

0 5 10 15

0
5

1
0

1
5

2
0

2
5

Figure 5: Histogram of x

Gabriel Young Lecture 1: Intro to R September 10, 2021 31 / 128

Remove a variable from your environment

The rm() function
• Fist assign a vector to z.

> z <- 1:10 > z

[1] 1 2 3 4 5 6 7 8 9 10

• Manually look at your environment.

• Use the function rm() to remove z.

> rm(z)

• Check if z exists
> # uncomment the code below

> #z

• rm(list=ls()) is a crude way to clear the entire global environment.

Gabriel Young Lecture 1: Intro to R September 10, 2021 32 / 128

A few shortcuts

Shortcuts on a mac (sorry PC users)

• Run current line/selection: “command”+”enter”

• Assignment arrow: “option”+”-”

• Clear console: “control”+”l”

• Restart R Session: ”apple”+”F10” or ”command”+”shift”+”F10”

• Note: to clear the global R environment, the above shortcut is
recommended over using rm(list=ls()).

• Look up common shortcuts: “alt”+”k” or ”option”+”shift”+”F10”

• More…

For some shortcuts on Mac and PC:
• Click here

Gabriel Young Lecture 1: Intro to R September 10, 2021 33 / 128

https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts

Section IV

Base R

Variable Types, Vectors, & Matrices

Gabriel Young Lecture 1: Intro to R September 10, 2021 34 / 128

Variable types

R has a variety of variable types (or modes).

Modes

1. Numeric or double (3.7, 15907, 80.333)

2. Integer (1L,2L,3L)

3. Character (”Columbia”, ”Statistics is fun!”, ”HELLO WORLD”)

4. Logical (TRUE, FALSE, 1, 0)

5. Complex (1 + 2i)

• Numeric, integer, character and logical are atomic.

• In this class, we are primarily concerned with numeric, character, and
logical.

Gabriel Young Lecture 1: Intro to R September 10, 2021 35 / 128

Let’s check this out in R

‘Numeric’ variable type

> x <- 2 > mode(x)

[1] “numeric”

> typeof(x)

[1] “double”

> y <- as.integer(3) > typeof(y)

[1] “integer”

Gabriel Young Lecture 1: Intro to R September 10, 2021 36 / 128

Let’s check this out in R

‘Complex’ variable type

> z <- 1 - 2i > z

[1] 1-2i

> typeof(z)

[1] “complex”

Gabriel Young Lecture 1: Intro to R September 10, 2021 37 / 128

Let’s check this out in R

‘Character’ variable type

> name <- "Columbia University" > name

[1] “Columbia University”

> typeof(name)

[1] “character”

Gabriel Young Lecture 1: Intro to R September 10, 2021 38 / 128

Let’s check this out in R

‘Logical’ variable type

> a <- TRUE > b <- F > a

[1] TRUE

> b

[1] FALSE

> typeof(a)

[1] “logical”

Gabriel Young Lecture 1: Intro to R September 10, 2021 39 / 128

Data types

There are many data types in R.

Data Types

• Vectors

• Scalars

• Matrices

• Arrays

• Lists

• Dataframes

Gabriel Young Lecture 1: Intro to R September 10, 2021 40 / 128

Data types

There are many data types in R.

Data Types

Vectors

• All elements must be the same type (mode).

• Scalars

• Matrices

• Arrays

• Lists

• Dataframes

Gabriel Young Lecture 1: Intro to R September 10, 2021 40 / 128

Data types

There are many data types in R.

Data Types

• Vectors

Scalars
• Treated as one-element vectors in R.

• Matrices

• Arrays

• Lists

• Dataframes

Gabriel Young Lecture 1: Intro to R September 10, 2021 40 / 128

Data types

There are many data types in R.

Data Types

• Vectors

• Scalars

Matrices

• An array (rows and columns) of values.

• All values must be the same type (mode).

• Arrays

• Lists

• Dataframes

Gabriel Young Lecture 1: Intro to R September 10, 2021 40 / 128

Data types

There are many data types in R.

Data Types

• Vectors

• Scalars

• Matrices

Arrays

• Similar to matrices, but with more than two dimensions.

• Lists

• Dataframes

Gabriel Young Lecture 1: Intro to R September 10, 2021 40 / 128

Data types

There are many data types in R.

Data Types

• Vectors

• Scalars

• Matrices

• Arrays

Lists
• Like a vector, but elements can be of different modes.

• Dataframes

Gabriel Young Lecture 1: Intro to R September 10, 2021 40 / 128

Data types

There are many data types in R.

Data Types

• Vectors

• Scalars

• Matrices

• Arrays

• Lists

Dataframes
• Like a matrix, but elements can be of different modes.

Gabriel Young Lecture 1: Intro to R September 10, 2021 40 / 128

Check your understanding

What mode are the following variables?

1. 3*TRUE?

2. “147”?

Solutions

> 3*TRUE # Logicals in arithmetic

[1] 3

> mode(3*TRUE)

[1] “numeric”

> mode(“147”)

[1] “character”

Gabriel Young Lecture 1: Intro to R September 10, 2021 41 / 128

Check your understanding

What mode are the following variables?

1. 3*TRUE?

2. “147”?

Solutions

> 3*TRUE # Logicals in arithmetic

[1] 3

> mode(3*TRUE)

[1] “numeric”

> mode(“147”)

[1] “character”

Gabriel Young Lecture 1: Intro to R September 10, 2021 41 / 128

Vectors and matrices in R

Recall: Vectors
• Variable types are called modes.

• All elements of a vector are the same mode.

• Scalars are just single-element vectors.

Recall: Matrices
• All elements of a matrix are the same mode.

• A matrix is treated like a vector in R with two additional attributes:
number of rows and number of columns.

Gabriel Young Lecture 1: Intro to R September 10, 2021 42 / 128

Building vectors in R

• Use the concatenate function c() to define a vector.

Some Examples

Defining a numeric vector:

> x <- c(2, pi, 1/2, 3^2) > x

[1] 2.000000 3.141593 0.500000 9.000000

A character vector:

> y <- c("NYC", "Boston", "Philadelphia") > y

[1] “NYC” “Boston” “Philadelphia”

Gabriel Young Lecture 1: Intro to R September 10, 2021 43 / 128

Building vectors in R

• The syntax a:b produces a sequence of integers ranging from a to b.
• The repetition function rep(val, num) repeats the value val a total

of num times.

Some Examples

A sequential list of integers:

> z <- 5:10 > z

[1] 5 6 7 8 9 10

Using rep() to create a 1’s vector:

> u <- rep(1, 18) > u

[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Gabriel Young Lecture 1: Intro to R September 10, 2021 44 / 128

Building vectors in R

• Alternately, could allocate space and then fill in element-wise.

Some Examples

> v <- c() > v[1] <- TRUE > v[2] <- TRUE > v[3] <- FALSE > v

[1] TRUE TRUE FALSE

Gabriel Young Lecture 1: Intro to R September 10, 2021 45 / 128

Building vectors in R

• The concatenate function c() can be nested.

Some Examples

> vec1 <- rep(-27, 3) > vec1

[1] -27 -27 -27

> vec2 <- c(vec1, c(-26, -25, -24)) > vec2

[1] -27 -27 -27 -26 -25 -24

Gabriel Young Lecture 1: Intro to R September 10, 2021 46 / 128

Building matrices in R

• Use the function matrix(values, nrow, ncol) to define your
matrix.

• In R, matrices are stored in column-major order (determines where the
number go as in the following example).

Some Examples

Building a matrix that fills in by column:

> mat <- matrix(1:9, nrow = 3, ncol = 3) > mat

[,1] [,2] [,3]

[1,] 1 4 7

[2,] 2 5 8

[3,] 3 6 9

Gabriel Young Lecture 1: Intro to R September 10, 2021 47 / 128

Building matrices in R

• Use the function matrix(values, nrow, ncol) to define your
matrix.

• In R, matrices are stored in column-major order (determines where the
number go as in the following example).

Some Examples

Building a matrix that fills in by row:

> new_mat <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE) > new_mat

[,1] [,2] [,3]

[1,] 1 2 3

[2,] 4 5 6

[3,] 7 8 9

Gabriel Young Lecture 1: Intro to R September 10, 2021 48 / 128

Building matrices in R

• Alternately, could allocate space and fill in element-wise.

• Tell R the size of the matrix beforehand.

Some Examples

Allocating the space for a matrix then filling it in:

> this_mat <- matrix(nrow = 2, ncol = 2) > this_mat[1,1] <- sqrt(27) > this_mat[1,2] <- round(sqrt(27), 3) > this_mat[2,1] <- exp(1) > this_mat[2,2] <- log(1) > this_mat

[,1] [,2]

[1,] 5.196152 5.196

[2,] 2.718282 0.000

Gabriel Young Lecture 1: Intro to R September 10, 2021 49 / 128

Building matrices in R

• The row bind function rbind() also works, though it can be costly
computationally. Similarly for column bind function cbind().

Some Examples

> vec1 <- rep(0, 4) > vec2 <- c("We're", "making", "matrices", "!") > final_mat <- rbind(vec1, vec2) > final_mat

[,1] [,2] [,3] [,4]

vec1 “0” “0” “0” “0”

vec2 “We’re” “making” “matrices” “!”

Recall, matrix entries must all be the same type.

Gabriel Young Lecture 1: Intro to R September 10, 2021 50 / 128

Building matrices in R

• Name columns (rows) of a matrix using colnames() (rownames()).

Some Examples

> this_mat # Defined previously

[,1] [,2]

[1,] 5.196152 5.196

[2,] 2.718282 0.000

> colnames(this_mat) # Nothing there yet

NULL

Gabriel Young Lecture 1: Intro to R September 10, 2021 51 / 128

Building matrices in R

• Name columns (rows) of a matrix using colnames() (rownames()).

Some Examples

> colnames(this_mat) <- c("Column1", "Column2") > this_mat

Column1 Column2

[1,] 5.196152 5.196

[2,] 2.718282 0.000

Gabriel Young Lecture 1: Intro to R September 10, 2021 52 / 128

Mixing variable modes

• When variable modes are mixed in vectors or matrices, R picks the
‘least common denominator’.

• Use the structure function str() to display the internal structure of
an R object.

Example

> vec <- c(1.75, TRUE, "abc") > vec

[1] “1.75” “TRUE” “abc”

> str(vec)

chr [1:3] “1.75” “TRUE” “abc”

Gabriel Young Lecture 1: Intro to R September 10, 2021 53 / 128

Help in R

• Use a single question mark ? to get help about a specific function
using form ?function name.

• Provides a description, lists the arguments (to the function), gives an
example, etc.

• Use the double question mark to get help with a topic using form
??topic.

How to get help in R

> # What does the str() function do?

>

> # Function help:

> ?str

> # Fuzzy matching:

> ??”structure”

Gabriel Young Lecture 1: Intro to R September 10, 2021 54 / 128

Help in R

Code example.

Gabriel Young Lecture 1: Intro to R September 10, 2021 55 / 128

Subsetting vectors

• Use square brackets [] to extract elements or subsets of elements.

Example

> y <- c(27, -34, 19, 7, 61) > y[2]

[1] -34

> y[3:5]

[1] 19 7 61

> y[c(1, 4)]

[1] 27 7

Gabriel Young Lecture 1: Intro to R September 10, 2021 56 / 128

Subsetting vectors

• Use the same strategy to reassign elements of a vector.

Example

> y <- c(27, -34, 19, 7, 61) > y

[1] 27 -34 19 7 61

> y[c(1, 4)] <- 0 > y

[1] 0 -34 19 0 61

Gabriel Young Lecture 1: Intro to R September 10, 2021 57 / 128

Subsetting vectors

• Negative values can be used to exclude elements.

Example

> y <- c(27, -34, 19, 7, 61) > y

[1] 27 -34 19 7 61

> y[-c(1, 4)]

[1] -34 19 61

> y <- y[-1] > y

[1] -34 19 7 61

Gabriel Young Lecture 1: Intro to R September 10, 2021 58 / 128

Subsetting matrices

• mat[i,j] returns the (i , j)th element of mat.
• mat[i, ] returns the i th row of mat.
• mat[ ,j] returns the j th column of mat.

> mat <- matrix(1:8, ncol = 4) > mat

[,1] [,2] [,3] [,4]

[1,] 1 3 5 7

[2,] 2 4 6 8

> mat[, 2:3]

[,1] [,2]

[1,] 3 5

[2,] 4 6

Gabriel Young Lecture 1: Intro to R September 10, 2021 59 / 128

Subsetting matrices

• Can use column names or row names to subset as well.
• Negative values are used to exclude elements.

> this_mat

Column1 Column2

[1,] 5.196152 5.196

[2,] 2.718282 0.000

> this_mat[, “Column2”]

[1] 5.196 0.000

> this_mat[, -1]

[1] 5.196 0.000

Gabriel Young Lecture 1: Intro to R September 10, 2021 60 / 128

Section V

An Extended Example:
Image Data

Gabriel Young Lecture 1: Intro to R September 10, 2021 61 / 128

But first… packages!

What are packages?

• Packages are collections of functions, data, or code that extend the
capabilities of base R.

• Some packages come pre-loaded but others must be downloaded and
installed using function install.packages(“package name”).

• An installed R package must be loaded in each session it is to be used
using function library(“package name”).

> # Installing the “pixmap” package.

> install.packages(“pixmap”)

> library(“pixmap”)

Gabriel Young Lecture 1: Intro to R September 10, 2021 62 / 128

Image data example 3

• Images are made up of pixels which are arranged in rows and columns
(like a matrix).

• Image data are matrices where each element is a number representing
the intensity or brightness of the corresponding pixel.

• We will work with a greyscale image with numbers ranging from 0
(black) to 1 (white).

3Example developed from N. Matloff, “The Art of R Programming: A Tour of
Statistical Software Design”.

Gabriel Young Lecture 1: Intro to R September 10, 2021 63 / 128

Image data example (cont.)

> library(pixmap)

> casablanca_pic <- read.pnm("casablanca.pgm") > casablanca_pic

Pixmap image

Type : pixmapGrey

Size : 360×460

Resolution : 1×1

Bounding box : 0 0 460 360

> plot(casablanca_pic)

Gabriel Young Lecture 1: Intro to R September 10, 2021 64 / 128

Image data example (cont.)

Figure 6: Still Image from Casablanca

Gabriel Young Lecture 1: Intro to R September 10, 2021 65 / 128

Image data example (cont.)

> dim(casablanca_pic@grey)

[1] 360 460

> casablanca_pic@grey[360, 100]

[1] 0.4431373

> casablanca_pic@grey[180, 10]

[1] 0.9882353

Gabriel Young Lecture 1: Intro to R September 10, 2021 66 / 128

Image data example (cont.)

Let’s erase Rick from the image.

> casablanca_pic@grey[15:70, 220:265] <- 1 > plot(casablanca_pic)

Gabriel Young Lecture 1: Intro to R September 10, 2021 67 / 128

Image data example (cont.)

Figure 7: Still Image from Casablanca

Gabriel Young Lecture 1: Intro to R September 10, 2021 68 / 128

Image data example (cont.)

• Use R’s locator() function to find the rows and columns
corresponding to Rick’s face.

• A call to the function allows the user to click on a point in a plot and
then the function returns the coordinates of the click.

Gabriel Young Lecture 1: Intro to R September 10, 2021 69 / 128

Check your understanding

Using matrix z, what is the output of the following?

> z

First Second Third

[1,] 1 4 7

[2,] 2 5 8

[3,] 3 6 9

1. z[2:3, “Third”]?

2. c(z[,-(2:3)], “abc”)?

3. rbind(z[1,], 1:3)?

Gabriel Young Lecture 1: Intro to R September 10, 2021 70 / 128

Check your understanding

> z

First Second Third

[1,] 1 4 7

[2,] 2 5 8

[3,] 3 6 9

Solutions

> z[2:3, “Third”]

[1] 8 9

> c(z[,-(2:3)], “abc”)

[1] “1” “2” “3” “abc”

Gabriel Young Lecture 1: Intro to R September 10, 2021 71 / 128

Check your understanding

> z

First Second Third

[1,] 1 4 7

[2,] 2 5 8

[3,] 3 6 9

Solutions

> rbind(z[1,], 1:3)

First Second Third

[1,] 1 4 7

[2,] 1 2 3

Gabriel Young Lecture 1: Intro to R September 10, 2021 72 / 128

Section VI

More with Vectors & Matrices
and Linear Algebra Review

Gabriel Young Lecture 1: Intro to R September 10, 2021 73 / 128

Reminder: vector algebra

Define vectors:

A = (a1, a2, . . . , aN), and B = (b1, b2, . . . , bN).

Then for c a scalar,

• A + B = (a1 + b1, a2 + b2, . . . , aN + bn).

• cA = (ca1, ca2, . . . , caN).

• Dot product: A · B = a1b1 + a2b2 + . . .+ aNbN .
• Norm: ‖A‖2 = A · A = a12 + a22 + . . .+ aN2.

Gabriel Young Lecture 1: Intro to R September 10, 2021 74 / 128

Reminder: matrix algebra

Define matrices:

A =

(
a1 a3
a2 a4

)
, and B =

(
b1 b3
b2 b4

)
.

Then for c a scalar,

• A + B =
(

a1 + b1 a3 + b3
a2 + b2 a4 + b4

)
.

• cA =
(

ca1 ca3
ca2 ca4

)
.

• Matrix Multiplication: AB =
(

a1b1 + a3b2 a1b3 + a3b4
a2b1 + a4b2 a2b3 + a4b4

)
.

What if the dimensions of A and B are different?

Gabriel Young Lecture 1: Intro to R September 10, 2021 75 / 128

Reminder: matrix operations

Define matrices:

A =




a1,1 a1,2 · · · a1,m
a2,1 a2,2 · · · a2,m


. . .

an,1 an,2 · · · an,m


 , and B =

(
b1,1 b1,2
b2,1 b2,2

)
.

• The transpose of A is a m × n matrix:

t(A) =




a1,1 a2,2 · · · an,1
a1,2 a2,2 · · · an,2


. . .

a1,m a2,m · · · an,m


 .

• The trace of the square matrix B is the sum of the diagonal elements:
tr(B) = b1,1 + b2,2.

Gabriel Young Lecture 1: Intro to R September 10, 2021 76 / 128

Reminder: matrix operations

Define matrices:

A =




a1,1 a1,2 · · · a1,m
a2,1 a2,2 · · · a2,m


. . .

an,1 an,2 · · · an,m


 , and B =

(
b1,1 b1,2
b2,1 b2,2

)
.

• The determinant of square matrix B is det(B) = b1,1b2,2 − b1,2b2,1.
How do you find the determinant for an n × n matrix?

• The inverse of square matrix B is denoted B−1 and

BB−1 =

(
1 0
0 1

)
and

B−1 =
1

det(B)

(
b2,2 −b1,2
−b2,1 b1,1

)
.

How do you find the inverse of an n × n matrix?
Gabriel Young Lecture 1: Intro to R September 10, 2021 77 / 128

Functions on numeric vectors

Useful R Functions

R function Description

length(x) Length of a vector x
sum(x) Sum of a vector x
mean(x) Arithmetic mean of a vector x

quantiles(x) Sample quantiles of a vector x
max(x) Maximum of a vector x
min(x) Minimum of a vector x
sd(x) Sample standard deviation of a vector x
var(x) Sample variance of a vector x

summary(x) Summary statistics of vector x

Reminder…

To access the help documentation of a known R function, use syntax
?function.

Gabriel Young Lecture 1: Intro to R September 10, 2021 78 / 128

Example: Functions on numeric vectors

Example

To investigate the dependence of energy expenditure (y) on body build,
researches used underwater weighing techniques to determine the fat-free body
mass (x) for each of seven men. They also measured the total 24-hour energy
expenditure for each man during conditions of quiet sedentary activity. The
results are shown in the table.

Subject 1 2 3 4 5 6 7
x 49.3 59.3 68.3 48.1 57.61 78.1 76.1
y 1,894 2,050 2,353 1,838 1,948 2,528 2,568

> # Define covariate and response variable

> x <- c(49.3,59.3,68.3,48.1,57.61,78.1,76.1) > y <- c(1894,2050,2353,1838,1948,2528,2568) Gabriel Young Lecture 1: Intro to R September 10, 2021 79 / 128 Example: functions on numeric vectors (continued) Example Subject 1 2 3 4 5 6 7 x 49.3 59.3 68.3 48.1 57.61 78.1 76.1 y 1,894 2,050 2,353 1,838 1,948 2,528 2,568 > n <- length(x) # Sample size > n

[1] 7

> max(x)

[1] 78.1

> sd(x)

[1] 12.09438

Gabriel Young Lecture 1: Intro to R September 10, 2021 80 / 128

Example: functions on numeric vectors (continued)

Example

Subject 1 2 3 4 5 6 7
x 49.3 59.3 68.3 48.1 57.61 78.1 76.1
y 1,894 2,050 2,353 1,838 1,948 2,528 2,568

> summary(x) # Summary statistics

Min. 1st Qu. Median Mean 3rd Qu. Max.

48.10 53.45 59.30 62.40 72.20 78.10

> summary(y)

Min. 1st Qu. Median Mean 3rd Qu. Max.

1838 1921 2050 2168 2440 2568

Gabriel Young Lecture 1: Intro to R September 10, 2021 81 / 128

Element-wise operations for vectors

Vectors x and y must have the same length. Let a be a scalar.

Element-Wise Operators

Operator Description

a + x Element-wise scalar addition
a * x Element-wise scalar multiplication
x + y Element-wise addition
x * y Element-wise multiplication
x ˆ a Element-wise power
a ˆ x Element-wise exponentiation
x ˆ y Element-wise exponentiation

Recycling

Recall that a scalar is just a vector of length 1. When a shorter vector is
added to a longer one, the elements in the shorter vectored are repeated.
This is recycling.

Gabriel Young Lecture 1: Intro to R September 10, 2021 82 / 128

Some examples

> u <- c(1,3,5) > v <- c(1,3,5) > v + 4 # Recycling

[1] 5 7 9

> v + c(1,3) # Recycling

[1] 2 6 6

> v + u

[1] 2 6 10

Gabriel Young Lecture 1: Intro to R September 10, 2021 83 / 128

Some examples

Note: Operators are functions in R.

> u <- c(1,3,5) > v <- c(1,3,5) > ‘+'(v,u)

[1] 2 6 10

> ‘*'(v,u)

[1] 1 9 25

Gabriel Young Lecture 1: Intro to R September 10, 2021 84 / 128

Line of best fit example

Recall the energy expenditure versus fat-free body mass example.

Example

To investigate the dependence of energy expenditure (y) on body build,
researches used underwater weighing techniques to determine the fat-free body
mass (x) for each of seven men. They also measured the total 24-hour energy
expenditure for each man during conditions of quiet sedentary activity. The
results are shown in the table.

Subject 1 2 3 4 5 6 7
x 49.3 59.3 68.3 48.1 57.61 78.1 76.1
y 1,894 2,050 2,353 1,838 1,948 2,528 2,568

Gabriel Young Lecture 1: Intro to R September 10, 2021 85 / 128

Line of best fit example (cont.)

Let’s find the line of best fit.

> plot(x,y, xlab = “Body Mass”, ylab = “Energy Expenditure”)

50 55 60 65 70 75

2
0

0
0

2
2

0
0

2
4

0
0

Body Mass

E
n

e
rg

y
E

xp
e

n
d

itu
re

Figure 8: Scatterplot of Energy Expenditure on Body Mass
Gabriel Young Lecture 1: Intro to R September 10, 2021 86 / 128

Line of best fit example (cont.)

Recall:

For the line of best fit, ŷ = β̂0 + β̂1x where

β̂1 =
Sxy
Sxx

=


(xi − x̄)(yi − ȳ)∑

(xi − x̄)2
, and β̂0 = ȳ − β̂1x̄ .

Solution:

> # First, compute x and y deviations

> dev_x <- x - mean(x) > dev_y <- y - mean(y) > # Next, compute sum of squares of xy and xx

> Sxy <- sum(dev_x * dev_y) > Sxx <- sum(dev_x * dev_x) Gabriel Young Lecture 1: Intro to R September 10, 2021 87 / 128 Line of best fit example (cont.) Recall: For the line of best fit, ŷ = β̂0 + β̂1x where β̂1 = Sxy Sxx = ∑ (xi − x̄)(yi − ȳ)∑ (xi − x̄)2 , and β̂0 = ȳ − β̂1x̄ . Solution: > # Compute the estimated slope

> Sxy/Sxx

[1] 25.01184

> # Compute the estimated intercept

> mean(y) – (Sxy/Sxx) * mean(x)

[1] 607.6539

Gabriel Young Lecture 1: Intro to R September 10, 2021 88 / 128

Functions for numeric matrices

Useful R Functions

R Function Description

A %*% B Matrix multiplication for compatible matrices A,B.
dim(A) Dimension of matrix A.
t(A) Transpose of matrix A.

diag(x) Returns a diagonal matrix with elements x along the diagonal.
diag(A) Returns a vector of the diagonal elements of A.

solve(A,b) Returns x in the equation b = Ax .
solve(A) Inverse of A where A is a square matrix.
cbind(A,B) Combine matrices horizontally for compatible matrices A,B.
rbind(A,B) Combine matrices vertically for compatible matrices A,B.

Gabriel Young Lecture 1: Intro to R September 10, 2021 89 / 128

System of linear equations example

Solve the system of equations:




3x − 2y + z = −1
x + 1

2
y − 12z = 2

x + y + 3z = 3

Recall,

We can represent the system using matrices as follows:
 3 −2 11 1

2
−12

1 1 3




 xy

z


 =


 −12

3


 .

Then we would like to solve for vector (x , y , z).

Gabriel Young Lecture 1: Intro to R September 10, 2021 90 / 128

System of linear equations example (cont.)

Recall, 
 3 −2 11 1

2
−12

1 1 3




 xy

z


 =


 −12

3


Solution:

> # Define matrix A

> A <- matrix(c(3,1,1,-2,1/2,1,1,-12,3), nrow = 3) > # Define vector b

> b <- c(-1, 2, 3) > # Use the solve function

> solve(A, b)

[1] 1 2 0

Gabriel Young Lecture 1: Intro to R September 10, 2021 91 / 128

System of linear equations example (cont.)

Recall, 
 3 −2 11 1

2
−12

1 1 1




 xy

z


 =


 −12

3


Let’s use matrix multiplication to check that x =
(
1 2 0

)T
is the

correct solution to our system of equations.

Solution

> x <- c(1, 2, 0) # Define solution vector x > A %*% x # Then check with matrix multiplication

[,1]

[1,] -1

[2,] 2

[3,] 3

Gabriel Young Lecture 1: Intro to R September 10, 2021 92 / 128

Element-wise operations for matrices

Let A and B be matrices of the same dimensions. Let a be a scalar.

Element-wise Operators

Operator Description

a + A Element-wise scalar addition
a * A Element-wise scalar multiplication
A + B Element-wise addition
A * B Element-wise multiplication
A ˆ a Element-wise power
a ˆ A Element-wise exponentiation
A ˆ B Element-wise exponentiation

Gabriel Young Lecture 1: Intro to R September 10, 2021 93 / 128

Section VII

Base R

Filtering

Gabriel Young Lecture 1: Intro to R September 10, 2021 94 / 128

Logical and relational operators

Logical Operator Description

! Negation (NOT)
& AND
| OR

Relational Operator Description

<, > Less than, greater than
<=, >= Less than or equal to, greater than or equal to
== Equal to
!= Not equal to

Gabriel Young Lecture 1: Intro to R September 10, 2021 95 / 128

Examples of logical and relational operators

Some Basic Examples

> 1 > 3

[1] FALSE

> 1 == 3

[1] FALSE

> 1 != 3

[1] TRUE

Gabriel Young Lecture 1: Intro to R September 10, 2021 96 / 128

Examples of logical and relational operators

Some Basic Examples

> (1 > 3) & (4*5 == 20)

[1] FALSE

> (1 > 3) | (4*5 == 20)

[1] TRUE

Gabriel Young Lecture 1: Intro to R September 10, 2021 97 / 128

Examples of logical and relational operators

Some Basic Examples

> c(0,1,4) < 3 [1] TRUE TRUE FALSE > which(c(0,1,4) < 3) [1] 1 2 > which(c(TRUE, TRUE, FALSE))

[1] 1 2

Gabriel Young Lecture 1: Intro to R September 10, 2021 98 / 128

Examples of Logical and Relational operators

Some Basic Examples

> c(0,1,4) >= c(1,1,3)

[1] FALSE TRUE TRUE

> c(“Cat”,”Dog”) == “Dog”

[1] FALSE TRUE

Gabriel Young Lecture 1: Intro to R September 10, 2021 99 / 128

Filtering examples

Sometimes we would like to extract elements form a vector or matrix that
satisfy certain criteria.

Extracting Elements from a Vector

> w <- c(-3, 20, 9, 2) > w[w > 3] ### Extract elements of w greater than 3

[1] 20 9

> ### What’s going on here?
> w > 3

[1] FALSE TRUE TRUE FALSE

> w[c(FALSE, TRUE, TRUE, FALSE)]

[1] 20 9

Gabriel Young Lecture 1: Intro to R September 10, 2021 100 / 128

Filtering examples

> w <- c(-3, 20, 9, 2) > ### Extract elements of w with squares between 3 and 10

> w[w*w >= 3 & w*w <= 10] [1] -3 2 > w*w >= 3 ### What’s going on here?

[1] TRUE TRUE TRUE TRUE

> w*w <= 10 [1] TRUE FALSE FALSE TRUE > w*w >= 3 & w*w <= 10 [1] TRUE FALSE FALSE TRUE Gabriel Young Lecture 1: Intro to R September 10, 2021 101 / 128 Filtering examples Extracting Elements from a Vector > w <- c(-1, 20, 9, 2) > v <- c(0, 17, 10, 1) > ### Extract elements of w greater than elements from v

> w[w > v]

[1] 20 2

> ### What’s going on here?
> w > v

[1] FALSE TRUE FALSE TRUE

> w[c(FALSE, TRUE, FALSE, TRUE)]

[1] 20 2

Gabriel Young Lecture 1: Intro to R September 10, 2021 102 / 128

Filtering examples

Filtering Elements of a Matrix

> M <- matrix(c(rep(4,5), 5:8), ncol=3, nrow=3) > M

[,1] [,2] [,3]

[1,] 4 4 6

[2,] 4 4 7

[3,] 4 5 8

> ### We can do element-wise comparisons with matrices too.

> M > 5

[,1] [,2] [,3]

[1,] FALSE FALSE TRUE

[2,] FALSE FALSE TRUE

[3,] FALSE FALSE TRUE

Gabriel Young Lecture 1: Intro to R September 10, 2021 103 / 128

Filtering examples

> M

[,1] [,2] [,3]

[1,] 4 4 6

[2,] 4 4 7

[3,] 4 5 8

> M[,3] < 8 [1] TRUE TRUE FALSE > M[M[,3] < 8, ] [,1] [,2] [,3] [1,] 4 4 6 [2,] 4 4 7 Gabriel Young Lecture 1: Intro to R September 10, 2021 104 / 128 Filtering examples Reassigning Elements of a Matrix > M

[,1] [,2] [,3]

[1,] 4 4 6

[2,] 4 4 7

[3,] 4 5 8

> ### Assign elements greater than 5 with zero

> M[M > 5] <- 0 > M

[,1] [,2] [,3]

[1,] 4 4 0

[2,] 4 4 0

[3,] 4 5 0

Gabriel Young Lecture 1: Intro to R September 10, 2021 105 / 128

Check your understanding

Using matrix z, what is the output of the following?

> z

First Second Third

[1,] 1 1 9

[2,] 2 0 16

[3,] 3 1 25

1. z[z[, “Second”], ]?

2. z[, 1] != 1?

3. z[(z[, 1] != 1), 3]?

Gabriel Young Lecture 1: Intro to R September 10, 2021 106 / 128

Check your understanding

> z

First Second Third

[1,] 1 1 9

[2,] 2 0 16

[3,] 3 1 25

Solutions

> z[z[, “Second”], ]

First Second Third

[1,] 1 1 9

[2,] 1 1 9

Gabriel Young Lecture 1: Intro to R September 10, 2021 107 / 128

Check your understanding

> z

First Second Third

[1,] 1 1 9

[2,] 2 0 16

[3,] 3 1 25

Solutions

> z[, 1] != 1

[1] FALSE TRUE TRUE

> z[(z[, 1] != 1), 3]

[1] 16 25

Gabriel Young Lecture 1: Intro to R September 10, 2021 108 / 128

A quick note

> z

First Second Third

[1,] 1 1 9

[2,] 2 0 16

[3,] 3 1 25

> z[(z[, 1] != 1), 3]

[1] 16 25

> z[(z[, 1] != 1), 3, drop = FALSE]

Third

[1,] 16

[2,] 25

Gabriel Young Lecture 1: Intro to R September 10, 2021 109 / 128

Section VIII

NA and NULL Values

Gabriel Young Lecture 1: Intro to R September 10, 2021 110 / 128

NA and NULL

• NA indicates a missing value in a dataset.

• NULL is a value that doesn’t exist and is often returned by expressions
and functions whose value is undefined.

Example

> length(c(-1, 0, NA, 5))

[1] 4

> length(c(-1, 0, NULL, 5))

[1] 3

Gabriel Young Lecture 1: Intro to R September 10, 2021 111 / 128

NA and NULL

Example

> ### Use na.rm = TRUE to remove NA values

> t <- c(-1,0,NA,5) > mean(t)

[1] NA

> mean(t, na.rm = TRUE)

[1] 1.333333

> ### NA values are missing, but NULL values don’t exist.
> s <- c(-1, 0, NULL, 5) > mean(s)

[1] 1.333333

Gabriel Young Lecture 1: Intro to R September 10, 2021 112 / 128

NA and NULL

NULL can be used is to build a vector in the following way:

> # Define an empty vector

> x <- NULL > # Fill in the vector

> x[1] <- "Blue" > x[2] <- "Green" > x[3] <- "Red" > x

[1] “Blue” “Green” “Red”

• NULL is commonly used to build vectors in loops with each iteration
adding another element.

• Filling in pre-allocated space is less expensive (computationally) than
adding an element at each step.

• Loops will be introduced next lecture.
Gabriel Young Lecture 1: Intro to R September 10, 2021 113 / 128

Section IX

Base R

Lists

Gabriel Young Lecture 1: Intro to R September 10, 2021 114 / 128

Lists

A list structure combines objects of different modes.

Recall, in vectors and matrices all elements must have the same mode.

To define a list:

• Use the function “list()”:

list(name1 = object1, name2 = object2, …)

List component names (called tags) are optional.

Gabriel Young Lecture 1: Intro to R September 10, 2021 115 / 128

Line of best fit example

Recall, the energy expenditure versus fat-free body mass example one
more time.

Example

To investigate the dependence of energy expenditure (y) on body build,
researches used underwater weighing techniques to determine the fat-free body
mass (x) for each of seven men. They also measured the total 24-hour energy
expenditure for each man during conditions of quiet sedentary activity. The
results are shown in the table.

Subject 1 2 3 4 5 6 7
x 49.3 59.3 68.3 48.1 57.61 78.1 76.1
y 1,894 2,050 2,353 1,838 1,948 2,528 2,568

Gabriel Young Lecture 1: Intro to R September 10, 2021 116 / 128

Lists

Let’s make a list of the values we’ve calculated for this example.

> # Combine data into single matrix

> data <- cbind(x, y) > # Summary values for x and y

> sum_x <- summary(x) > sum_y <- summary(y) > # We computed Sxy and Sxx previously

> est_vals <- c(Sxy/Sxx, mean(y) - Sxy/Sxx*mean(x)) Gabriel Young Lecture 1: Intro to R September 10, 2021 117 / 128 Lists > # Define a list with different objects for each element

> body_fat <- list(variable_data = data, + summary_x = sum_x, summary_y = sum_y, + LOBF_est = est_vals) Gabriel Young Lecture 1: Intro to R September 10, 2021 118 / 128 Extracting components of lists Extract an individual component “c” from a list called lst in the following ways: • lst$c • lst[[i]] where “c” is the i th component. • lst[["c"]] Gabriel Young Lecture 1: Intro to R September 10, 2021 119 / 128 Extracting components of lists Energy expenditure versus fat-free body mass example > # Extract the first list element

> body_fat[[1]]

x y

[1,] 49.30 1894

[2,] 59.30 2050

[3,] 68.30 2353

[4,] 48.10 1838

[5,] 57.61 1948

[6,] 78.10 2528

[7,] 76.10 2568

Gabriel Young Lecture 1: Intro to R September 10, 2021 120 / 128

Lists

Energy expenditure versus fat-free body mass example

> # Extract the Line of Best Fit estimates

> body_fat$LOBF_est

[1] 25.01184 607.65386

> # Extract the summary of x

> body_fat[[“summary_x”]]

Min. 1st Qu. Median Mean 3rd Qu. Max.

48.10 53.45 59.30 62.40 72.20 78.10

Gabriel Young Lecture 1: Intro to R September 10, 2021 121 / 128

Section X

List Extraction/Subsetting Continued

Double vs. Single Brackets

Gabriel Young Lecture 1: Intro to R September 10, 2021 122 / 128

List extraction: single versus double bracket

Compare body_fat[1] with body_fat[[1]]

> # Single bracket

> body_fat[1]

$variable_data

x y

[1,] 49.30 1894

[2,] 59.30 2050

[3,] 68.30 2353

[4,] 48.10 1838

[5,] 57.61 1948

[6,] 78.10 2528

[7,] 76.10 2568

Gabriel Young Lecture 1: Intro to R September 10, 2021 123 / 128

List extraction: single versus double bracket

Compare body_fat[1] with body_fat[[1]]

> # Double bracket

> body_fat[[1]]

x y

[1,] 49.30 1894

[2,] 59.30 2050

[3,] 68.30 2353

[4,] 48.10 1838

[5,] 57.61 1948

[6,] 78.10 2528

[7,] 76.10 2568

Gabriel Young Lecture 1: Intro to R September 10, 2021 124 / 128

List extraction: single versus double bracket

Compare body_fat[1] with body_fat[[1]]

> # Single bracket

> # Try and run the below code (uncomment it!)

> # body_fat[1][1:3,]

>

> # Double bracket

> body_fat[[1]][1:3,]

x y

[1,] 49.3 1894

[2,] 59.3 2050

[3,] 68.3 2353

What happened?

Gabriel Young Lecture 1: Intro to R September 10, 2021 125 / 128

List extraction: single versus double bracket

Figure 9: Great brackets description

Image taken from Hadley Wickham’s Twitter

Gabriel Young Lecture 1: Intro to R September 10, 2021 126 / 128

List extraction: single versus double bracket

One more example

> # Single bracket

> body_fat[“LOBF_est”]

$LOBF_est

[1] 25.01184 607.65386

> # Double bracket

> body_fat[[“LOBF_est”]]

[1] 25.01184 607.65386

> # Inside the pepper packet we can extract the slope

> body_fat[[“LOBF_est”]][1]

[1] 25.01184

Gabriel Young Lecture 1: Intro to R September 10, 2021 127 / 128

Set 1 finished

Thank You!

Gabriel Young Lecture 1: Intro to R September 10, 2021 128 / 128