Data Structures::Vectors
STT 180 Module 2 Lecture 1
Dola Pathak
Michigan State University
(Michigan State University) Introduction to Data Science 1 / 1
Learning Objectives
• Understand data structures in R
• Define the coercion hierarchy and, given a list of data formats, be
able to rank the items in the list according to the hierarchy.
• Understand vector recycling and the vectorization property in R
• Extract information from data frames and lists.
• Be able to extract information from data frames and lists in more
than one way.
• Work with and transform data to give meaningful results.
Data structures
What is a data structure?
• A format for organizing and storing data
• Designed in a specific way to make data easier to access and work
with.
• Most statistical software and programming languages provide some
form of these structures.
Main structures
• Vectors
• Matrices
• Arrays
• Data frames
• Lists
All components of the first three structures must be homogeneous in
variable type.
Vector creation and subsetting
Vectors can be created in R using the function c(). Vectors are subset
using [ ] with the positions of the elements to keep.
stocks <- c("AAPL", "OKE", "MSFT", "AA", "KO")
stocks[3]
[1] "MSFT"
stocks[c(1, 4)]
[1] "AAPL" "AA"
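Besides positive positions, [ ] also accepts negative positions (which drop elements), ranges, and logical vectors (which keep the elements where the value is TRUE). A minimal sketch reusing the stocks vector; the name long_tickers is only illustrative and not part of the lecture:

```r
stocks <- c("AAPL", "OKE", "MSFT", "AA", "KO")

# Negative positions drop elements instead of selecting them
stocks[-1]        # everything except "AAPL"

# A range of positions
stocks[2:4]       # "OKE" "MSFT" "AA"

# A logical vector keeps the elements where the condition is TRUE
long_tickers <- stocks[nchar(stocks) > 3]   # tickers longer than 3 characters
long_tickers      # "AAPL" "MSFT"
```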
Variable type
• Character: written inside ' ' or " "
• Double
• Integer
• Logical
• Complex
• Raw: holds raw bytes of information
Double and integer are often grouped together as numeric.
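The difference between double and integer can be seen directly. A small sketch, with x_dbl and x_int as illustrative names: a plain number is stored as a double, the L suffix asks for an integer, and both count as numeric.

```r
x_dbl <- 7      # a plain number is stored as a double
x_int <- 7L     # the L suffix asks for an integer

typeof(x_dbl)   # "double"
typeof(x_int)   # "integer"

# Both are treated as numeric
is.numeric(x_dbl)   # TRUE
is.numeric(x_int)   # TRUE

# The values compare equal even though the storage types differ
x_dbl == x_int            # TRUE
identical(x_dbl, x_int)   # FALSE, because the types differ
```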
Some examples
# character
c("first", "second", "third")
[1] "first" "second" "third"
# numeric
c(pi, 4.2, 1, 0)
[1] 3.141593 4.200000 1.000000 0.000000
# logical
c(c(T, F, F, T), c(TRUE, FALSE))
[1] TRUE FALSE FALSE TRUE TRUE FALSE
Some more examples
# integer
c(4L, 0L, 10L, -3L)
[1] 4 0 10 -3
# complex
c(4 - (0+3i), 0+0i, (0 + (0+1i))^2)
[1] 4-3i 0+0i -1+0i
More on variable types
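Functions for working with variable types:
• typeof determines the type or storage mode of any object
• is.<type> functions (for example, is.character) test whether an
object is of a given type
• as.<type> functions (for example, as.numeric) convert an object to a
given type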
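The is. and as. prefixes name families of functions: is.character(), is.numeric() and similar test a type, while as.numeric(), as.integer() and similar convert to one. A brief sketch on a made-up vector v:

```r
v <- c("1", "2", "three")   # illustrative data, not from the lecture

typeof(v)         # "character"
is.character(v)   # TRUE
is.numeric(v)     # FALSE

# as.numeric() converts what it can; "three" cannot be converted and
# becomes NA (R warns about this, so the warning is suppressed here)
n <- suppressWarnings(as.numeric(v))
n                 # 1 2 NA

# Logical values convert to 1 and 0
as.integer(c(TRUE, FALSE, TRUE))   # 1 0 1
```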
Coercion hierarchy
A vector can only be of one variable type, thus there is a coercion
hierarchy.
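A quick way to see the single-type rule in action is to combine two types with c() and ask typeof() for the result. This is only a sketch with arbitrary values:

```r
typeof(c(TRUE, 2L))      # "integer"
typeof(c(2L, 3.5))       # "double"
typeof(c(3.5, 1+2i))     # "complex"
typeof(c(1+2i, "abc"))   # "character"

# Logical values become 1 and 0 when treated as numbers,
# which is why sum() can count the TRUE values
sum(c(TRUE, FALSE, TRUE))   # 2
```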
logical < integer < double < complex < character
Values are converted to the simplest type required to represent all the
information.
Coercion hierarchy example
c(4 + (0+3i), 3/4, TRUE, "abc")
[1] "4+3i" "0.75" "TRUE" "abc"
c(4 + (0+3i), 3/4, TRUE)
[1] 4.00+3i 0.75+0i 1.00+0i
Vector recycling
x <- c(4, 5, 10, 0, 1, -5)
y <- c(2, 9, 3)
x + y
[1] 6 14 13 2 10 -2
If two vectors are of unequal length, the shorter one will be recycled in
order to match the longer vector.
Vectorization
Many operations in R are vectorized, meaning that an operation is applied
to every element of a vector at once, without an explicit loop.
abs(c(-4, 5, -sqrt(4), -2:2))
[1] 4 5 2 2 1 0 1 2
# random draw; the output will vary between runs
sample(1:6, size = 4, replace = TRUE)^2
[1] 36 16 36 25
Key vector properties
• Homogeneous
• Indexed by position, begins at 1
• Indexed by multiple positions
• Components can have names
• Remember recycling and vectorization
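The properties of names, recycling, and vectorization can be tried together in one short sketch. The vector prices and its values are made up for illustration and do not come from the lecture:

```r
# Components can have names, and names can be used for indexing
prices <- c(AAPL = 150, MSFT = 300, KO = 60)   # made-up values
names(prices)             # "AAPL" "MSFT" "KO"
prices["MSFT"]            # one component by name
prices[c("AAPL", "KO")]   # several components by name

# A length-1 vector is recycled across the whole vector
doubled <- prices * 2

# Vectorized comparison returns one logical value per element
above_100 <- prices > 100

# If the longer length is not a multiple of the shorter one, R still
# recycles but issues a warning (suppressed here)
z <- suppressWarnings(c(1, 2, 3, 4, 5) + c(10, 20))
z   # 11 22 13 24 15
```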