Data Structures::Data frames and lists
STT 180 Module 2 Lecture 2
Dola Pathak
Michigan State University
(Michigan State University) Introduction to Data Science 1 / 1
Recall: Main structures
• Vectors
• Matrices
• Arrays
• Data frames
• Lists
All components of the first three structures must be homogeneous in
variable type.
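A minimal sketch of what homogeneity means in practice, using only base R. Mixing types inside c() coerces every element to one common type, while list() keeps each component's own type. The object names mixed.vec and mixed.list are illustrative and do not come from the lecture.

```r
# Mixing types in a vector coerces every element to one common type
mixed.vec <- c(1, "a", TRUE)
class(mixed.vec)            # "character"

# A list keeps each component's own type
mixed.list <- list(1, "a", TRUE)
sapply(mixed.list, class)   # "numeric" "character" "logical"
```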
Key vector properties
• Homogeneous
• Indexed by position, begins at 1
• Indexed by multiple positions
• Components can have names
• Remember recycling and vectorization
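The properties above can be sketched with a small named vector. The vector scores and its values are made up for illustration.

```r
# A named numeric vector (names and values are hypothetical)
scores <- c(quiz = 8, midterm = 75, final = 90)

scores[1]                    # first element; indexing starts at 1
scores[c(1, 3)]              # several positions at once
scores["midterm"]            # access by name

# Vectorization: the operation applies to every element
scores / 100

# Recycling: the shorter vector is reused to match the longer one
c(1, 2, 3, 4) + c(10, 20)    # 11 22 13 24
```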
Data frame properties
A tabular structure with rows and columns.
• Each column is a vector holding one variable
• All columns have the same length
• Columns have names
To access the contents of a data frame, use $ or matrix-style notation.
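As a rough illustration of these properties, the sketch below builds a tiny data frame and reads from it with $ and with matrix-style indexing. The data frame students and its contents are hypothetical.

```r
# A small data frame; each column is a vector of the same length
# (column names and values are hypothetical)
students <- data.frame(
  name  = c("Ana", "Ben", "Cy"),
  score = c(90, 85, 77)
)

students$score        # column by name, returned as a vector
students[, 2]         # the same column, matrix-style
students[2, "name"]   # single cell: row 2, column "name"
```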
Select data frame columns by position
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
mtcars[, 3]
[1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
[12] 275.8 275.8 275.8 472.0 460.0 440.0 78.7 75.7 71.1 120.1 318.0
[23] 304.0 350.0 400.0 79.0 120.3 95.1 351.0 145.0 301.0 121.0
Select data frame columns by position
head(mtcars[3])
                  disp
Mazda RX4          160
Mazda RX4 Wag      160
Datsun 710         108
Hornet 4 Drive     258
Hornet Sportabout  360
Valiant            225
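The two position-based forms shown so far return different kinds of objects, which a quick class() check makes visible. This is a sketch using the built-in mtcars data; the names col.vec and col.df are illustrative.

```r
# mtcars ships with base R (datasets package)
col.vec <- mtcars[, 3]    # matrix-style: drops to a plain vector
col.df  <- mtcars[3]      # list-style: stays a one-column data frame

class(col.vec)            # "numeric"
class(col.df)             # "data.frame"

# drop = FALSE keeps the data frame structure with matrix-style indexing
class(mtcars[, 3, drop = FALSE])   # "data.frame"
```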
Select data frame columns by name
mtcars$disp
[1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
[12] 275.8 275.8 275.8 472.0 460.0 440.0 78.7 75.7 71.1 120.1 318.0
[23] 304.0 350.0 400.0 79.0 120.3 95.1 351.0 145.0 301.0 121.0
mtcars$hp
[1] 110 110 93 110 175 105 245 62 95 123 123 180 180 180 205 215 230
[18] 66 52 65 97 150 150 245 175 66 91 113 264 175 335 109
Select data frame columns by multiple positions
head(mtcars[, c(4, 7)])
                   hp  qsec
Mazda RX4         110 16.46
Mazda RX4 Wag     110 17.02
Datsun 710         93 18.61
Hornet 4 Drive    110 19.44
Hornet Sportabout 175 17.02
Valiant           105 20.22
head(mtcars[c(4, 7)])
                   hp  qsec
Mazda RX4         110 16.46
Mazda RX4 Wag     110 17.02
Datsun 710         93 18.61
Hornet 4 Drive    110 19.44
Hornet Sportabout 175 17.02
Valiant           105 20.22
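The same multi-column selection can also be written with a character vector of column names, which is often easier to read than positions. A small sketch on mtcars follows; by.position and by.name are illustrative names.

```r
# Columns 4 and 7 of mtcars are named "hp" and "qsec"
by.position <- mtcars[, c(4, 7)]
by.name     <- mtcars[, c("hp", "qsec")]

# Both forms select the same columns
names(by.position)                # "hp" "qsec"
identical(by.position, by.name)   # TRUE
```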
Select data frame columns using the subset function
mtcars.filter <- subset(mtcars, select = c("mpg", "cyl", "hp"))
head(mtcars.filter)
                   mpg cyl  hp
Mazda RX4         21.0   6 110
Mazda RX4 Wag     21.0   6 110
Datsun 710        22.8   4  93
Hornet 4 Drive    21.4   6 110
Hornet Sportabout 18.7   8 175
Valiant           18.1   6 105
More on data frames
• Function names can be used to change variable names
• Function rownames can be used to change/provide row names
• To subset rows, specify row indices before the comma
mtcars[c(1, 4, 9), 3:5]
                disp  hp drat
Mazda RX4      160.0 110 3.90
Hornet 4 Drive 258.0 110 3.08
Merc 230       140.8  95 3.92
Lists
Lists are heterogeneous and allow for a hierarchical data structure.
my.list <- list(ltrs = letters[1:6], cars = mtcars[1:4, 1:2], value = 5)
my.list
$ltrs
[1] "a" "b" "c" "d" "e" "f"
$cars
                mpg cyl
Mazda RX4      21.0   6
Mazda RX4 Wag  21.0   6
Datsun 710     22.8   4
Hornet 4 Drive 21.4   6
$value
[1] 5
Lists
To access the contents of a list use
• double square brackets or $ to drill into the list
• single square brackets to return a subsetted list as a list
List examples
my.list[[1]]
[1] "a" "b" "c" "d" "e" "f"
my.list[1]
$ltrs
[1] "a" "b" "c" "d" "e" "f"
List examples
my.list[c(1, 3)]
$ltrs
[1] "a" "b" "c" "d" "e" "f"
$value
[1] 5
List examples
my.list$cars
                mpg cyl
Mazda RX4      21.0   6
Mazda RX4 Wag  21.0   6
Datsun 710     22.8   4
Hornet 4 Drive 21.4   6
my.list[["value"]]
[1] 5
More on lists
• Use unlist to flatten a list into a vector
• Use NULL to remove list components, my.list[["ltrs"]] <- NULL
• Create a list with list
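The list operations above can be tried together in one short sketch. It rebuilds my.list as defined earlier; the small list passed to unlist is a hypothetical example chosen so that flattening gives a simple vector.

```r
my.list <- list(ltrs = letters[1:6], cars = mtcars[1:4, 1:2], value = 5)

class(my.list[1])        # "list": single brackets keep the list wrapper
class(my.list[[1]])      # "character": double brackets return the component

# Flatten a simple list into a vector (element names derive from the list names)
flat <- unlist(list(a = 1, b = 2:3))
flat

# Assigning NULL removes a component
my.list[["ltrs"]] <- NULL
names(my.list)           # "cars" "value"
```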