Data Structures::Vectors
STT 180 Module 2 Lecture 1

Dola Pathak

Michigan State University

(Michigan State University) Introduction to Data Science 1 / 1

Learning Objectives

• Understand data structures in R

• Define the coercion hierarchy and, given a list of variable types, be
able to rank them according to the hierarchy.

• Understand vector recycling and the vectorization property in R

• Extract information from data frames and lists.
• Be able to extract information from data frames and lists in more than one way.

• Work with and transform data to produce meaningful results.

Data structures

What is a data structure?

• Format for organizing and storing data

• Designed in a specific way to make data easy to access and work
with.

• Most statistical software packages and programming languages offer
similar structures.

Main structures

• Vectors

• Matrices

• Arrays

• Data frames

• Lists

All components of the first three structures (vectors, matrices, and
arrays) must be of the same variable type; data frames and lists can
combine components of different types.
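
As a minimal sketch of this homogeneity rule (the object names v, m, l and df are made up for illustration): mixing types in a vector or matrix silently converts everything to one type, while a list or data frame keeps each component's own type. The data frame line assumes R 4.0 or later, where character columns are no longer turned into factors by default.

```r
# Vectors and matrices hold one type, so mixed inputs are coerced.
v <- c(1, "a")                               # 1 becomes the string "1"
m <- matrix(c(1, 2, TRUE, FALSE), nrow = 2)  # TRUE/FALSE become 1/0
typeof(v)  # "character"
typeof(m)  # "double"

# Lists and data frames can mix types across their components.
l <- list(1, "a", TRUE)
sapply(l, typeof)   # "double" "character" "logical"

# Assumes R >= 4.0, where the name column stays character (not factor).
df <- data.frame(id = 1:2, name = c("x", "y"))
sapply(df, typeof)  # id "integer", name "character"
```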

Vector creation and subsetting

Vectors can be created in R using the function c(). A vector is subset
with [ ], placing the positions to extract inside the brackets.
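
A minimal sketch of a few other index forms that [ ] accepts, reusing the ticker vector stocks from this slide's example (the particular positions chosen are arbitrary): ranges, negative positions that drop elements, logical vectors, and conditions.

```r
stocks <- c("AAPL", "OKE", "MSFT", "AA", "KO")

stocks[2:4]                 # a range of positions: "OKE" "MSFT" "AA"
stocks[-1]                  # a negative position drops it: "OKE" "MSFT" "AA" "KO"
stocks[c(TRUE, FALSE)]      # logical index, recycled to length 5: "AAPL" "MSFT" "KO"
stocks[nchar(stocks) == 2]  # keep elements meeting a condition: "AA" "KO"
stocks[10]                  # a position past the end gives NA
```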

stocks <- c("AAPL", "OKE", "MSFT", "AA", "KO")
stocks[3]
[1] "MSFT"
stocks[c(1, 4)]
[1] "AAPL" "AA"

Variable type

• Character: enclose in single ('...') or double ("...") quotes
• Double
• Integer
• Logical
• Complex
• Raw: holds raw bytes of information

Double and integer are often grouped together as numeric.

Some examples

# character
c("first", "second", "third")
[1] "first" "second" "third"
# numeric
c(pi, 4.2, 1, 0)
[1] 3.141593 4.200000 1.000000 0.000000
# logical (T and F abbreviate TRUE and FALSE)
c(c(T, F, F, T), c(TRUE, FALSE))
[1] TRUE FALSE FALSE TRUE TRUE FALSE

Some more examples

# integer (the L suffix marks an integer)
c(4L, 0L, 10L, -3L)
[1] 4 0 10 -3
# complex
c(4 - 3i, 0i, 1i^2)
[1] 4-3i 0+0i -1+0i

More on variable types

The functions
• typeof determines the type or storage mode of any object
• is.<type> tests whether an object is of type <type>
• as.<type> creates (coerces to) an object of type <type>

where <type> is replaced with one of the variable types, e.g. is.character or as.integer.
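
A short sketch of these functions on a small integer vector (x and y are illustrative names): typeof reports the storage mode, is.<type> answers TRUE or FALSE, and as.<type> converts.

```r
x <- c(4L, 0L, 10L)
typeof(x)            # "integer"
is.integer(x)        # TRUE
is.double(x)         # FALSE
is.numeric(x)        # TRUE: integer and double both count as numeric

y <- as.character(x)
y                    # "4" "0" "10"
typeof(y)            # "character"

as.logical(c(0, 2))  # FALSE TRUE: 0 is FALSE, any other number is TRUE
as.integer(3.9)      # 3: the decimal part is truncated, not rounded
```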

Coercion hierarchy

A vector can hold only one variable type, so values of different types
combined into one vector are converted to a common type according to a
coercion hierarchy.

logical < integer < double < complex < character

Values are converted to the simplest type required to represent all the information.

Coercion hierarchy example

c(4 + 3i, 3/4, TRUE, "abc")
[1] "4+3i" "0.75" "TRUE" "abc"
c(4 + 3i, 3/4, TRUE)
[1] 4.00+3i 0.75+0i 1.00+0i

Vector recycling

x <- c(4, 5, 10, 0, 1, -5)
y <- c(2, 9, 3)
x + y
[1] 6 14 13 2 10 -2

If two vectors are of unequal length, the shorter one is recycled (reused from its first element) until it matches the length of the longer vector.

Vectorization

Many operations in R are vectorized: a single call acts on every element of a vector, with no explicit loop needed.
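
A minimal sketch of what vectorization saves, on a made-up vector x: the explicit loop and the single call abs(x) compute the same result.

```r
x <- c(-4, 5, -2)

# Explicit loop: visit each position and store its absolute value.
out <- numeric(length(x))
for (i in seq_along(x)) {
  out[i] <- abs(x[i])
}

# Vectorized: a single call handles every element.
vec <- abs(x)

identical(out, vec)  # TRUE
vec                  # 4 5 2
x * 2                # -8 10 -4: arithmetic is vectorized as well
```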
abs(c(-4, 5, -sqrt(4), -2:2))
[1] 4 5 2 2 1 0 1 2
sample(1:6, size = 4, replace = TRUE)^2   # random draws: output varies
[1] 36 16 36 25

Key vector properties

• Homogeneous: all components share one variable type
• Indexed by position, starting at 1
• Can be indexed by multiple positions at once
• Components can have names
• Remember recycling and vectorization
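
A short sketch of two of these properties. The prices vector uses made-up numbers purely to show names; the recycling part reuses the vectors from the recycling slide and then shows the case the slide does not cover, where the longer length is not a multiple of the shorter (R still recycles, but prints a warning).

```r
# Components can have names (the prices are made-up numbers).
prices <- c(AAPL = 190, KO = 60, MSFT = 410)
names(prices)               # "AAPL" "KO" "MSFT"
prices["KO"]                # named element: KO 60
prices[c("MSFT", "AAPL")]   # several names at once, in the order given

# Recycling when the longer length is a multiple of the shorter: no warning.
even <- c(4, 5, 10, 0, 1, -5) + c(2, 9, 3)
even                        # 6 14 13 2 10 -2

# Lengths 5 and 2: R still recycles, but warns that the longer
# object length is not a multiple of the shorter object length.
uneven <- c(1, 2, 3, 4, 5) + c(10, 20)
uneven                      # 11 22 13 24 15
```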