Stats 20: Introduction to Statistical Programming with R, UCLA
Learning Objectives
1 The Essentials
1.1 Basic Definitions
1.2 The Length of a Vector
1.3 The Mode Hierarchy
2 Sequences and Repeated Patterns
2.1 The seq() Function
2.2 The rep() Function
3 Extracting and Assigning Vector Elements
3.1 Subsetting
3.2 Assigning Values to an Existing Vector
4 Vector Arithmetic
4.1 Recycling
5 Vectorization
5.1 The vapply() Function
6 Basic Numeric Summary Functions
6.1 Built-In Functions
6.2 Example: Coding a Variance Function
7 Technical Subtleties
7.1 Special Values
7.2 Approximate Storage of Numbers
All rights reserved, , 2017–2022. Acknowledgements: and
Do not post, share, or distribute anywhere or with anyone without explicit permission.
Learning Objectives
After studying this chapter, you should be able to:
• Combine vectors with the c() function.
• Understand the distinction between numeric, character, and logical vectors.
• Extract values from a vector using subsetting.
• Compute vector arithmetic in R.
• Understand how R uses recycling in vector operations.
• Understand how R uses vectorization.
• Understand that R stores numbers approximately, and be able to identify rounding errors.
1 The Essentials
1.1 Basic Definitions
The most fundamental object in R is a vector, which is an ordered collection of values. The entries of a vector are also called elements or components. Single values (or scalars) are actually just vectors with a single element.
The possible values contained in a vector can be of several basic data types, also known as (storage) modes: numeric, character, or logical.
• Numeric values are numbers (decimals).
• Character values (also called strings) are letters, words, or symbols. Character values are always contained in quotation marks "".
• Logical values are either TRUE or FALSE (must be in all caps), representing true and false values in formal logical statements.
Note: The (capital) letters T and F are technically valid shorthand for TRUE and FALSE, respectively, but you should never use them.
The c() function is used to collect values into a vector. The c stands for concatenate or combine. Here are a few examples:
c(1, 1, 2, 3, 5, 8, 13) # This is a numeric vector
[1] 1 1 2 3 5 8 13
fib <- c(1, 1, 2, 3, 5, 8, 13) # Assign the vector to a named object
fib
[1] 1 1 2 3 5 8 13
parks <- c("Leslie", "April", "Ron", "Tom", "Donna", "Jerry") # This is a character vector
parks
[1] "Leslie" "April" "Ron" "Tom" "Donna" "Jerry"
true_dat <- c(TRUE, FALSE, TRUE, T, F) # This is a logical vector
true_dat
[1] TRUE FALSE TRUE TRUE FALSE
The c() function can also concatenate vectors together by inputting vectors instead of single values.
c(c(1, 2), c(3, 4, 5)) # Can concatenate multiple vectors together
[1] 1 2 3 4 5
1.2 The Length of a Vector
The length of a vector is the number of elements in the vector. The length() function inputs a vector and outputs the length of the vector.
length(4) # A scalar/number is a vector of length 1
[1] 1
length(fib)
[1] 7
length(parks)
[1] 6
length(true_dat)
[1] 5
1.3 The Mode Hierarchy
In the examples above, we have created separate numeric, character, and logical vectors, where all the values in each vector are of the same type. A natural question is whether we can create a vector with mixed types.
It turns out that the answer is no: Due to how R (internally) stores vectors, every value in a vector must have the same type.
The mode() function inputs an object and outputs the type (or mode) of the object. This is a general function that can be applied to all objects, not just vectors.
mode(fib)
[1] "numeric"
mode(parks)
[1] "character"
mode(true_dat)
[1] "logical"
When values of different types are concatenated into a single vector, the values are coerced into a single type.
Question: What is the output for the following commands?
• mode(c(fib, parks))
• mode(c(fib, true_dat))
• mode(c(parks, true_dat))
• mode(c(fib, parks, true_dat))
These questions highlight the mode hierarchy:
logical < numeric < character
• Combining logical and numeric vectors will result in a numeric vector.
• Combining numeric and character vectors will result in a character vector.
• Combining logical and character vectors will result in a character vector.
• Combining logical, numeric, and character vectors will result in a character vector.
Note: When logical values are coerced into numeric values, TRUE becomes 1 and FALSE becomes 0.
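The hierarchy can be checked directly in the console. The following is a minimal sketch; the object names mixed_ln, mixed_nc, and mixed_all are made up for this illustration and are not used elsewhere in the chapter. The sum() function is used only to show a practical consequence of coercing TRUE to 1 and FALSE to 0.

```r
# Hypothetical example objects (names are illustrative only)
mixed_ln  <- c(TRUE, FALSE, 3)     # logical + numeric
mixed_nc  <- c(1.5, "a")           # numeric + character
mixed_all <- c(TRUE, 2, "three")   # all three modes

mode(mixed_ln)    # "numeric"
mixed_ln          # TRUE and FALSE have become 1 and 0
mode(mixed_nc)    # "character"
mixed_all         # every value is now a character string

# Because TRUE is coerced to 1 and FALSE to 0, summing a logical
# vector counts the number of TRUE values.
sum(c(TRUE, FALSE, TRUE, TRUE))    # 3
```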
Knowing the types of our R objects is important because we want to apply functions to those objects in order to describe, visualize, or analyze them. Just like in mathematics, functions in R are all about input and output. Functions will expect inputs (arguments) in a certain form and will give outputs in other forms, such as other R objects (vectors, matrices, data frames, etc.) or plots. In addition, some functions will change their output depending on the input.
As you become more familiar with R, it is important to know what type your objects are and what functions are available to you for working with those objects.
2 Sequences and Repeated Patterns
R has some handy built-in functions for creating vectors of sequential or repeated values. One common use of sequences in statistics is to generate the category labels (or levels) for a designed experiment.
2.1 The seq() Function
The seq() function creates a sequence of evenly spaced numbers with specified start and end values. The start and end values determine whether the sequence is increasing or decreasing. The first argument is the from or starting value, and the second argument is the to or end value. By default, the optional argument by is set to by = 1, which means the numbers in the sequence are incrementally increased by 1.
seq(0, 5) # numbers increase by 1
[1] 0 1 2 3 4 5
seq(0, 10, by = 2) # numbers now increase by 2
[1] 0 2 4 6 8 10
The seq() function can make decreasing sequences by specifying the from argument to be greater than the to argument. By default, the by argument will automatically change to by = -1.
seq(5, 0) # seq() can also make decreasing sequences
[1] 5 4 3 2 1 0
seq(10, 0, by = -3) # numbers now decrease by 3
[1] 10 7 4 1
Notice that seq(10, 0, by = -3) stops at 1, the smallest number in the sequence that is greater than or equal to the to argument.
To obtain a sequence of numbers of a given length, use the optional length (or length.out) argument. The incremental increase (or decrease) will be calculated automatically.
seq(0, 1, length = 11)
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
We could also specify the increment and length instead of providing the end value.
seq(10, 55, length = 10)
[1] 10 15 20 25 30 35 40 45 50 55
seq(10, by = 5, length = 10) # The same sequence
[1] 10 15 20 25 30 35 40 45 50 55
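To see how the increment is calculated when a length is requested, the following sketch computes it by hand. The names from, to, n, step, by_length, and by_step are chosen only for this illustration. The full argument name length.out is used here; the shorter length works through R's partial argument matching.

```r
from <- 0
to   <- 1
n    <- 11   # hypothetical desired length

# With a requested length n, the increment is (to - from) / (n - 1)
step <- (to - from) / (n - 1)
step

by_length <- seq(from, to, length.out = n)
by_step   <- seq(from, to, by = step)

length(by_length)                 # 11
all.equal(by_length, by_step)     # TRUE (equal up to rounding error)
```

The comparison uses all.equal() rather than an exact test because decimal increments such as 0.1 are stored approximately.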
2.1.1 Shorthands for Common Sequences
R has several shorthands for common sequences. We will discuss the colon : operator, seq_len(), and seq_along().
2.1.1.1 The Colon : Operator
The colon : operator is a shorthand for the default seq() with unit increment (i.e., by = 1 or -1).
-2:5 # same as seq(-2, 5)
[1] -2 -1 0 1 2 3 4 5
pi:10 # same as seq(pi, 10)
[1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593
Caution: The colon : operator takes precedence over multiplication and subtraction in the order of operations, but it does not take precedence over exponents. It is always recommended to use parentheses to make the order of operations explicit.
n <- 5
1:n - 1 # Evaluated as (1:n) - 1
[1] 0 1 2 3 4
1:(n - 1)
[1] 1 2 3 4
2.1.1.2 The seq_len() Function
The seq_len() function inputs a single length.out argument and generates the sequence of integers 1, 2, ..., length.out, except when length.out = 0, in which case it generates integer(0).
seq_len(8)
[1] 1 2 3 4 5 6 7 8
seq_len(10)
[1] 1 2 3 4 5 6 7 8 9 10
seq_len(0)
integer(0)
Notice that the outputs of 1:n and seq_len(length.out = n) are the same for positive integers n. However, if n = 0, then seq_len(n) will generate integer(0), whereas 1:n will produce the vector 1, 0, which is often not the intended behavior when using the 1:n notation (especially when used inside of functions). In addition, seq_len() does not allow for negative inputs.
seq_len(-5)
Error in seq_len(-5): argument must be coercible to non-negative integer
Using seq_len(n) rather than the shorter 1:n helps prevent unexpected results if n is incorrectly specified. When creating an integer sequence of possibly variable length, the seq_len() notation is recommended best practice over the colon operator :.
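The difference is easiest to see by comparing the two notations when n is zero. In this short sketch, the value n <- 0 stands in for a hypothetical situation such as an empty data set.

```r
n <- 0                # e.g., the length of an empty data set (hypothetical)

1:n                   # 1 0 (two elements, counting down)
seq_len(n)            # integer(0) (no elements)

length(1:n)           # 2
length(seq_len(n))    # 0
```

Any code that is meant to do something once per element would run twice with 1:n here, but not at all with seq_len(n).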
2.1.1.3 The seq_along() Function
The seq_along() function inputs a single along.with argument and generates the sequence of integers 1, 2, ..., length(along.with).
seq_along(100)
[1] 1
seq_along(c(1, 3, 5, 7, 9))
[1] 1 2 3 4 5
seq_along(c("friends", "waffles", "work"))
[1] 1 2 3
The seq_along() function can be useful for generating a sequence of indices for the input vector, which will be helpful when writing loops (as we will see in a later chapter).
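As a small illustration that does not require loops, the sketch below uses the indices from seq_along() to build labels. The objects runs, idx, and empty are hypothetical, and paste() is a base R function that has not been introduced in this chapter.

```r
runs <- c(51, 40, 57)       # hypothetical data for illustration
idx  <- seq_along(runs)     # 1 2 3
paste("Day", idx)           # "Day 1" "Day 2" "Day 3"

# For an empty vector, seq_along() gives an empty sequence,
# whereas 1:length() counts down from 1 to 0.
empty <- numeric(0)
seq_along(empty)            # integer(0)
1:length(empty)             # 1 0
```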
2.2 The rep() Function
The rep() function creates a vector of repeated values. The first argument, generically called x, is the vector of values we want to repeat. The second argument is the times argument that specifies how many times we want to repeat the values in the x vector.
The times argument can be a single value (repeats the whole vector) or a vector of values (repeats each individual value separately). If the length of the times vector is greater than 1, the vector needs to have the same length as the x vector. Each element of times corresponds to the number of times to repeat the corresponding element in x.
rep(3, 10) # Repeat the value 3, 10 times
[1] 3 3 3 3 3 3 3 3 3 3
rep(c(1, 2), 5) # Repeat the vector c(1,2), 5 times
[1] 1 2 1 2 1 2 1 2 1 2
rep(c(1, 2), c(4, 3)) # Repeat the value 1, 4 times, and the value 2, 3 times
[1] 1 1 1 1 2 2 2
rep(c(5, 3, 1), c(1, 3, 5)) # Repeat c(5,3,1), c(1,3,5) times
[1] 5 3 3 3 1 1 1 1 1
Question: How is rep(c(1, 2), 5) different from rep(c(1, 2), c(5, 5))?
Question: Why does rep(c(5, 3, 1), c(1, 3)) give an error?
We can also combine seq() and rep() to construct more interesting patterns.
rep(seq(2, 20, by = 2), 2)
[1]  2  4  6  8 10 12 14 16 18 20  2  4  6  8 10 12 14 16 18 20
rep(seq(2, 20, by = 2), rep(2, 10))
[1]  2  2  4  4  6  6  8  8 10 10 12 12 14 14 16 16 18 18 20 20
Note: The rep() function works with vectors of any mode, including character and logical vectors. This is particularly useful for creating vectors that represent categorical variables.
rep(c("long", "short"), c(2, 3))
[1] "long" "long" "short" "short" "short"
rep(c(TRUE, FALSE), c(6, 4))
[1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
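As a sketch of how such labels might be built for a designed experiment, suppose (hypothetically) there are two treatments with three subjects each. The object names treatment and subject are made up for this example. The each argument used in the last line is part of rep() but is not discussed in these notes.

```r
# Hypothetical design: 2 treatments, 3 subjects per treatment
treatment <- rep(c("control", "drug"), c(3, 3))
subject   <- rep(seq_len(3), 2)

treatment   # "control" "control" "control" "drug" "drug" "drug"
subject     # 1 2 3 1 2 3

# rep() also has an `each` argument, which repeats every element the
# same number of times; here it gives the same labels as c(3, 3).
identical(treatment, rep(c("control", "drug"), each = 3))   # TRUE
```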
3 Extracting and Assigning Vector Elements
3.1 Subsetting
Square brackets are used to extract specific parts of data from objects in R. Extracting data this way is also called subsetting. We input the index of the element(s) we want to extract.
To illustrate subsetting, we will consider the following example.
To keep his body in (literally perfect) shape, Chris runs 10k every day. His running times (in minutes) for the last ten days are:
51, 40, 57, 34, 47, 50, 50, 56, 41, 38
We first input the data into R as a vector and save it as the running_times object.
# Input the data into R
running_times <- c(51, 40, 57, 34, 47, 50, 50, 56, 41, 38)
# Print the values
running_times
[1] 51 40 57 34 47 50 50 56 41 38
3.1.1 Positive Indices
Recall that the [1] in front of the output is an index, telling us that 51 is the first element of the vector running_times. By counting across the vector, we can see, for example, that the 5th element of running_times is 47. More efficiently, we can extract just the 5th element by typing running_times[5].
running_times[5] # Extract the 5th element
[1] 47
To extract multiple values at once, we can input a vector of indices:
running_times[c(3, 7)] # Extract the 3rd and 7th elements
[1] 57 50
running_times[4:8] # Extract the 4th through 8th elements
[1] 34 47 50 50 56
Reordering the indices will reorder the elements in the vector:
running_times[8:4] # Output the 4th through 8th elements in reverse order
[1] 56 50 50 47 34
3.1.2 Negative Indices
Negative indices allow us to avoid certain elements, extracting all elements in the vector except the ones with negative indices.
running_times[-4] # Output all elements except the 4th one
[1] 51 40 57 47 50 50 56 41 38
running_times[-c(1, 5)] # Output all elements except the 1st and 5th
[1] 40 57 34 50 50 56 41 38
running_times[-(1:4)] # Output all elements except the first four
[1] 47 50 50 56 41 38
Note: Notice that -(1:4) is not the same as -1:4.
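The difference comes from the order of operations: the minus sign in -1:4 is applied to the 1 before the colon builds the sequence. The sketch below prints both index vectors and uses a hypothetical short vector x. The try() function is used only so that the script keeps running after R reports the error from mixing negative and positive indices.

```r
-(1:4)    # -1 -2 -3 -4
-1:4      # -1  0  1  2  3  4

x <- c(51, 40, 57, 34, 47, 50)   # hypothetical short vector
x[-(1:4)]                        # 47 50

# x[-1:4] mixes negative and positive indices, which R does not allow;
# try() captures the error instead of stopping the script.
result <- try(x[-1:4], silent = TRUE)
inherits(result, "try-error")    # TRUE
```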
Using a zero index outputs nothing. A zero index is not commonly used, but it can be useful to know for more complicated expressions.
index_vector <- 0:5 # Create a vector of indices
running_times[index_vector] # Extract the values corresponding to index_vector
[1] 51 40 57 34 47
Caution: Do not mix positive and negative indices.
running_times[c(-1, 3)]
Error in running_times[c(-1, 3)]: only 0's may be mixed with negative subscripts
The issue with indices of mixed signs is that R does not know the order in which the subsetting should occur: Do we want to output the third element before or after removing the first one?
Question: How could we code outputting the third element of running_times after removing the first one?
3.1.3 Fractional Indices
Always use integer-valued indices. Fractional indices will be truncated towards 0.
running_times[1.9] # Outputs the 1st element (1.9 truncated to 1)
[1] 51
running_times[-1.9] # Outputs everything except the 1st element (-1.9 truncated to -1)
[1] 40 57 34 47 50 50 56 41 38
running_times[0.5] # Outputs an empty vector (0.5 truncated to 0)
numeric(0)
Note: The output numeric(0) is a numeric vector of length zero.
3.1.4 Blank Indices
Subsetting with a blank index will output everything.
running_times
[1] 51 40 57 34 47 50 50 56 41 38
running_times[] # Same output
[1] 51 40 57 34 47 50 50 56 41 38
Blank indices will be important later (when we have ordered indices).
3.2 Assigning Values to an Existing Vector
Suppose Chris made a mistake in recording his running times. On his fourth run, he ran 10k in 43 minutes, not 34 minutes. Rather than reentering all of his running times, how can we modify the existing running_times vector?
R allows us to assign new values to existing vectors by again using the assignment operator <-. Rather than specifying a new object name on the left of the assignment, we can put the element or elements in the named vector that we want to change.
# Display Chris's running times
running_times
[1] 51 40 57 34 47 50 50 56 41 38
# Assign 43 to the 4th element of the running_times vector
running_times[4] <- 43
# Verify that the running_times vector has been updated
running_times
[1] 51 40 57 43 47 50 50 56 41 38
If Chris found that the last two values were also incorrect, we can reassign multiple values at once using vector indices.
# Assign 42 to the 9th element and 37 to the 10th element
running_times[9:10] <- c(42, 37)
# Verify that the running_times vector has been updated
running_times
[1] 51 40 57 43 47 50 50 56 42 37
Note: The original value of 34 in the running_times vector has been overwritten, so reassigning values to an existing object is irreversible. Depending on the situation, it might be beneficial to first make a copy of the original data as a separate object before making changes. This ensures that the original data is still retrievable if there is a mistake in the modifications.
Caution: You cannot use this syntax to create a new object. For example, the following code will not work:
bad[1:2] <- c(4, 8)
Error in bad[1:2] <- c(4, 8): object 'bad' not found
The reason this gives an error is that extracting or assigning individual vector elements using square brackets is actually done through functions (remember: everything is a function call). R cannot apply the extract/assign function to a vector that does not exist. The vector needs to be created first.
The following code fixes the issue:
good <- numeric(2) # Create an empty vector of length 2
good[1:2] <- c(4, 8)
Note: The numeric(), character(), and logical() functions can create empty vectors of a specified length for their respective modes. The default elements will all be 0, "", and FALSE, respectively.
numeric(3) # Create a numeric vector of length 3
[1] 0 0 0
character(5) # Create a character vector of length 5
[1] "" "" "" "" ""
logical(4) # Create a logical vector of length 4
[1] FALSE FALSE FALSE FALSE
Creating empty or blank vectors will be important when working with for and while loops.
4 Vector Arithmetic
Arithmetic can be done on numeric vectors using the usual arithmetic operations. The operations are applied elementwise, i.e., to each individual element.
For example, if we want to convert Chris's running times from minutes into hours, we can divide all of the elements of running_times by 60.
# Divide the running times by 60
running_times_in_hours <- running_times / 60
# Print the running_times_in_hours vector
running_times_in_hours
[1] 0.8500000 0.6666667 0.9500000 0.7166667 0.7833333 0.8333333 0.8333333
[8] 0.9333333 0.7000000 0.6166667
Here are some other examples:
# Create a vector of the integers from 1 to 10
first_ten <- 1:10
# Subtract 5 from each element
first_ten - 5
[1] -4 -3 -2 -1  0  1  2  3  4  5
# Square each element
first_ten^2
[1]   1   4   9  16  25  36  49  64  81 100
Arithmetic operations can also be applied between two vectors. Just like with scalars, the binary operators work element-by-element.
For example:
x <- c(1, 3, 5) # Create a sample x vector
y <- c(2, 4, 3) # Create a sample y vector
# Add x and y
x + y
[1] 3 7 8
# Multiply x and y
x * y
[1]  2 12 15
# Exponentiate x by y
x^y
[1]   1  81 125
Symbolically, if x = (x1, x2, x3) and y = (y1, y2, y3) are vectors, then vector arithmetic in R would output:
• x + y = (x1 + y1, x2 + y2, x3 + y3)
• x − y = (x1 − y1, x2 − y2, x3 − y3)
• x * y = (x1 * y1, x2 * y2, x3 * y3)
• x / y = (x1 / y1, x2 / y2, x3 / y3)
• x^y = (x1^y1, x2^y2, x3^y3)
Side Note: This is not how vector operations work in vector calculus or linear algebra. In those fields, only addition and subtraction can be applied between vectors. Standard multiplication, division, and exponentiation do not make sense.
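Returning to R's elementwise arithmetic, a short applied sketch may help show why it is convenient for data. The vectors distance_km and time_minutes below are invented for illustration; they are not data from these notes.

```r
# Hypothetical data: distances (km) and times (minutes) for three runs
distance_km  <- c(10, 5, 21.1)
time_minutes <- c(51, 24, 115)

# Elementwise division pairs each distance with its own time;
# multiplying by the scalar 60 converts km per minute to km per hour.
speed_kmh <- distance_km / time_minutes * 60
speed_kmh

# Pace in minutes per kilometre, again computed run by run
pace <- time_minutes / distance_km
pace
```

Each result has the same length as the inputs, with one value per run.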
4.1 Recycling
When applying arithmetic operations to two vectors of different lengths, R will automatically recycle, or repeat, the shorter vector until it is long enough to match the longer vector.
For example:
c(1, 3, 5) + c(5, 7, 0, 2, 9, 11)
[1]  6 10  5  3 12 16
c(1, 3, 5, 1, 3, 5) + c(5, 7, 0, 2, 9, 11) # This is the same computation that R did
[1]  6 10  5  3 12 16
The basic arithmetic involving a vector and a scalar (i.e., a vector of length one) is implicitly using recycling.
c(1, 3, 5) + 5
[1]  6  8 10
c(1, 3, 5) + c(5, 5, 5) # This is the computation that R did
[1]  6  8 10
Caution: When the length of the longer vector is a multiple of the length of the smaller one, R does not give any indication that it needed to recycle the shorter vector. It is up to the user to know how the operation is interpreted by R.
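One way to check how R interprets such an operation is to write the recycling out explicitly. In the sketch below, the names short, long, and recycled are hypothetical, and the length.out argument of rep() (not covered in these notes) repeats a vector until it reaches a given length.

```r
short <- c(1, 2)
long  <- c(10, 20, 30, 40, 50, 60)

short + long                                # 11 22 31 42 51 62, with no warning

# Writing the recycling out explicitly gives the same result
recycled <- rep(short, length.out = length(long))
recycled                                    # 1 2 1 2 1 2
identical(short + long, recycled + long)    # TRUE
```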
If the length of the longer vector is not a multiple of the length of the smaller one, the operation will still be executed, but R will also output a warning. The warning is meant to alert the user in case the