COMP2022 Programming for FinTech Applications
Spring 2020
Professor: Dr. Grace Wang
Week 5: R
Loop Functions
Looping on the Command Line
- lapply(): Loop over a list and evaluate a function on each element
- sapply(): Same as lapply() but try to simplify the result
- apply(): Apply a function over the margins of an array
- tapply(): Apply a function over subsets of a vector
- mapply(): Multivariate version of lapply()
lapply()
- The lapply() function does the following simple series of operations:
  1. it loops over a list, iterating over each element in that list;
  2. it applies a function (one that you specify) to each element of the list;
  3. it returns a list (the "l" is for "list").
- This function takes three arguments:
  - (1) a list X; if X is not a list, it will be coerced to a list using as.list();
  - (2) a function (or the name of a function) FUN;
  - (3) other arguments via its ... argument.
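A minimal sketch of the return-value rule, assuming a small named numeric vector chosen for illustration: because X is turned into a list element by element, the result is always a list, and the names of X carry over to the result.

```r
# Apply sqrt() to each element of a named numeric vector (toy data).
# The result is a list with one element per input element,
# and it keeps the names of the input.
v <- c(a = 1, b = 4, c = 9)
res <- lapply(v, sqrt)

print(class(res))   # "list"
print(names(res))   # "a" "b" "c"
print(res$c)        # 3
```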
lapply()
- The actual looping is done internally in C code for efficiency reasons.
> lapply
function (X, FUN, ...)
{
    FUN <- match.fun(FUN)
    if (!is.vector(X) || is.object(X))
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}
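To make the internal loop concrete, here is a hypothetical pure-R re-implementation called my_lapply(). The name is ours and is not part of base R. It is a simplified sketch that mirrors the coercion step in the source above but does the looping in R instead of C, so it is meant for reading rather than as a replacement.

```r
# Hypothetical, simplified R-level version of lapply() (illustration only).
my_lapply <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)                # accept a function or its name
  if (!is.vector(X) || is.object(X))   # same coercion rule as lapply()
    X <- as.list(X)
  out <- vector("list", length(X))     # preallocate the result list
  for (i in seq_along(X)) {
    # Element i becomes FUN's first argument; extra args go through ...
    # list(...) wrapping keeps a NULL result from deleting the slot.
    out[i] <- list(FUN(X[[i]], ...))
  }
  names(out) <- names(X)               # carry over the input names
  out
}

x <- list(a = 1:5, b = c(2, 4))
identical(my_lapply(x, mean), lapply(x, mean))   # TRUE
```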
lapply()
- Example 1
> x <- list(a = 1:5, b = rnorm(10))
> lapply(x, mean)
> x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
> lapply(x, mean)
- Example 2
> x <- 1:4
> lapply(x, runif)
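When you pass a function to lapply(), lapply() takes each element of the list and passes it as the first argument of the function you are applying. In Example 2, each element of x becomes the first argument of runif() (n, the number of values to draw), so lapply(x, runif) returns random vectors of lengths 1, 2, 3, and 4.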
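A quick check of the first-argument rule, assuming an arbitrary seed (set.seed(1)) so the draws are reproducible: only the lengths of the results matter here, and they match the elements of x.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
x <- 1:4
res <- lapply(x, runif)     # calls runif(1), runif(2), runif(3), runif(4)
lengths(res)                # 1 2 3 4
```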
lapply()
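- The ... argument: arguments supplied after FUN (here min and max) are passed on to FUN through lapply()'s ... argument.
> x <- 1:4
> lapply(x, runif, min = 0, max = 10)
- Anonymous functions: a function can be written inline, without giving it a name.
> x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
> lapply(x, function(elt) { elt[, 1] })
vs.
> f <- function(elt) {
    elt[, 1]
}
> lapply(x, f)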
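One way to see what the ... argument does is to compare it with an explicit wrapper that spells out min and max. This sketch assumes an arbitrary seed (set.seed(2)), reset before each call so both versions consume the same random numbers.

```r
x <- 1:4

set.seed(2)   # arbitrary seed
a <- lapply(x, runif, min = 0, max = 10)          # min/max forwarded via ...

set.seed(2)   # same seed, same random stream
b <- lapply(x, function(n) runif(n, min = 0, max = 10))

identical(a, b)                          # TRUE: same calls to runif()
all(unlist(a) >= 0 & unlist(a) <= 10)    # TRUE: every draw is in [0, 10]
```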
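The two forms above produce the same result. The only difference is that the anonymous version leaves no function f behind after the call. A small check, reusing the matrices from the slide:

```r
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
f <- function(elt) elt[, 1]          # named helper: first column

anon  <- lapply(x, function(elt) elt[, 1])   # inline, unnamed function
named <- lapply(x, f)

identical(anon, named)   # TRUE
anon$b                   # 1 2 3 (first column of the 3 x 2 matrix)
```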
sapply()
- The sapply() function behaves similarly to lapply(); the only real difference is in the return value.
- sapply() will try to simplify the result of lapply() if possible. Essentially, sapply() calls lapply() on its input and then applies the following algorithm:
  - If the result is a list where every element has length 1, a vector is returned.
  - If the result is a list where every element is a vector of the same length (> 1), a matrix is returned.
  - If it can't figure things out, a list is returned.
> x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
> lapply(x, mean)
> sapply(x, mean)
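A sketch of the three simplification cases, using small deterministic inputs chosen for illustration so the output shapes are easy to predict:

```r
# Case 1: every result has length 1, so sapply() returns a vector
s1 <- sapply(1:3, function(i) i^2)       # 1 4 9

# Case 2: every result has the same length (2 > 1), so a matrix is
# returned, with one column per input element
s2 <- sapply(1:3, function(i) c(i, i^2)) # 2 x 3 matrix

# Case 3: results have different lengths, so a list is returned
s3 <- sapply(1:3, seq_len)               # list(1, 1:2, 1:3)

dim(s2)      # 2 3
is.list(s3)  # TRUE
```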
split()
- The combination of split() and a function like lapply() or sapply() is a common paradigm in R.
> library(datasets)
> head(airquality)
> s <- split(airquality, airquality$Month)
> str(s)
> lapply(s, function(x) {
    colMeans(x[, c("Ozone", "Solar.R", "Wind")])
})
> sapply(s, function(x) {
    colMeans(x[, c("Ozone", "Solar.R", "Wind")])
})
> ## Months with missing values (NA) in a column get an NA mean;
> ## na.rm = TRUE drops the NAs before averaging
> sapply(s, function(x) {
    colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE)
})
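The same split-then-sapply pattern on a tiny, hypothetical data frame (df, with a grouping column g and one NA) makes the effect of na.rm easy to see by hand:

```r
# Hypothetical toy data: two groups, one missing value in group "a"
df <- data.frame(
  g = c("a", "a", "b", "b"),
  y = c(1, NA, 3, 5),
  z = c(2, 4, 6, 8)
)
s <- split(df, df$g)   # named list of data frames: s$a and s$b

# Each call returns c(y = ..., z = ...), so sapply() builds a 2 x 2 matrix
no_rm   <- sapply(s, function(d) colMeans(d[, c("y", "z")]))
with_rm <- sapply(s, function(d) colMeans(d[, c("y", "z")], na.rm = TRUE))

no_rm     # y is NA for group "a"
with_rm   # y is 1 for group "a" once the NA is dropped
```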
tapply
- tapply() is used to apply a function over subsets of a vector. It can be thought of as a combination of split() and sapply() for vectors only.
> str(tapply)
function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
- The arguments to tapply() are as follows:
  - X is a vector;
  - INDEX is a factor or a list of factors (or else they are coerced to factors);
  - FUN is a function to be applied;
  - ... contains other arguments to be passed to FUN;
  - simplify indicates whether the result should be simplified.
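To check the "split() plus sapply()" description, here is a sketch with a small hand-picked vector and two groups. The shapes differ slightly: tapply() returns a one-dimensional array, while sapply() returns a named vector. The group means themselves agree.

```r
x <- c(2, 4, 6, 10, 20, 30)
f <- gl(2, 3)                       # factor 1 1 1 2 2 2, levels "1" and "2"

t_res <- tapply(x, f, mean)         # 1-d array with dimnames "1", "2"
s_res <- sapply(split(x, f), mean)  # named numeric vector

as.vector(t_res)   # 4 20
unname(s_res)      # 4 20
```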
tapply
> ## Simulate some data
> x <- c(rnorm(10), runif(10), rnorm(10, 1))
> ## Define some groups with a factor variable
> f <- gl(3, 10)
> f
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Levels: 1 2 3
> tapply(x, f, mean)
> tapply(x, f, mean, simplify = FALSE)
> tapply(x, f, mean, simp = FALSE)  ## Does this return the same result? Why?
> ## When FUN returns more than one value, tapply() does not simplify
> ## the result and returns a list.
> tapply(x, f, range)
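A sketch that answers the question above, using a small deterministic vector instead of simulated data. In R, formal arguments listed after ... (like simplify in the signature shown for tapply()) can only be matched by their full name. So simp = FALSE is not partially matched to simplify. Instead it falls into ... and is forwarded to mean(), which silently ignores it. The last call shows the range() case from the slide.

```r
x <- c(1, 2, 3, 10, 20, 30)
f <- gl(2, 3)

a <- tapply(x, f, mean, simplify = FALSE) # list: one element per group
b <- tapply(x, f, mean, simp = FALSE)     # 'simp' is NOT matched to 'simplify'
r <- tapply(x, f, range)                  # range() returns 2 values per group

is.list(a)   # TRUE
is.list(b)   # FALSE: simplified numeric result; 'simp' went to mean()
is.list(r)   # TRUE
r[[2]]       # 10 30
```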
apply()
- Evaluate a function (often an anonymous one) over the margins of an array.
- Most often, apply() is used to apply a function to the rows or columns of a matrix (which is just a 2-dimensional array), but it also works on general arrays.
> str(apply)
function (X, MARGIN, FUN, ...)
- The arguments to apply() are:
  - X is an array;
  - MARGIN is an integer vector indicating which margins should be "retained";
  - FUN is a function to be applied;
  - ... is for other arguments to be passed to FUN.
apply()
- Examples
> set.seed(0)
> x <- matrix(rnorm(200), 20, 10)
> apply(x, 2, mean)  ## Take the mean of each column
> apply(x, 1, sum)   ## Take the sum of each row
> a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))
> apply(a, c(1, 2), mean)
- Shortcuts (much faster than the equivalent apply() calls)
  - rowSums(x) is equivalent to apply(x, 1, sum)
  - rowMeans(x) is equivalent to apply(x, 1, mean)
  - colSums(x) is equivalent to apply(x, 2, sum)
  - colMeans(x) is equivalent to apply(x, 2, mean)
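A check of the shortcut equivalences on the same kind of data as the examples. The comparison uses all.equal() rather than identical() on the assumption that the optimized functions may round slightly differently. The last line uses the dims argument of rowMeans() to reproduce apply(a, c(1, 2), mean) on the 3-D array.

```r
set.seed(0)
x <- matrix(rnorm(200), 20, 10)

all.equal(rowSums(x),  apply(x, 1, sum))    # TRUE
all.equal(rowMeans(x), apply(x, 1, mean))   # TRUE
all.equal(colSums(x),  apply(x, 2, sum))    # TRUE
all.equal(colMeans(x), apply(x, 2, mean))   # TRUE

# 3-D case: average over the third dimension, keeping dimensions 1 and 2
a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))
all.equal(rowMeans(a, dims = 2), apply(a, c(1, 2), mean))   # TRUE
```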
mapply()
- A multivariate apply of sorts, which applies a function in parallel (element by element) over a set of arguments.
- Recall that lapply() and friends only iterate over a single R object. What if you want to iterate over multiple R objects in parallel? This is what mapply() is for.
- The arguments to mapply() are:
  - FUN is a function to apply;
  - ... contains R objects to apply over;
  - MoreArgs is a list of other arguments to FUN, which are not iterated over;
  - SIMPLIFY indicates whether the result should be simplified.
> str(mapply)
function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
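A sketch of MoreArgs and SIMPLIFY, using a hypothetical helper add_scaled() defined only for this example. The vectors 1:3 and 4:6 are walked in step, while scale stays fixed for every call.

```r
# Hypothetical helper, for illustration only
add_scaled <- function(x, y, scale) (x + y) * scale

# Calls add_scaled(1, 4, 10), add_scaled(2, 5, 10), add_scaled(3, 6, 10)
m1 <- mapply(add_scaled, 1:3, 4:6, MoreArgs = list(scale = 10))

# Same calls, but SIMPLIFY = FALSE keeps the results as a list
m2 <- mapply(add_scaled, 1:3, 4:6, MoreArgs = list(scale = 10),
             SIMPLIFY = FALSE)

m1           # 50 70 90
is.list(m2)  # TRUE
```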
mapply()
> ## Writing out each call to rep() by hand is tedious
> list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))
> ## mapply() produces the same result
> mapply(rep, 1:4, 4:1)
> noise <- function(n, mean, sd) {
    rnorm(n, mean, sd)
}
> ## Simulate 5 random numbers
> noise(5, 1, 2)
[1] -0.5196913 3.2979182 -0.6849525 1.7828267 2.7827545
> ## This only simulates 1 set of numbers, not 5
> noise(1:5, 1:5, 2)
[1] -1.670517 2.796247 2.776826 5.351488 3.422804
> mapply(noise, 1:5, 1:5, 2)
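To show what mapply(noise, 1:5, 1:5, 2) actually calls, this sketch compares it with the five calls written out by hand. It assumes an arbitrary seed (set.seed(3)), reset before each version. The length-1 argument 2 is recycled for every call, and the results have different lengths, so mapply() returns a list.

```r
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

set.seed(3)   # arbitrary seed
res <- mapply(noise, 1:5, 1:5, 2)   # noise(1, 1, 2), ..., noise(5, 5, 2)

set.seed(3)   # same seed, same random stream
manual <- list(noise(1, 1, 2), noise(2, 2, 2), noise(3, 3, 2),
               noise(4, 4, 2), noise(5, 5, 2))

lengths(res)             # 1 2 3 4 5
identical(res, manual)   # TRUE
```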
Vectorizing a Function
- The mapply() function can be used to automatically "vectorize" a function: take a function that typically only takes single arguments and create a new function that can take vector arguments.
> sumsq <- function(mu, sigma, x) {
    sum(((x - mu) / sigma)^2)
}
> x <- rnorm(100)        ## Generate some data
> sumsq(1:10, 1:10, x)   ## This is not what we want
[1] 110.2594
Here mu and sigma are recycled along x and the call returns a single number, not one value per (mu, sigma) pair. However, we can do what we want by using mapply():
> mapply(sumsq, 1:10, 1:10, MoreArgs = list(x = x))
Vectorize()
- The Vectorize() function can automatically create a vectorized version of your function.
- Example: create a vsumsq() function that is fully vectorized over mu and sigma as follows.
> vsumsq <- Vectorize(sumsq, c("mu", "sigma"))
> vsumsq(1:10, 1:10, x)
[1] 196.2289 121.4765 108.3981 104.0788 102.1975 101.2393 100.6998
[8] 100.3745 100.1685 100.0332
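A sketch confirming that the mapply() call and the Vectorize() version compute the same ten values. It assumes an arbitrary seed (set.seed(4)) for the data, so the numbers will differ from the output shown above. The last check verifies the first entry by hand: with mu = 1 and sigma = 1, it is just sum((x - 1)^2).

```r
sumsq <- function(mu, sigma, x) {
  sum(((x - mu) / sigma)^2)
}

set.seed(4)   # arbitrary seed
x <- rnorm(100)

v_mapply <- mapply(sumsq, 1:10, 1:10, MoreArgs = list(x = x))

vsumsq <- Vectorize(sumsq, c("mu", "sigma"))   # vectorize over mu and sigma only
v_vec  <- vsumsq(1:10, 1:10, x)

length(v_vec)                          # 10
all.equal(v_mapply, v_vec)             # TRUE
all.equal(v_vec[1], sum((x - 1)^2))    # TRUE: mu = 1, sigma = 1
```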
Summary
- The loop functions in R are very powerful because they allow you to conduct a series of operations on data using a compact form.
- The operation of a loop function involves iterating over an R object (e.g. a list, vector, or matrix), applying a function to each element of the object, and then collating the results and returning them.
- Loop functions make heavy use of anonymous functions, which exist for the life of the loop function but are not stored anywhere.
- The split() function can be used to divide an R object into subsets determined by another variable, which can subsequently be looped over using loop functions.