COMP2022 Programming for FinTech Applications
Spring 2020
Professor: Dr. Grace Wang
Week 4: R
What is R?
- Statistical computing language similar to S-PLUS
- Interpreted language (like MATLAB)
- Has many built-in (statistical) functions
- Easy to build your own functions
- Good graphic displays
- Extensive help files
Strengths and Weaknesses
- Strengths
  - Many built-in functions
  - Can get other functions from the internet by downloading libraries
  - Relatively easy data manipulation
- Weaknesses
  - Not as commonly used by non-statisticians
  - Not a compiled language; the interpreter can be very slow, but R allows you to call your own C/C++ code
How to use/learn R?
- How
  - Install and use the RStudio IDE
  - Get started with R (basic grammar)
  - Learn to use the popular packages
  - Do (a lot of) practice, including real projects
Install RStudio
- An integrated development environment (IDE) available for R
  - A nice editor with syntax highlighting
  - An R object viewer
  - Many other nice integrated features
Starting and stopping R
- Starting
  - Windows/Mac: double-click on the R icon
  - Unix/Linux: type R (or the appropriate path on your machine)
- Stopping
  - Type q()
  - q() is a function call
  - Everything that happens in R is a function call
  - Typing q without parentheses merely prints the content (definition) of the function
Writing R code
- Can input lines one at a time into R
- Can write many lines of code in your favorite text editor (including RStudio) and run them all at once
  - Simply paste the commands into R
  - Use the function source("path/yourscript") to run, in batch mode, the code saved in the file "yourscript"
R as a Calculator
> log2(32)
[1] 5
> sqrt(2)
[1] 1.414214
> seq(0, 5, length=6)
[1] 0 1 2 3 4 5
> plot(sin(seq(0, 2*pi, length=100)))
[Figure: plot of sin(seq(0, 2 * pi, length = 100)) against Index]
Recalling Previous Commands
- Use the up-arrow key or the history command under the menus
- From the history window, one can copy selected commands or paste them into the console window
Language layout
- Three types of statement
  - Expression: it is evaluated and printed, and the value is lost (3+5)
  - Assignment: passes the value to a variable, but the result is not printed automatically (out<-3+5)
  - Comment: (#This is a comment)
Naming conventions
- Names may use Roman letters, digits, underscore, and '.'; digits and underscore are not allowed in the initial position, and an initial '.' must not be followed by a digit
- Avoid using system names: c, q, s, t, C, D, F, I, T, diff, mean, pi, range, rank, tree, var
- These rules hold for variables, data and functions
- Variable names are case sensitive
Arithmetic operations and functions
- Most operations in R are similar to Excel and calculators
- Basic: +(add), -(subtract), *(multiply), /(divide)
- Exponentiation: ^
- Remainder or modulo operator: %%
- Matrix multiplication: %*%
- sin(x), cos(x), cosh(x), tan(x), tanh(x), acos(x), acosh(x), asin(x), asinh(x), atan(x), atan2(y,x), atanh(x)
- abs(x), ceiling(x), floor(x)
- exp(x), log(x, base=exp(1)), log10(x), sqrt(x), trunc(x) (the nearest integer toward zero)
- max(), min()
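As a quick check of the operators listed above, the following sketch exercises a few of them. The values and variable names are illustrative choices, not taken from the slides.

```r
# Remainder and exponentiation
r <- 17 %% 5        # remainder of 17 divided by 5
p <- 2^10           # exponentiation

# trunc() moves toward zero, floor() moves toward -Inf
t_neg <- trunc(-2.7)
f_neg <- floor(-2.7)

# Matrix multiplication with %*% (2x2 times 2x2)
A  <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column by column
I2 <- diag(2)                           # 2x2 identity matrix
AI <- A %*% I2                          # multiplying by the identity returns A

# log() takes an optional base; the default is the natural log
l <- log(8, base = 2)

print(c(r, p, t_neg, f_neg, l))
print(AI)
```

Note that trunc() and floor() agree for positive numbers and differ only for negative ones.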
Defining new variables
- Assignment symbol: use "<-" (or =)
- Scalars
  > scal<-6
  > value<-7
- Vectors: use c() to enter data
  > whales<-c(74,122,235,111,292,111,211,133,16,79)
  > simpsons<-c("Homer", "Marge", "Bart", "Lisa", "Maggie")
- Factors: data objects used to categorize data and store it as levels. They can store both strings and integers.
  > pain<-c(0,3,2,2,1)
  > fpain<-factor(pain,levels=0:3)
  > levels(fpain)<-c("none", "mild", "medium", "severe")
Use functions on a vector
- Most functions work on vectors exactly as we would want them to
  > sum(whales)
  > length(whales)
  > mean(whales)
  - sort(), min(), max(), range(), diff(), cumsum()
- Vectorization of (arithmetic) functions
  > whales + whales
  > whales - mean(whales)
  - Other arithmetic functions: sin(), cos(), exp(), log(), ^, sqrt(), sd()
  - Example: calculate the standard deviation of whales
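One possible way to work the standard-deviation exercise is sketched below, using only vectorized arithmetic and then comparing against the built-in sd(). The names n, deviations and manual_sd are chosen here for illustration.

```r
whales <- c(74, 122, 235, 111, 292, 111, 211, 133, 16, 79)

n <- length(whales)
deviations <- whales - mean(whales)              # vectorized subtraction
manual_sd  <- sqrt(sum(deviations^2) / (n - 1))  # sample standard deviation

# Compare with the built-in sd(), which also uses the n - 1 denominator
print(manual_sd)
print(sd(whales))
```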
Functions that create vectors
- Simple sequences
  > 1:10
  > rev(1:10)
  > 10:1
  > c(1:10, 10:1)
  > library(MASS) #to have fractions()
  > fractions(1/(2:10))
- Arithmetic sequence
  - a+(n-1)*h: how to generate 1, 3, 5, 7, 9?
  > a=1; h=2; n=5
  > a+h*(0:(n-1))
  OR
  > seq(1,9,by=2)
  > seq(1,9,length=5)
- Repeated numbers
  > rep(1,10)
  > rep(1:2, c(10,15))
- Getting help: ?rep or help(rep)
- help.search("keyword") or ??keyword
Matrix
- There are several ways to make a matrix
- To make a 2×3 (2 rows, 3 columns) matrix of 0's:
  > mat<-matrix(0,2,3)
- To make the following matrix:
    71 172
    73 169
    69 160
    65 130
  > mat2<-rbind(c(71,172),c(73,169),c(69,160),c(65,130))
  > mat3<-cbind(c(71,73,69,65),c(172,169,160,130))
- To make the following matrix:
     1  2  3  4  5
     6  7  8  9 10
  > mat4<-matrix(1:10,2,5, byrow=T)
Accessing data by using indices
- Accessing individual observations
  > whales[2]
- Slicing
  > whales[2:5]
- Negative indices
  > whales[-1]
- Logical values
  > whales[whales>100]
  > which(whales>100)
  > which.max(whales)
Indexing of vector/matrix
x=1:10
mat=matrix(1:24, nrow=4)
mat[,2]        # 2nd column
mat[2,]        # 2nd row
mat[c(2,4),]   # 2nd and 4th rows
mat[1:3,1]     # elements 1 to 3 in column 1
mat[-c(2,4),]  # all but rows 2 and 4
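To make the effect of these index expressions concrete, the sketch below stores a few of the results and prints their contents and shapes. The names col2, row2 and sub are introduced here for illustration only.

```r
mat <- matrix(1:24, nrow = 4)   # 4 rows, 6 columns, filled column by column

col2 <- mat[, 2]           # second column, returned as a plain vector
row2 <- mat[2, ]           # second row, returned as a plain vector
sub  <- mat[-c(2, 4), ]    # negative indices drop rows 2 and 4

print(dim(mat))
print(col2)
print(row2)
print(dim(sub))
```

Because matrix() fills column by column by default, the second column holds 5 to 8 rather than the values one might expect from a row-wise layout.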
Create logical vectors by conditions
- Comparison operators: <, <=, >, >=, ==, !=
- Combining comparisons
  - Vectors: AND &; OR |
  - Longer forms &&, ||: operate on and return a single value
  - all() and any()
- Examples
  - X=1:5
  - X<5; X>1
  - X>1 & X<5; X>1 | X<5
  - all(X<5); any(X>1); all(X<5) && any(X>1)
- %in% operator: X %in% c(2,4)
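The examples above can be run as a small script. The sketch below stores each result under a name chosen for this illustration, so the element-wise operators can be compared with the single-value forms.

```r
X <- 1:5

both   <- X > 1 & X < 5              # element-wise AND, one result per element
either <- X > 1 | X < 5              # element-wise OR, one result per element
single <- all(X < 5) && any(X > 1)   # && combines two single logical values

member <- X %in% c(2, 4)             # TRUE where the element is 2 or 4
picked <- X[both]                    # a logical vector can be used as an index

print(both)
print(either)
print(single)
print(member)
print(picked)
```

Here all(X < 5) is FALSE because the last element equals 5, so the combined && expression is FALSE as well.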
Missing values
- R codes missing values as NA
- is.na(x) is a logical function that returns TRUE for every value that is NA and FALSE otherwise
  > x[is.na(x)]<-0
  > mean(x, na.rm=TRUE)
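The two treatments of NA shown above are not equivalent. The following sketch, on a small made-up vector (not data from the slides), compares dropping missing values with replacing them by 0.

```r
x <- c(4, NA, 10, NA, 1)   # illustrative data with two missing values

missing  <- is.na(x)                  # logical vector marking the NA positions
mean_all <- mean(x)                   # NA, because NA propagates through mean()
mean_obs <- mean(x, na.rm = TRUE)     # mean of the observed values 4, 10, 1

x_zero <- x
x_zero[is.na(x_zero)] <- 0            # replace missing values with 0
mean_zero <- mean(x_zero)             # mean of 4, 0, 10, 0, 1

print(missing)
print(c(mean_all, mean_obs, mean_zero))
```

Replacing NA with 0 pulls the mean down, so which approach is appropriate depends on what a missing value means in the data at hand.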
Reading in other sources of data
- Use R's built-in libraries and data sets
  > range(lynx) #lynx is a built-in dataset
  > library(MASS) # load a library
  > data(survey) # load a dataset in the library
  > data(survey, package="MASS") #load just data
  > head(survey)
  > tail(survey)
- Copy and paste by scan()
  > whales=scan()
  1: 74 122 235 111 292 111 211 133 156 79
  11:
  Read 10 items
Read formatted data
- Read data from formatted data files, e.g. a file of numbers, or a table of numbers separated by space, comma, tab etc., with or without header
  > whale=scan(file="whale.txt")
  "whale.txt":
  74 122 235 111 292 111 211 133 156 79
  > whale=read.table(file="whale.txt", header=TRUE)
  > read.table(file=file.choose()) # specify the file
  > read.table(file="http://statweb.stanford.edu/~rag/stat141/exs/whale.txt", header=T) # read from internet
Data frame
- A "data matrix" or a "data set"
  - It is like a matrix (rectangular grid)
  - But unlike a matrix, different columns can be of different types
  - Row names have to be unique
- > alphabet<-data.frame(index=1:26, symbol=LETTERS)
- read.table() stores data in a data frame
Lists
- A larger composite object for combining a collection of objects
  - Unlike a data frame, each object can be of a different length, in addition to being of a different type
  > a=list(whales=c(74,122,235,111,292,111,211,133,16,79), simpsons=c("Homer", "Marge", "Bart", "Lisa", "Maggie"))
  - Access by $ or [[]]: a$simpsons or a[[2]]
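A short sketch of the access forms mentioned above follows. It also shows single brackets, which are not on the slide and are included here only for contrast. The result names are illustrative.

```r
a <- list(whales   = c(74, 122, 235, 111, 292, 111, 211, 133, 16, 79),
          simpsons = c("Homer", "Marge", "Bart", "Lisa", "Maggie"))

lens     <- lengths(a)     # components may have different lengths
by_name  <- a$simpsons     # access a component by name
by_pos   <- a[[2]]         # access the same component by position
sub_list <- a[2]           # single brackets return a list of length 1

print(lens)
print(identical(by_name, by_pos))
print(class(sub_list))
```

The distinction matters in practice: a[[2]] gives the character vector itself, while a[2] gives a list that still wraps it.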
Manage the work environment
- What if there are more variables defined than can be remembered?
- ls(): list all the objects (variables, functions, etc.) in a given environment
- rm(a, b): delete variables a and b
  - What will rm(list=ls()) do?
- Get and set working directory
  > getwd()
  > setwd("working/directory/path")
- Save and load working environment
  > save.image(file="filename.RData")
  > load(file="filename.RData")
Scripting
- Edit your commands using your favorite text editor
- How to run
  - Inside R: > source(filename)
    - Takes the input from the file and runs it
    - Does a syntax check before anything is executed
    - Set echo=T to print executed commands
  - OR copy & paste
  - Outside R: R CMD BATCH filename
Principal functions reading data
- read.table, read.csv, for reading tabular data
- readLines, for reading lines of a text file
- source, for reading in R code files (inverse of dump)
- dget, for reading in R code files (inverse of dput)
- load, for reading in saved workspaces
- unserialize, for reading single R objects in binary form
Read from internet
- ?readLines
  readLines("http://statweb.stanford.edu/~rag/stat141/exs/whale.txt")
- ?read.table
  read.table(file="http://statweb.stanford.edu/~rag/stat141/exs/whale.txt", header=T)
Principal functions writing data
- write.table, for writing tabular data to text files (e.g. CSV) or connections
- writeLines, for writing character data line-by-line to a file or connection
- dump, for dumping a textual representation of multiple R objects
- dput, for outputting a textual representation of an R object
- save, for saving an arbitrary number of R objects in binary format (possibly compressed) to a file
- serialize, for converting an R object into a binary format for outputting to a connection (or file)
Using dput() and dump()
- dput()/dget()
    y <- data.frame(a = 1, b = "a")
    dput(y)
    dput(y, file = "y.R")
- dump()/source()
    x <- "foo"; y <- data.frame(a = 1L, b = "a")
    dump(c("x", "y"), file = "data.R")
    rm(x, y)
    source("data.R")
    str(y)
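The dput()/dget() pair can be checked with a round trip. The sketch below writes to a temporary file (an assumption made here so that no file is left in the working directory) and reads the object back under the illustrative name y2.

```r
y <- data.frame(a = 1, b = "a")

path <- tempfile(fileext = ".R")   # temporary file path for this session
dput(y, file = path)               # write a textual representation of y
y2 <- dget(path)                   # read it back as a new object

print(all.equal(y, y2))            # compare the original and the restored copy
unlink(path)                       # remove the temporary file
```

Unlike dump()/source(), which restores objects under their original names, dget() returns the object so the caller chooses the name.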
Binary Formats save()/load()
a <- data.frame(x = rnorm(100), y = runif(100))
b <- c(3, 4.4, 1 / 3)
## Save 'a' and 'b' to a file
save(a, b, file = "mydata.rda")
## Load 'a' and 'b' into your workspace
load("mydata.rda")
## Save everything to a file
save.image(file = "mydata.RData")
## Load all objects in this file
load("mydata.RData")
Control Structures
Commonly used control structures
- if and else: testing a condition and acting on it
- for: execute a loop a fixed number of times
- while: execute a loop while a condition is true
- repeat: execute an infinite loop (must break out of it to stop)
- break: break the execution of a loop
- next: skip an iteration of a loop
if-else
if(<condition>) {
    ## do something
}
## Continue with rest of code

if(<condition>) {
    ## do something
} else {
    ## do something else
}
if-else {if-else}
if(<condition1>) {
    ## do something
} else if(<condition2>) {
    ## do something different
}
# ----------------
if(<condition1>) {
}
if(<condition2>) {
}
Example
x <- runif(1, 0, 10)

if(x > 3) {
    y <- 10
} else {
    y <- 0
}

y <- if(x > 3) {
    10
} else {
    0
}

y <- ifelse(x>3, 10, 0)
ifelse()
x <- c(6:-4)
sqrt(x)                       # gives warning
sqrt(ifelse(x >= 0, x, NA))   # no warning

## Note: the following also gives the warning!
ifelse(x >= 0, sqrt(x), NA)

## Example of different return modes:
yes <- 1:3
no <- pi^(0:3)
typeof(ifelse(NA, yes, no))      # logical
typeof(ifelse(TRUE, yes, no))    # integer
typeof(ifelse(FALSE, yes, no))   # double
for Loops
for(i in 1:10) {
    print(i)
}

x <- c("a", "b", "c", "d")
for(i in 1:4) {
    ## Print out each element of 'x'
    print(x[i])
}
for Loops (cont'd)
- The seq_along() function is commonly used in conjunction with for loops
    for(i in seq_along(x)) {
        print(x[i])
    }
- It is not necessary to use an index-type variable
    for(letter in x) {
        print(letter)
    }
- One-line loop (curly braces are not required)
    for(i in 1:4) print(x[i])
Nested for loops
x <- matrix(1:6, 2, 3)
for(i in seq_len(nrow(x))) {
    for(j in seq_len(ncol(x))) {
        print(x[i, j])
    }
}
while Loops
while (<condition>) {
    ## do something
}
Example:
count <- 0
while(count < 10) {
    print(count)
    count <- count + 1
}
While loops can result in infinite loops if not written properly. Use with care!
repeat
x0 <- 1
tol <- 1e-8
repeat {
    x1 <- computeEstimate()
    if(abs(x1 - x0) < tol) {   ## Close enough?
        break
    } else {
        x0 <- x1
    }
}
Note that the above code will not run if the computeEstimate() function is not defined (I just made it up for the purposes of this demonstration).
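For a version of this loop that actually runs, one option is to supply a concrete update rule. In the sketch below, compute_estimate() is a hypothetical stand-in (a Newton step for solving x^2 = 2) that takes the current estimate as an argument, and max_iter is an added safety cap; neither is part of the slide.

```r
# Hypothetical update rule: one Newton step for solving x^2 = 2
compute_estimate <- function(x) {
  (x + 2 / x) / 2
}

x0 <- 1
tol <- 1e-8
iterations <- 0
max_iter <- 100   # safety cap so the loop cannot run forever

repeat {
  x1 <- compute_estimate(x0)
  iterations <- iterations + 1
  if (abs(x1 - x0) < tol || iterations >= max_iter) {
    break          # close enough, or iteration budget exhausted
  }
  x0 <- x1
}

print(x1)
print(iterations)
```

Capping the number of iterations is a common guard for repeat loops, since convergence is not guaranteed for every update rule.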
next, break
- next is used to skip an iteration of a loop.
    for(i in 1:100) {
        if(i <= 20) {
            ## Skip the first 20 iterations
            next
        }
        ## Do something here
    }
- break is used to exit a loop immediately, regardless of what iteration the loop may be on.
    for(i in 1:100) {
        print(i)
        if(i > 20) {
            ## Stop loop after 20 iterations
            break
        }
    }
Summary
- Control structures like if, while, and for allow you to control the flow of an R program
- Infinite loops should generally be avoided, even if (you believe) they are theoretically correct.
- The control structures mentioned here are primarily useful for writing programs; for command-line interactive work, the "apply" functions are more useful.
  - It is more efficient to use built-in functions rather than control structures whenever possible.
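To illustrate the last point, the sketch below computes a sum of squares three ways: with an explicit for loop, with vectorized built-ins, and with sapply() as one member of the "apply" family. The task and variable names are chosen here for illustration.

```r
x <- 1:1000

# Explicit loop: accumulate the sum of squares one element at a time
total_loop <- 0
for (v in x) {
  total_loop <- total_loop + v^2
}

# Vectorized built-ins: square every element, then sum
total_vec <- sum(x^2)

# sapply() applies a function to each element and collects the results
squares <- sapply(x, function(v) v^2)
total_apply <- sum(squares)

print(c(total_loop, total_vec, total_apply))
```

All three give the same total; the vectorized form is the shortest and typically the fastest.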
Functions
Functions in R
- A core activity of an R programmer
- From "user" to developer
- When to write a function
  - To encapsulate a sequence of expressions that needs to be executed numerous times, perhaps under slightly different conditions
  - When code must be shared with others or the public
- Create an interface to the code via a set of parameters
- This interface provides an abstraction of the code to potential users
  - Ex: sort()
Your First Function
f <- function() {
    ## This is an empty function
}
## Functions have their own class
class(f)
# Execute this function
f()

# with a parameter
f <- function(num) {
    for(i in seq_len(num)) {
        cat("Hello, world!\n")
    }
}
f(3)

# with return value
f <- function(num) {
    hello <- "Hello, world!\n"
    for(i in seq_len(num)) {
        cat(hello)
    }
    chars <- nchar(hello) * num
    chars
}
meaningoflife <- f(3)
# more fun
# A function returns the very last expression that is evaluated.
Default value
f()    ## With the previous definition this fails: 'num' is missing and has no default
f <- function(num = 1) {
    hello <- "Hello, world!\n"
    for(i in seq_len(num)) {
        cat(hello)
    }
    chars <- nchar(hello) * num
    chars
}
f()    ## Use default value for 'num'
f(2)
f(num=2)    # specified using argument name
So far, we have written a function that
- has one formal argument named num with a default value of 1
- prints the message "Hello, world!" to the console a number of times indicated by the argument num
- returns the number of characters printed to the console
Argument Matching
- R function arguments can be matched positionally or by name.
- Positional matching just means that R assigns the first value to the first argument, the second value to the second argument, etc.
> str(rnorm)
function (n, mean = 0, sd = 1)
> set.seed(0)
> mydata <- rnorm(100, 2, 1) ## Generate some data
100 is assigned to the n argument, 2 is assigned to the mean argument, and 1 is assigned to the sd argument, all by positional matching.
Specifying arguments by name
- Order doesn't matter then
> sd(na.rm = FALSE, mydata)
Here, the mydata object is assigned to the x argument, because it's the only argument not yet specified.
- Function arguments can also be partially matched
- The order of operations when given an argument
  1. Check for exact match for a named argument
  2. Check for a partial match
  3. Check for a positional match
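The three matching steps can be seen side by side with sd(), whose formal arguments are x and na.rm. The sketch below calls it positionally, by exact name, and by a partial name; the result names are illustrative.

```r
set.seed(0)
mydata <- rnorm(100, 2, 1)

s_pos     <- sd(mydata, FALSE)               # positional: x first, then na.rm
s_named   <- sd(na.rm = FALSE, x = mydata)   # exact names, in any order
s_partial <- sd(mydata, na = FALSE)          # 'na' partially matches 'na.rm'

print(c(s_pos, s_named, s_partial))
```

Partial matching saves typing at the console, but spelling argument names in full tends to be safer in scripts.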
Example
> args(lm)
function (formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, ...)
NULL
The following two calls are equivalent.
set.seed(0)
mydata = data.frame(y=rnorm(20), x=rnorm(20))
lm(data = mydata, y ~ x, model = FALSE, 1:20)
lm(y ~ x, mydata, 1:20, model = FALSE)
Example (cont'd)
set.seed(0)
mydata = data.frame(y=rnorm(20), x=rnorm(20))
lm(data = mydata, y ~ x, model = FALSE, 1:20)
lm(y ~ x, mydata, 1:20, model = FALSE)
# check the difference
mydata1=mydata[1:10,]
lm(y ~ x, mydata, model = FALSE)
lm(y ~ x, mydata, 1:20, model = FALSE)
lm(y ~ x, mydata, 1:10, model = FALSE)
plot(mydata1)
abline(lm(y ~ x, mydata1, model = FALSE))
plot(mydata)
abline(lm(y ~ x, mydata, model = FALSE))
Lazy Evaluation
- Arguments to functions are evaluated lazily, so they are evaluated only as needed in the body of the function.
> f <- function(a, b) {
      a^2
  }
> f(2)
> f <- function(a, b) {
      print(a)
      print(b)
  }
> f(45)
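The two calls above behave differently: the first succeeds because b is never touched, while the second fails only when print(b) is reached. The sketch below reproduces this as a script, naming the second function g and wrapping its call in tryCatch() (both choices made here for illustration) so that execution continues after the error.

```r
f <- function(a, b) {
  a^2          # 'b' is never used, so it is never evaluated
}
res1 <- f(2)   # works even though 'b' is missing

g <- function(a, b) {
  print(a)     # runs first and prints the value of 'a'
  print(b)     # fails here because 'b' was not supplied
}

# tryCatch() captures the error message so the script can continue
msg <- tryCatch(g(45), error = function(e) conditionMessage(e))

print(res1)
print(msg)
```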
The ... Argument
- A special argument in R
- Indicates a variable number of arguments that are usually passed on to other functions.
myplot <- function(x, y, type = "l", ...) {
    plot(x, y, type = type, ...)   ## Pass '...' to 'plot' function
}
- The ... argument is necessary when the number of arguments passed to the function cannot be known in advance.
> args(paste)
function (..., sep = " ", collapse = NULL)
NULL
> args(cat)
function (..., file = "", sep = " ", fill = FALSE, labels = NULL, append = FALSE)
NULL
Arguments Coming After the ... Argument
- One catch with ... is that any arguments that appear after ... on the argument list must be named explicitly and cannot be partially matched or matched positionally.
> args(paste)
function (..., sep = " ", collapse = NULL)
NULL
> paste("a", "b", sep = ":")
> paste("a", "b", se = ":")
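The same catch applies to user-defined functions. The sketch below defines a hypothetical helper, label(), that forwards ... to paste(); the helper and its sep default are assumptions made for this illustration.

```r
# Hypothetical helper: forwards any extra arguments to paste()
label <- function(prefix, ..., sep = "-") {
  paste(prefix, ..., sep = sep)
}

full    <- label("id", 1, 2, sep = ":")   # 'sep' named in full, so it is matched
partial <- label("id", 1, 2, se = ":")    # 'se' is swallowed by '...' instead

print(full)
print(partial)
```

In the second call the ":" is treated as one more piece to paste, and the default separator is used.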
Summary
- Functions can be defined using the function() directive and are assigned to R objects just like any other R object
- Functions can be defined with named arguments; these function arguments can have default values
- Function arguments can be specified by name or by position in the argument list
- Functions always return the last expression evaluated in the function body
- A variable number of arguments can be specified using the special ... argument in a function definition.
Resources
- https://www.statmethods.net/management/userfunctions.html
- https://bookdown.org/rdpeng/rprogdatascience/
- https://www.tutorialspoint.com/r/index.htm
- https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html