R and RStudio
Introduction to R
Mariia Okuneva
• R is a programming language designed for statistical computing and graphics; RStudio is a convenient interface to it.
• We write all commands in the script window and send them to the console. To clear the console, use Ctrl + L.
Advantages of R
• Free, open-source software that works on any operating system.
• R has over 2000 contributed packages that increase its functionality. The source code of every R component is usually freely available.
• R scripts make it easy to share your work with others.
• Large and growing community of users.
Introduction to R:
1. Books: R by Example and A Beginner’s Guide to R, both from the Use R! series by Springer
2. ISLR 2.3 Lab: Introduction to R
3. Google!
Basic commands
To change the working directory, use setwd("your/own/path").
setwd("X:/Doctor/data mining/SS20") # your path here!
To display the directory R currently points to, use getwd(). This function has no arguments.
getwd()
## [1] "X:/Doctor/data mining/SS20"
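As a minimal sketch of how these two functions fit together, the snippet below switches to a temporary directory and then restores the original one. Using tempdir() as the target is an assumption made here so that the example runs on any machine; it is not part of the original notes.

```r
# Remember the current working directory so it can be restored later.
old_dir <- getwd()

# tempdir() always exists, so it is a safe stand-in for "your/own/path".
setwd(tempdir())
new_dir <- getwd()

# Go back to where we started.
setwd(old_dir)
restored_dir <- getwd()
```

Storing the old directory first is a small habit that keeps a script from leaving the session pointed at an unexpected folder.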
Markdown
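To produce well-formatted comments with Markdown, use:
• **bold text** for bold text
• *italics* for italics
• `code blocks` for code
• # Heading 1, ## Heading 2 and ### Heading 3 for headings of decreasing size
• - item 1 and - item 2 (one per line) for a bulleted list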
Read more about RMarkdown here.
Clean up!
Remove all objects defined in the current session before running new code.
rm(list = ls())
Help!
There are two ways to bring up the help documentation for a particular command or function: help(function) or ? function.
help(mean)
? mean
## starting httpd help server … done
x <- c(0:10, 50)
x
##  [1]  0  1  2  3  4  5  6  7  8  9 10 50
xm <- mean(x)
xm
## [1] 8.75
If documentation is not enough, try:
1. google
2. StackOverflow about R
3. Quick R
Packages.
A package is a group of functions and data sets that are not included in the base R distribution.
library(MASS)
library(ISLR)
## Warning: package 'ISLR' was built under R version 3.6.3
If you receive an error message when loading a library, it likely means that the library has not yet been installed. install.packages() can be used to download and install a package automatically through an available internet connection.
# install.packages("ISLR")
Numbers, vectors, matrices.
Numbers
You can use R as a powerful pocket calculator:
1+2
## [1] 3
1-2
## [1] -1
1*2
## [1] 2
1/2
## [1] 0.5
How would you raise 3 to the power of 5?
3^5
## [1] 243
5 modulo 3?
5 %% 3
## [1] 2
Basic functions and comparison relations for numbers:
sin(pi)
## [1] 1.224606e-16
cos(pi)
## [1] -1
exp(1)
## [1] 2.718282
log(1)
## [1] 0
5 < 5 # smaller
## [1] FALSE
2 != 1 # unequal
## [1] TRUE
5 == 3+2 # logical equal
## [1] TRUE
floor and ceiling might be useful to transform numbers:
pi
## [1] 3.141593
floor(pi) # returns the largest integer that is not greater than the given value
## [1] 3
ceiling(pi) # returns the smallest integer that is not less than the given value
## [1] 4
Variables
Use <- or = to assign an object (e.g., a number or a function definition) to a variable.
my_var <- 10
my_var
## [1] 10
my_var <- pi
my_var
## [1] 3.141593
R is case sensitive!
x<-3
x
## [1] 3
X<-5
X
## [1] 5
Types of variables.
class(my_var)
## [1] "numeric"
class(3<2)
## [1] "logical"
class("Hello, World!")
## [1] "character"
Vectors
Create a vector with the function c().
numerical_v = c(1, 2, 3)
character_v = c("data", "mining")
logical_v = c(T, T, F)
numerical_v
## [1] 1 2 3
character_v
## [1] "data" "mining"
logical_v
## [1] TRUE TRUE FALSE
class(numerical_v)
## [1] "numeric"
class(character_v)
## [1] "character"
class(logical_v)
## [1] "logical"
mixed = c("mix", 2)
mixed
## [1] "mix" "2"
class(mixed)
## [1] "character"
The i-th element of the vector can be addressed using vector_name[i].
numerical_v[3]
## [1] 3
character_v[1]
## [1] "data"
logical_v[2]
## [1] TRUE
Several elements of a vector can be selected at once by passing another vector of indices in the square brackets.
numerical_v[c(1,2)] # Select the first and the second element.
## [1] 1 2
numerical_v[1:2] # Another syntax, 1:2 is a shortcut for c(1,2)
## [1] 1 2
We can apply the same transformation to all elements of a vector. For example, we can calculate the elementwise inverse with ^(-1).
numerical_v^(-1)
## [1] 1.0000000 0.5000000 0.3333333
numerical_v # contains old values
## [1] 1 2 3
numerical_v = numerical_v^(-1)
numerical_v # contains new values
## [1] 1.0000000 0.5000000 0.3333333
Other ways to create a vector:
1. seq(x, y): sequence from x to y in steps of 1.
2. seq(x, y, by = z): sequence of numbers from x to y in steps of z.
3. seq(x, y, length = z): sequence of z numbers that are equally spaced between x and y.
4. rep(): create a vector in which some values are repeated.
seq(1, 3)
## [1] 1 2 3
seq(1, 3, by = 2)
## [1] 1 3
seq(1, 4, length = 3)
## [1] 1.0 2.5 4.0
rep(numerical_v, 2)
## [1] 1.0000000 0.5000000 0.3333333 1.0000000 0.5000000 0.3333333
Some operations with vectors:
c(1, 2, 3) + c(4, 5, 6) # element-wise sum
## [1] 5 7 9
c(1, 2, 3) * c(4, 5, 6) # element-wise product
## [1]  4 10 18
c(4, 5, 6) > 5
## [1] FALSE FALSE TRUE
numerical_v
## [1] 1.0000000 0.5000000 0.3333333
numerical_v[-1] # exclude first
## [1] 0.5000000 0.3333333
numerical_v[-c(1,3)] # exclude the first and the third
## [1] 0.5
which(numerical_v == 1) # indices of elements that fulfill the condition
## [1] 1
min(numerical_v)
## [1] 0.3333333
max(numerical_v)
## [1] 1
range(numerical_v) # min and max value
## [1] 0.3333333 1.0000000
sort(numerical_v) # values in increasing order
## [1] 0.3333333 0.5000000 1.0000000
sort(numerical_v, decreasing = TRUE) # values in decreasing order
## [1] 1.0000000 0.5000000 0.3333333
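The selection tools above can be combined with comparison operators. The following sketch uses a hypothetical vector scores (an assumption for illustration, not part of the notes) to show logical indexing next to which() and order().

```r
# Hypothetical example vector (an assumption, not part of the notes).
scores <- c(12, 7, 19, 3, 15)

# A comparison returns a logical vector of the same length.
is_high <- scores > 10

# Logical indexing keeps the elements where the condition is TRUE.
high_scores <- scores[is_high]

# which() returns the positions of those elements instead of the values.
high_positions <- which(is_high)

# order() returns the indices that would sort the vector,
# so scores[order(scores)] gives the same result as sort(scores).
sorted_scores <- scores[order(scores)]
```

Logical indexing is usually the shortest way to filter a vector; which() is useful when the positions themselves are needed.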
a = 1:3
b = 2:5
a
## [1] 1 2 3
b
## [1] 2 3 4 5
a %in% b # for each element of a, checks whether it is also in b
## [1] FALSE TRUE TRUE
Matrices
Creating a matrix
There are several ways to create a matrix in R:
matrix(1:9, byrow = TRUE, nrow = 3)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
matrix(0, 2, 5) # zeros, 2×5
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]    0    0    0    0    0
x = 1:6
y = 7:12
x
## [1] 1 2 3 4 5 6
y
## [1]  7  8  9 10 11 12
matrix = rbind(x,y) # bind vectors row-wise
matrix_c = cbind(x,y) # bind vectors column-wise
matrix
##   [,1] [,2] [,3] [,4] [,5] [,6]
## x    1    2    3    4    5    6
## y    7    8    9   10   11   12
matrix_c
##      x  y
## [1,] 1  7
## [2,] 2  8
## [3,] 3  9
## [4,] 4 10
## [5,] 5 11
## [6,] 6 12
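To make the role of byrow and of the two binding functions more concrete, here is a small sketch. The object names (m_col, m_row, r, cc) are chosen for illustration only and do not appear in the notes.

```r
# By default, matrix() fills the values column by column (byrow = FALSE).
m_col <- matrix(1:6, nrow = 2)

# With byrow = TRUE the same values are filled row by row.
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)

# dim() reports the number of rows and the number of columns.
dims <- dim(m_col)

# rbind() stacks the vectors as rows (2 x 3),
# cbind() places them side by side as columns (3 x 2).
r <- rbind(1:3, 4:6)
cc <- cbind(1:3, 4:6)
```

Here r has the same entries as m_row, and cc is the transpose of r, which is a quick way to check which orientation a given call produces.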
Selection of elements and submatrices:
matrix[2,5] # select element in row 2, column 5
##  y
## 11
matrix[1:2, 3:4] # several rows and columns
##   [,1] [,2]
## x    3    4
## y    9   10
matrix[2,] # second row
## [1]  7  8  9 10 11 12
matrix[,2] # second column
## x y
## 2 8
t(matrix) # transpose matrix
##      x  y
## [1,] 1  7
## [2,] 2  8
## [3,] 3  9
## [4,] 4 10
## [5,] 5 11
## [6,] 6 12
row(matrix) == col(matrix) # condition (row index = column index)
##       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
## [1,]  TRUE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE  TRUE FALSE FALSE FALSE FALSE
matrix[row(matrix) == col(matrix)] # select elements [1,1]; [2,2]; etc.
## [1] 1 8
Some operations on matrices.
rowSums(matrix) # calculates the totals for each row of a matrix
##  x  y
## 21 57
2 * matrix # multiply each element of a matrix by two.
##   [,1] [,2] [,3] [,4] [,5] [,6]
## x    2    4    6    8   10   12
## y   14   16   18   20   22   24
a = matrix(1:4, byrow = TRUE, nrow = 2)
b = matrix(5:8, byrow = TRUE, nrow = 2)
a
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
b
##      [,1] [,2]
## [1,]    5    6
## [2,]    7    8
a*b # creates a matrix where each element is the product of the corresponding elements in a and b
##      [,1] [,2]
## [1,]    5   12
## [2,]   21   32
a %*% b # standard matrix multiplication
##      [,1] [,2]
## [1,]   19   22
## [2,]   43   50
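The difference between * and %*% can be checked by hand. The sketch below rebuilds the two matrices from above and computes one entry of the matrix product as a dot product; the names entry_11, prod_ab, prod_ba and id2 are illustrative and not part of the notes.

```r
a <- matrix(1:4, byrow = TRUE, nrow = 2)
b <- matrix(5:8, byrow = TRUE, nrow = 2)

# Entry [1,1] of a %*% b is the dot product of row 1 of a and column 1 of b.
entry_11 <- sum(a[1, ] * b[, 1])

prod_ab <- a %*% b

# Matrix multiplication is not commutative in general.
prod_ba <- b %*% a

# diag(2) is the 2 x 2 identity matrix; multiplying by it leaves a unchanged.
id2 <- diag(2)
unchanged <- a %*% id2
```

Comparing prod_ab with prod_ba shows that the order of the factors matters for %*%, whereas a * b and b * a always agree.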
Data Frames.
A data frame is a very useful object because it can collect data of different types (numeric, logical, factor, character, etc.).
Create a data frame with data.frame().
cities <- c("Berlin", "New York", "Paris", "Tokyo") # character
population <- c(3.4, 8.1, 2.1, 12.9) # numeric
myframe = data.frame(cities, population)
is.data.frame(myframe) # check if object is a data frame
## [1] TRUE
data() is a function that loads specified data sets.
help(data) # show documentation for the function data()
data() # list all available data sets
data("USArrests") # load the data set
help(USArrests)
head(USArrests) # show the first observations of a data frame
##            Murder Assault UrbanPop Rape
## Alabama      13.2     236       58 21.2
## Alaska       10.0     263       48 44.5
## Arizona       8.1     294       80 31.0
## Arkansas      8.8     190       50 19.5
## California    9.0     276       91 40.6
## Colorado      7.9     204       78 38.7
tail(USArrests) # print out the last observations in the data set
##               Murder Assault UrbanPop Rape
## Vermont          2.2      48       32 11.2
## Virginia         8.5     156       63 20.7
## Washington       4.0     145       73 26.2
## West Virginia    5.7      81       39  9.3
## Wisconsin        2.6      53       66 10.8
## Wyoming          6.8     161       60 15.6
Subsetting
subset(USArrests, Murder > 15)
##             Murder Assault UrbanPop Rape
## Florida       15.4     335       80 31.9
## Georgia       17.4     211       60 25.8
## Louisiana     15.4     249       66 22.2
## Mississippi   16.1     259       44 17.1
USArrests[USArrests$Murder>15, ] # another way to subset
##             Murder Assault UrbanPop Rape
## Florida       15.4     335       80 31.9
## Georgia       17.4     211       60 25.8
## Louisiana     15.4     249       66 22.2
## Mississippi   16.1     259       44 17.1
dim(USArrests) # dimensions of the data frame
## [1] 50 4
names(USArrests)
## [1] "Murder" "Assault" "UrbanPop" "Rape"
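The same subsetting ideas work on any data frame. As a small sketch, the snippet below rebuilds the cities data frame from above and filters it; the threshold of 5 and the names big, big_names and mean_pop are assumptions made for illustration.

```r
cities <- c("Berlin", "New York", "Paris", "Tokyo")
population <- c(3.4, 8.1, 2.1, 12.9)
myframe <- data.frame(cities, population)

# Rows where the condition holds, all columns (note the trailing comma).
big <- myframe[myframe$population > 5, ]

# subset() does the same and can also restrict the columns via select.
big_names <- subset(myframe, population > 5, select = cities)

# with() evaluates an expression using the columns of the data frame,
# so the column can be referred to by its name alone.
mean_pop <- with(myframe, mean(population))
```

Both forms return a data frame; forgetting the comma in myframe[condition, ] is a common mistake, because without it R tries to select columns instead of rows.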
attach(USArrests) # attach the data frame so that its columns can be referenced by name
USArrests[Murder>15, ]
##             Murder Assault UrbanPop Rape
## Florida       15.4     335       80 31.9
## Georgia       17.4     211       60 25.8
## Louisiana     15.4     249       66 22.2
## Mississippi   16.1     259       44 17.1
Loops and conditions
If statement
if(condition) {
  expression
}
x = 1
if (x==2){print("x == 2")}
if (x>0){print("x is a positive number")}
## [1] "x is a positive number"
if(condition) {
  expression1
} else {
  expression2
}
if (x==2){print("x == 2")} else {print("x != 2")}
## [1] "x != 2"
x = -1
if (x>0){print("x is a positive number")} else {print("x is a negative number")}
## [1] "x is a negative number"
if(condition) {
  expression1
} else if (condition2) {
  expression2
} else {
  expression3
}
x = 0
if (x>0) {
  print("x is a positive number")
} else if (x == 0){
  print("x is zero")
} else {
  print("x is a negative number")
}
## [1] "x is zero"
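The if statement expects a single TRUE or FALSE as its condition. When the same decision has to be made for every element of a vector, base R offers ifelse(). The sketch below is an illustration under that assumption; the vector values and the name sign_labels are hypothetical and not part of the notes.

```r
# Hypothetical input vector (an assumption, not part of the notes).
values <- c(-2, 0, 3)

# ifelse(test, yes, no) works element by element;
# nesting a second ifelse() plays the role of "else if".
sign_labels <- ifelse(values > 0, "positive",
                      ifelse(values == 0, "zero", "negative"))
```

For a single number the if / else if / else chain above is easier to read; ifelse() is mainly useful for whole vectors or data frame columns.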
Loops
While loop
A while loop repeats the computation as long as a certain condition is fulfilled.
while(condition){
  expression
}
# as long as i<21, set i-th element equal to i and increase i by 1
x = c()
i = 1
while (i<21){
  x[i] = i
  i = i+1
}
x
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
break statement
# as long as i<21, set i-th element equal to i and increase i by 1
# but if i == 15, stop while loop
x = c()
i = 1
while (i<21){
  x[i] = i
  i = i+1
  if (i == 15) break
}
x
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14
for loop
for(variable in sequence){
  expression
}
# for i from 1 to 20, the i-th element of x takes value i
x = c()
for (i in 1:20){
  x[i] = i
}
x
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
# for i from 1 to 20, the i-th element of x takes value i
# but if i == 15, stop for loop
x = c()
for (i in 1:20){
  if (i == 15) break
  x[i] = i
}
x
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14
Writing R functions
function_name = function(arg1, arg2){
  body
  return(output)
}
Let’s write a function to fit a regression model and make a plot:
# curly brace after function indicates that we start giving the code in our function
# our function is going to take arguments x and y.
regplot = function(x,y){
  fit = lm(y~x)
  plot(x,y) # plot the response against the feature
  abline(fit, col='red') # add the regression line
}
attach(USArrests)
## The following objects are masked from USArrests (pos = 3):
##
## Assault, Murder, Rape, UrbanPop
regplot(UrbanPop, Murder)
[Figure: scatter plot of y against x with the fitted regression line in red]
Let’s make this function a little more useful: the function below uses the same commands, but with ... added to the argument list. The ... stands for additional arguments that are not named in the function definition. You are therefore allowed to pass extra arguments, which are used inside the function wherever you put ... (here, in the call to plot()).
regplot = function(x,y, ...){
  fit = lm(y~x)
  plot(x,y, ...) # plot the response against the feature
  abline(fit, col='red') # add the regression line
}
regplot(UrbanPop, Murder, xlab = 'UrbanPop', ylab = 'Murder', col = 'blue', pch = 20)
[Figure: scatter plot of Murder against UrbanPop (blue points) with the fitted regression line in red]
Task: Define your own function for the square root of a number. Try to do it yourself before looking at the answer!
rootsquare = function(x){
  root = sqrt(x)
  return(root)
}
rootsquare(9)
## [1] 3
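The ... mechanism used in regplot is not tied to plotting. As a sketch that runs without a graphics device, the hypothetical helper center() below (an assumption for illustration, not part of the notes) forwards any extra arguments to mean().

```r
# Hypothetical helper: subtracts the mean from every element of x.
center <- function(x, ...) {
  # Extra arguments such as na.rm = TRUE are handed on to mean().
  x - mean(x, ...)
}

# Without extra arguments, mean() is called with its defaults.
centered <- center(c(1, 2, 3))

# na.rm = TRUE is not an argument of center(); it travels through ... to mean().
centered_na <- center(c(1, NA, 3), na.rm = TRUE)
```

In the second call the mean is computed from the non-missing values only, while the missing element itself stays NA in the result.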