ST227: R in Life Insurance
17/03/2022
Kaplan – Meier estimation
▶ Let us re-create the following example from the lecture slides.
survData <- data.frame(
  time = c(10,13,18,19,23,30,36,38,54,56,59,75,93,97,104,107,107,107),
  observed = c(T,F,F,T,F,T,T,F,F,F,T,T,T,T,F,T,F,F)  # T/F are shorthand for TRUE/FALSE
)
head(survData)
##   time observed
## 1   10     TRUE
## 2   13    FALSE
## 3   18    FALSE
## 4   19     TRUE
## 5   23    FALSE
## 6   30     TRUE
▶ If your data are stored in an external Excel spreadsheet, consider using readxl::read_excel to import them.
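library(readxl)  # install.packages("readxl") first if needed
survData <- read_excel(file.choose())  # file.choose() opens a file-picker dialog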
▶ Above: head() displays the first few rows (six by default) of a data frame, which avoids cluttering the display.
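head() and its counterpart tail() both accept an n argument for the number of rows to show. A small sketch, recreating the example data so it runs on its own (firstThree and lastThree are illustrative names):

```r
# Recreate the example data so this block is self-contained
survData <- data.frame(
  time = c(10, 13, 18, 19, 23, 30, 36, 38, 54, 56, 59, 75, 93, 97, 104, 107, 107, 107),
  observed = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE,
               TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

firstThree <- head(survData, n = 3)  # first 3 rows instead of the default 6
lastThree  <- tail(survData, n = 3)  # last 3 rows of the data frame
print(firstThree)
print(lastThree)
```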
Kaplan – Meier estimation
▶ Assumption: all uncensored death times are distinct.
▶ Steps:
1. Calculate the number of individuals at risk at each time.
2. Filter out (i.e. remove) the right-censored individuals.
3. Calculate the step-wise survival probability at each death time, then multiply these together to obtain the estimate.
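In formula form, these steps amount to the standard product-limit (Kaplan–Meier) estimator. Writing $t_j$ for the distinct death times, $d_j$ for the number of deaths at $t_j$ (always 1 under the distinctness assumption) and $n_j$ for the number at risk just before $t_j$:

```latex
\hat{S}(t) = \prod_{j:\, t_j \le t} \left( 1 - \frac{d_j}{n_j} \right)
           = \prod_{j:\, t_j \le t} \frac{n_j - d_j}{n_j}
```

Each factor $(n_j - d_j)/n_j$ is the step-wise survival probability of step 3.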
▶ Step 1: the number of individuals at risk at a given time is the number of units with that time or later, the current one included. Since the data are sorted by time, this is simply the number of remaining rows.
survData$atRisk <- nrow(survData):1  # 18, 17, ..., 1
head(survData)
##   time observed atRisk
## 1   10     TRUE     18
## 2   13    FALSE     17
## 3   18    FALSE     16
## 4   19     TRUE     15
## 5   23    FALSE     14
## 6   30     TRUE     13
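The shortcut nrow(survData):1 relies on the rows being sorted by time. A sketch that counts directly from the definition (units with time ≥ t) agrees everywhere except on the tied times at 107, where the shortcut counts down 3, 2, 1. Because the death at 107 is listed first among the tied rows, the value that enters the estimate (3) is still correct. atRiskDef is a hypothetical column name.

```r
survData <- data.frame(
  time = c(10, 13, 18, 19, 23, 30, 36, 38, 54, 56, 59, 75, 93, 97, 104, 107, 107, 107),
  observed = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE,
               TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Positional shortcut from the slides: remaining rows, current one included
survData$atRisk <- nrow(survData):1

# Definition-based count: for each row's time t, how many units have time >= t
survData$atRiskDef <- vapply(survData$time,
                             function(t) sum(survData$time >= t),
                             integer(1))

# Rows where the two counts disagree (only the later tied rows at 107)
print(survData[survData$atRisk != survData$atRiskDef, ])
```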
Subsetting of a vector
▶ For step 2: we will need subsetting techniques.
▶ Subsetting means selecting a portion (i.e. a subset) of the data. It is a core operation in data analysis.
▶ Each language has slightly different subsetting syntax. R, being built for data analysis, has very mature subsetting mechanics.
▶ Example:
x <- c(1,5,3,7,8,2,4)
x[1]
## [1] 1
x[c(1,2,4)]
## [1] 1 5 7
▶ In the second line of code, we selected the first component. In the third line, we selected the first, second and fourth components.
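A few more forms of numerical subsetting, as a sketch on the same x: a range of positions built with `:`, and negative indices, which drop positions rather than select them.

```r
x <- c(1, 5, 3, 7, 8, 2, 4)

x[2:4]        # positions 2 to 3 to 4: 5 3 7
x[-1]         # a negative index drops that position: 5 3 7 8 2 4
x[-c(1, 2)]   # drop several positions at once: 3 7 8 2 4
x[length(x)]  # the last element: 4
```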
Subsetting a vector
▶ In the previous example, we directly specified the positions (indices) of the desired elements. This is called numerical subsetting.
▶ Another important technique is logical (Boolean) subsetting.
▶ We need a logical vector of the same length as the parent vector. It selects every element whose position holds a TRUE.
x <- c(1,5,3,7,8,2,4)  # same x as above
selection <- c(T,F,T,F,T,F,T)  # select every other element
x[selection]
## [1] 1 3 8 4
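Logical and numerical subsetting are two views of the same selection: which() converts a logical vector into the positions of its TRUE entries. A quick sketch on the same x and selection:

```r
x <- c(1, 5, 3, 7, 8, 2, 4)
selection <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)

which(selection)                              # positions of the TRUE entries: 1 3 5 7
x[which(selection)]                           # numerical subsetting with those positions: 1 3 8 4
identical(x[selection], x[which(selection)])  # both routes give the same subset
```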
▶ You can combine operations for very succinct, self-explanatory code, e.g.:
x[x > 3]  # read: x such that x > 3
## [1] 5 7 8 4
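Conditions can be combined with the elementwise operators `&` (and), `|` (or) and `!` (not) before subsetting. A sketch on the same x:

```r
x <- c(1, 5, 3, 7, 8, 2, 4)

x[x > 3 & x < 8]  # elementwise AND: 5 7 4
x[x < 2 | x > 7]  # elementwise OR: 1 8
x[!(x > 3)]       # negation with !: 1 3 2
```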
Subsetting of a data frame
▶ A data frame is a two-dimensional structure, so you can subset by rows, by columns, or both.
▶ We use R’s built-in data set mtcars.
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
▶ Below are a few examples:
mtcars[5,2]              # fifth row, second column
mtcars[c(1,2,3),c(1,2)]  # rows 1 to 3, columns 1 to 2
mtcars[c(1,2), ]         # rows 1 and 2, all columns
mtcars[ ,c(1,2)]         # all rows, columns 1 and 2
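Columns can also be selected by name, and rows by a logical condition, just as with vectors. One subtlety: selecting a single column with `[ , ]` returns a plain vector unless `drop = FALSE` is given. A sketch (sixCyl, mpgVec and mpgDf are illustrative names):

```r
# mtcars ships with base R (datasets package), so no import is needed
sixCyl <- mtcars[mtcars$cyl == 6, c("mpg", "cyl")]  # rows with 6 cylinders, two named columns
print(sixCyl)

mpgVec <- mtcars[, "mpg"]                # a single column drops to a numeric vector
mpgDf  <- mtcars[, "mpg", drop = FALSE]  # drop = FALSE keeps a one-column data frame
str(mpgVec)
str(mpgDf)
```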
Kaplan – Meier estimation
▶ Back to the main problem. We will need to:
1. take the observed vector, which is already a logical indicator of fully observed (uncensored) records. (If the data held a censoring indicator instead, we would first negate it with !.)
2. use this Boolean vector to subset all the rows corresponding to fully observed records.
survData2 <- survData[survData$observed, ]
head(survData2)
##    time observed atRisk
## 1    10     TRUE     18
## 4    19     TRUE     15
## 6    30     TRUE     13
## 7    36     TRUE     12
## 11   59     TRUE      8
## 12   75     TRUE      7
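Two alternatives for step 2, sketched under an assumption: if the data had recorded a censoring flag (the hypothetical column censored) rather than observed, negating it with `!` would recover the same rows; base R's subset() is a shorter way to write the same filter. viaNegation and viaSubset are illustrative names.

```r
survData <- data.frame(
  time = c(10, 13, 18, 19, 23, 30, 36, 38, 54, 56, 59, 75, 93, 97, 104, 107, 107, 107),
  observed = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE,
               TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
)
survData$atRisk <- nrow(survData):1

# Hypothetical censoring flag: TRUE where the record is right-censored
survData$censored <- !survData$observed

# Keep the rows that are NOT censored, i.e. the fully observed deaths
viaNegation <- survData[!survData$censored, ]

# subset() evaluates the condition inside the data frame, so no survData$ prefix is needed
viaSubset <- subset(survData, observed)

print(viaSubset)
all.equal(viaNegation, viaSubset)
```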
Kaplan – Meier estimation
▶ We can now fill in the details:
survData2$death <- 1  # or rep(1, times = nrow(survData2))
survData2$survProb <- (survData2$atRisk - survData2$death)/survData2$atRisk
survData2$KP <- cumprod(survData2$survProb)  # running product = KM estimate
# a more succinct syntax:
# with(survData2, (atRisk - death)/atRisk)
survData2
##    time observed atRisk death  survProb        KP
## 1    10     TRUE     18     1 0.9444444 0.9444444
## 4    19     TRUE     15     1 0.9333333 0.8814815
## 6    30     TRUE     13     1 0.9230769 0.8136752
## 7    36     TRUE     12     1 0.9166667 0.7458689
## 11   59     TRUE      8     1 0.8750000 0.6526353
## 12   75     TRUE      7     1 0.8571429 0.5594017
## 13   93     TRUE      6     1 0.8333333 0.4661681
## 14   97     TRUE      5     1 0.8000000 0.3729345
## 16  107     TRUE      3     1 0.6666667 0.2486230
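The slides assume distinct death times. A minimal sketch that drops this assumption groups the deaths by distinct time, counting $n_j$ (time ≥ $t_j$) and $d_j$ (deaths exactly at $t_j$) from their definitions. kmTable is a hypothetical helper, not taken from the lecture, and its output column is named KM rather than KP.

```r
# Hypothetical helper (not from the slides): Kaplan–Meier table that
# groups tied death times instead of assuming they are distinct
kmTable <- function(time, observed) {
  deathTimes <- sort(unique(time[observed]))  # distinct death times t_j
  # n_j: number with time >= t_j (still at risk just before t_j)
  nRisk <- vapply(deathTimes, function(t) sum(time >= t), integer(1))
  # d_j: number of observed deaths exactly at t_j
  nDeath <- vapply(deathTimes, function(t) sum(time == t & observed), integer(1))
  survProb <- (nRisk - nDeath) / nRisk
  data.frame(time = deathTimes, atRisk = nRisk, death = nDeath,
             survProb = survProb, KM = cumprod(survProb))
}

time <- c(10, 13, 18, 19, 23, 30, 36, 38, 54, 56, 59, 75, 93, 97, 104, 107, 107, 107)
observed <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE,
              TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
km <- kmTable(time, observed)
print(km)
```

With distinct death times, as here, every $d_j$ is 1 and the table matches the KP column above. For real work, the survival package (shipped with R as a recommended package) provides survfit(Surv(time, observed) ~ 1), which computes the same estimator along with standard errors.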