Question 1.
ST227 Solution
27/04/2021
First, let us define the mortality function. One naive definition maybe:
Copyright By PowCoder代写 加微信 powcoder
However, we will need mu to be a vectorised function for later steps. That is, the length of the output needs to match that of the input. The simplest way to do it is by replicating lambda for a number of n times, where n is the length of the input vector t.
mu <- function(t){rep(lambda,times=length(t))}
Now, we can start defining the survival probability. For this part, it is not critical that tpx is vectorised, since
we are not feeding it into the integrate routine.
lambda = 0.1
mu <- function(t){lambda}
tpx <- function(t,x){ exp(-integrate(mu,lower=x,upper=x+t)$value)
#You can also implement the vectorised version here if you wish
tpx <- function(t,x){ sapply(
FUN = function(t){
exp(-integrate(mu,lower=x,upper=x+t)$value)
For a 15-year-old mechanical system, the chance of it surviving the next 5 years is:
tpx(t=5,x=15)
## [1] 0.6065307
We consider an updated mortality function with an iterated logarithm term: μ ̃=λ+γloglog((e+t)), λ=0.1,γ=1.5.
Please note that I accidentally uploaded a version on moodle with log(log(t)). This was an oversight from me, as log log(t) can be undefined for small t close to 0. The method is still the same, though.
In order to define the density, we need to first define the survival function.
lambda <- 0.1; gamma <- 1.5
mu <- function(t){lambda + gamma*log(log(exp(1) + t))} tpx <- function(t,x){
sapply( t,
FUN = function(t){ exp(-integrate(mu,lower=x,upper=x+t)$value)
density <- function(t){
tpx(t=t,x=15)*mu(t+15)
For part (b), the expected remaining lifetime is:
## 0.5881207 with absolute error < 2.9e-06
The cumulative distribution function tqx is simply 1 − survival function:
Part (c) asks you to discuss how you would find the 95th percentile - but not to actually solve it. You should give some direction and not simply state some off-the-shelf solution (e.g. external packages or libraries). An approprimate method is interval bisection. Note that in this case, the cdf curve will cut through the target level 0.95, so we do not have to worry about the interval bisection method missing out the tangential solution.
Question 2.
For such a small data set, we can simply type it in:
lifetimes<-c(64,75,29,45,67,65,77,90,65,55,80,67,72,46,64,28,68,75,49,94) Part 1.
integrand <- function(t){t*density(t)} integrate(integrand,lower=0,upper=100)
tqx <- function(t,x){ 1-tpx(t=t,x=x)
Its first two moments are given by:
E(X)=α, Var(X)=α. β β2
Therefore, if we denote m1 =
E(X) = α/β =β; α=βE(X)= E(X)2 Var(X) α/β2 Var(X)
Xi Xi2
i and m2 = i , then the method of moment estimators are:
αˆ= m21 , βˆ= m1 . m 2 − m 21 m 2 − m 21
We are now ready to define the the negative log likelihood and start the optim routine.
nLL <- function(params, x){
alpha <- params[1]; beta <- params[2] - sum(
dgamma(x,shape=alpha,rate=beta,log=TRUE)
alphaMM <- mean(lifetimes)ˆ2/(mean(lifetimesˆ2)-mean(lifetimes)ˆ2)
betaMM <- mean(lifetimes)/(mean(lifetimesˆ2)-mean(lifetimes)ˆ2)
optim(par=c(alphaMM,betaMM),nLL, x = lifetimes)
## Warning in dgamma(x, shape = alpha, rate = beta, log = TRUE): NaNs produced
## Warning in dgamma(x, shape = alpha, rate = beta, log = TRUE): NaNs produced
## Warning in dgamma(x, shape = alpha, rate = beta, log = TRUE): NaNs produced
## [1] 11.407096 0.178919
## [1] 86.53808
## $counts
## function gradient
## 59 NA
## $convergence
## $message
The survival function and thus density are derived as follows:
The joint likelihood is:
tp0 = exp − αλαsα−1dt (1)
α αs=t
=exp −λs (2) s=0
= exp − λαtα. (3) f(t) = μ(t) × tp0 (4) = αλαtα−1 exp − λαtα. (5)
L(λ, α|t) = αλαtα−1 exp − λαtα (6) ii
= αnλnα tα−1 exp − λαtα. (7)
nn l(λ,α|t)=nlog(α)+nαlog(λ)+(α−1)log(ti)−λαtαi . (8)
The question now is how to choose a good initial value for the optimisation algorithm. We can observe when α = 1, this reduces to an exponential model. We can use this sub-model as a starting point:
αˆini = 1, λˆini = 1 ̄. t
## Warning in log(lambda): NaNs produced
## Warning in log(lambda): NaNs produced
## Warning in log(lambda): NaNs produced
## [1] 4.39782558 0.01427681
## [1] 84.77866
## $counts
## function gradient
## 79 NA
## $convergence
## $message
Question 3.
Again, the first task is to import the data:
The calculation follows pretty much verbatim to the workshop slides.
## [1] 0.9600000 0.9200000 0.8800000 0.8400000 0.8000000 0.7529412 0.7027451
## [8] 0.6525490 0.5932264 0.5339037 0.4671658 0.4004278 0.3336898 0.2669519
kExamData_canc
nLL <- function(param, x){ alpha <- param[1]
lambda <- param[2]
n <- length(x)
- n*log(alpha) - n*alpha*log(lambda) -
(alpha - 1)*sum(log(x)) +
sum(lambdaˆalpha*xˆalpha)
optim(par = c(1,1/mean(lifetimes)), fn=nLL, x = lifetimes)
cancer <- readxl::read_excel("C:/Users/ /Dropbox/LSE 2021 ST227/Mock/moc
cancer <- as.data.frame(cancer)
cancer$atRisk <- nrow(cancer):1
#filter the fully observed rows only
cancerObs <- cancer[cancer$fullyObserved, ]
cancerObs$death <- 1
cancerObs$survProb <- (cancerObs$atRisk - cancerObs$death)/cancerObs$atRisk cancerObs$KM <- cumprod(cancerObs$survProb)
cancerObs$KM
## [15] 0.2002139 0.1334759 0.0000000
And here’s the :
## [1] 0.001536000 0.002944000 0.004224000 0.005376000 0.006400000 0.007753470
## [7] 0.009105804 0.010191105 0.011621651 0.012580795 0.013529383 0.013757632
## [13] 0.013265541 0.012053111 0.010120342 0.007467234 NaN
Question 4.
Note that for this question we need the fully data set, not just the fully observed subset of it.
library(survival)
## Attaching package: 'survival'
## The following object is masked _by_ '.GlobalEnv':
## cancer
## coxph(formula = survivalObject ~ sex, data = cancer)
## n= 25, number of events= 17
## coef exp(coef) se(coef) z Pr(>|z|)
## sex 0.1526 1.1649 0.5251 0.291 0.771
## exp(coef) exp(-coef) lower .95 upper .95
cancerObs$greenwoodVar <-
cancerObs$KMˆ2*cumsum(
cancerObs$death/(cancerObs$atRisk*(cancerObs$atRisk-cancerObs$death))
cancerObs$greenwoodVar
survivalObject <- Surv(cancer$time,cancer$event)
coxmodel <- coxph(survivalObject ~ sex, data = cancer)
summary(coxmodel)
## sex 1.165 0.8584 0.4162
## Concordance= 0.48 (se = 0.077 )
## Likelihood ratio test= 0.09 on 1 df,
## Wald test = 0.08 on 1 df,
## Score (logrank) test = 0.08 on 1 df,
The Cox Proportional Hazard model has only one parameter, β, which has a point estimate value of 0.1526. With the p-value being approximately 0.80 at all tests, we do not reject the null hypothesis at any reasonable significance level (be careful, one never accepts the null hypothesis).
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com