代写代考 ST227 Survival Models – Part IV Estimating the Lifetime Distribution

ST227 Survival Models – Part IV Estimating the Lifetime Distribution
Department of Statistics, London School of Economics
March 3, 2019

Copyright By PowCoder代写 加微信 powcoder

1 The Kaplan- and the Cox Re- gression Model
1.1 Cohort Studies
 We want to estimate the distribution of the lifetime variables T, T0 and Tx. We consider Örst data that arises from a cohort study. Later we will look at the cross-sectional study.
 We observe a large number of new-born lives. The proportion alive at age t is an estimate of the survivor function S(t). This will be a step function, and the larger the number of lives the smoother the estimate of S(t) will be.

 Equivalently, we can take the proportion who have died up to and including time t. This estimate of F(t) is the empirical distribution function, F^(t). A typical graph of F^(t)(for a small sample) would look like this.
Figure 1. Empirical distribution function, F^(t)

 This graph could be smoothed or graduated, if required.
 Problems: the cohort study su§ers from a number of problems which
make it unsuitable in insurance.
ñ To observe the mortality of a cohort over all ages requires around 100
years of data.
ñ In practice, we will lose track of a large number of individuals in the cohort. This may produce a biased sample, since the mortality of the lost lives may be di§erent from those lives that remain in the study. We call this problem censoring. For censored lives, all we know is that these lives had lifetimes that exceed a certain age.

1.2 Censoring
 As was mentioned in Part III, censoring is a key feature of survival data and has a profound impact on the estimation procedure.
 In particular, it is a form of missing data problem which arises when we do not observe the exact length of a lifetime, but observe only that its length falls within some interval.
 This can happen in several ways.

1. Right-censoring occurs if the censoring mechanism cuts short observa- tions in progress.
Example: An example of this is the ending of a mortality investigation before all the lives being observed have died. Persons still alive when the investigation ends are right censored. We know only that their lifetimes exceed the censoring time t, i.e. T  t.
2. Left-censoring occurs if we do not know the time of entry into the state of interest.

Example: Left-censoring often in medical studies in which patients are subject to regular examinations. Discovery of a condition tells us only that the onset fell in the period since the previous examination; the time elapsed since onset has been left-censored. This time we can say that the survival time from t1, the unknown time of onset, exceeds t3 t2, i.e. T  t3 t2.
3. Interval-censoring occurs when the event of interest occurs somewhere between two times t1 and t2. An example arises in mortality investigations, where we might know only the calendar year of death. In the diagram we

know the life is alive at time t1 and dead at time t2, but we do not know the exact time of death t. All we can say is that t1  T  t2.
Note: Both right- and left-censoring can be seen as special cases of interval censoring.
Further terminology: There are a number of ways in which any of the above forms of censoring can arise.

1. Random censoring occurs if the censoring time Ci for life i is a random variable. The observation is then censored if Ci < Ti, where Ti is the random lifetime of life i. Example: In medical studies a patient moves away from the study area at a random time, and the study loses track of the patient. 2. Non-informative censoring occurs if the censoring gives no information about the lifetimes fTig. Thus, if each member of the pair fTi;Cig is independent then censoring is non-informative. Example: In medical studies a patient moves away from the study area at a time unconnected with the progress of their illness. This is non- informative. However, if the patient moves away from the study area in order to receive a di§erent treatment this is informative. Informative censoring may introduce bias into the observed survival times and must be treated with extra care. 3. Type I censoring occurs when the censoring times are known in advance, i.e. censoring occurs at Öxed times. Example: A mortality investigation may end on a particular date. 4. Type II censoring occurs if observation continues until a Öxed known num- ber of deaths occurs. In this case the number of events is non-random. Example: Type II censoring occurs in reliability testing when components may be tested until a certain number fail. This kind of censoring is un- common in mortality studies. 1.3 Cross-sectional Studies Instead of following a cohort of lives for a long time we can use a cross- sectional study. We divide the investigation into single years of age, say, x ! x + 1, and then follow each mini "cohort" for three or four years. There are a number of points to note:  We are now mixing mortality since the mortality of a 60 year old now is not the same as the mortality of a 57 year old in three years time.  The method introduces type I censoring in the most obvious way, since most lives will not die (we suppose).  In the following section we develop the empirical distribution function to allow for censoring.  We will consider lifetimes as a function of time t without mention of a starting age x.  The following could be applied equally to newborn lives, to lives aged x at outset, or to lives with some property in common at time t = 0, for example diagnosis of a medical condition.  Medical studies are often based on time since diagnosis, or time since the start of treatment, and if the patientís age enters the analysis it is usually as an explanatory variable in a regression model. 1.4 The Kaplan-Meier (K-M) estimate of the survivor function Suppose we have n lives with survival times t1;:::;tn where some of the observations are right-censored (non-informative). Suppose there are r distinct death times, r  n, which we arrange in ascending order t(1) < t(2) < ::: < t(r). We form the set of intervals Ij= [t(j); t(j+1)); j = 0; 1; :::; r where t(0) = 0 and t(r+1) = 1. Let nj=number of individuals alive at t(j) i.e., immediately before time t(j) (the jth death-time), and let dj=number of deaths at t(j)  Consider the interval (t(j); t(j)]. There are nj patients alive at t(j), i.e. the beginning of the interval, and dj deaths at t(j):  Thus, the probability that a patient dies in (t(j) ; t(j)] is estimated by dj=nj .  So the maximum likelihood estimate of the probability of surviving from t(j)  to t(j) is 1dj=nj = (nj dj)=nj.  Further, since there are no deaths in (t(j); t(j+1) ], the probability of surviving from t(j)  to t(j+1)  is also (nj dj)=nj. If we let  ! 0 then we can estimate the probability of surviving the interval t(j) to t(j+1) as (nj dj)=nj. We now assume that all deaths are independent of each other. Let time t lie in Ik. Then we estimate the probability of survival to time t as the probability of surviving through t(k) to t(k+1), and all preceding intervals. This gives the maximum likelihood estimate of the survivor function, usually known as the Kaplan-Meier estimate, S(t) = ( n ); t 2 [t(k); t(k+1)): (1) j=1 j with Sb(0) = 1. Note the following  If the largest observation corresponds to a death then nr = dr so Sb(t) = 0 for t  tr.  If the largest observation, say t, is censored then Sb(t) is undeÖned for t  t. Example: The data refer to an experiment to study the use of an IUD device. The time origin corresponds to the Örst day on which the woman uses the IUD. The observation is the time to discontinuation of use of the IUD because of bleeding problems. Notice that some observations are censored, denoted  . In this example, censoring could occur for a number of reasons: (a) desire for pregnancy, (b) no further need of contraception, (c) lost to follow-up, (d) the study ended. Table 1. Time in weeks to discontinuation of the use of an IUD 10 13 18 19 23 30 36 38 54 56 59 75 93 97 104 107 107 107 We set out the calculation of the Kaplan-Meier estimate as follows. Table 2. Kaplan-Meier estimate of the survivor function Interval nj dj (nj dj)=nj Sb(t) 0- 18 0 10- 18 1 19- 15 1 30- 13 1 36- 12 1 59- 8 1 75- 7 1 93- 6 1 97- 5 1 1.0000 1.0000 0.9444 0.9444 0.9333 0.8815 0.9231 0.8137 0.9167 0.7459 0.8750 0.6526 0.8571 0.5594 0.8333 0.4662 0.8000 0.3729 0.6667 0.2486 The table is constructed in the following steps: 1. Order the data. If there are any censored observations with the same times as a death time place them after the deaths dj. Here dj is the number of individuals experiencing the event at duration distinct death times, t(j); j = 1; :::; r. 2. Column one consists of the distinct death times, t(j); j = 1; :::; r. 3. Column two consists of the number of observations nj greater than or equal to the death time t(j) (including the deaths at time t(j)). 4. Column three consists of the number of deaths at time t(j); if there are no multiple deaths the entries will all be 1 (except the Örst which is 0). 5. Columns four and Öve compute Sb(t). Figure 2. Kaplan-Meier estimate of survivor function, Sb(t), for IUD data The standard error of the Kaplan-Meier estimate How good is the Kaplan-Meier estimate? We can answer this by computing its standard error. We could use this to compare two lifetime distributions, for example. The variance is given by Greenwoodís formula (proof not required) Var(Sb(t))(Sb(t))2Xk dj , t2[t(k);t(k+1)): (2) j=1nj(nj dj) 1.5 The Cox Regression Model  The non-parametric approach of the Kaplan-Meier (K-M) estimate is not well suited to answering questions concerning the e§ects of covariates on the survival function.  A covariate is any quantity recorded in respect of each life, such as age, sex, type of treatment, level of medication, severity of symptoms and so on.  We will assume that the covariates in respect of the ith life are represented by a 1  p vector zi1; :::; zip.  If a covariate partitions the lives into a small number of homogeneous groups, eg: Sex, then we may Önd a K-M estimate for each group.  But, if the covariate assumes many values, eg: Age, then this approach is not possible. We want a regression type model.  Assume that ( 1; 2;:::; p) is the unknown vector of p regression coe¢ cients corresponding to the covariates.  In this section we consider the proportional hazards model or the Cox regression model (after D. R. Cox who invented it (1972)). DeÖnition: A survival time T follows a Cox proportional hazards model if the hazard function (force of mortality) for the ith life with covariate zi = (zi1; zi2; :::; zip)0 can be written as (t; zi) = 0(t) exp( 0zi) = 0(t) exp( 1zi1 + 2zi2+:::+ pzip): (3) 1. The function 0(t) is known as the baseline hazard, the hazard for an individual with a covariate vector equal to zero. This means that we set 0 = 0, since any Öxed e§ect can simply be absorbed into the baseline hazard. 2. The covariates act multiplicatively on the baseline hazard; equivalently they act additively on the log scale. 3. Cox showed that it is possible to estimate the e§ects of the covariates without specifying the baseline hazard 0(t).  Under the Cox model, the hazards of di§erent lives with covariate vectors z1 and z2 are in the same proportion at all times t: (t; z1) = exp( 0z1); (4) (t;z2) exp( 0z2) 0 = B 2 C ; also (4) gives the Hazard Ratio and the life in the denominator is the reference life.  The utility of this model arises from the fact that the general ëshapeíof the hazard function for all individuals is determined by the baseline hazard, while the exponential term accounts for di§erences between individuals.  Thus, if we know the ís we can compare the lives, since the ís tell us which lives have high/low hazards, and this without knowing the hazard function 0(t)!  So, how do we estimate ? . The partial likelihood estimates the regres- sion coe¢ cients but avoids the need to estimate 0(t). The partial likelihood function, L( )  First, we set up some notation.  We suppose that data are available on n lives, amongst whom there are r distinct death (or failure) times and n r right-censored survival times. We will assume there are no ties in the death times; ties lead to complications so we will avoid this point.  We denote the r ordered death times by t(1) < t(2) < ::: < t(r).  The set of lives who are at risk at time tj is denoted R(t(j)), so that R(t(j)) is the set of lives who are alive and uncensored just prior to t(j). We call R(t(j)) the risk set at time t(j). Suppose the life with explanatory variable zj dies at time tj. Result: The (partial) likelihood function for the Cox proportional hazards model is exp( 0z(j)) exp( 0zl) (5) 1. Maximisation of this expression has to proceed numerically, and we will see how we can use the survival package in R for Ötting a Cox model. 2. The partial likelihood behaves very like an ordinary likelihood function it furnishes all the statistical information needed for standard inference on the regression coe¢ cients. Suppose b is the estimate of found by maximising L( ), or equivalently l( ), the log partial likelihood, and is the true unknown value of . Then, viewed as a random variable, we have asymptotically 3. The derivation of L( ) makes no direct use of the actual censored and uncensored survival times. The censored and uncensored survival times only enter into the summation over the risk sets at the death times. For this reason L( ) is not a true likelihood and it is usually referred to as a partial likelihood. 4. The likelihood function depends only on the ranking of the death and censoring times, since this determines the risk set at each death time. Example 1: Suppose the lifetime, the age and the sex of a number of lives are recorded and the Cox model is Ötted by maximising the partial likelihood. The Ötted parameters are: f = 0:5 where f denotes a female life and = 0:01 is the e§ect of age in years. Taking a male aged 40 as the reference life, Önd the ratio of the hazards for (a) a male aged 20 (b) a male aged 60 (c) a female aged 20 (d) a female aged 60.  In this case we have two covariates sex which is categorical and we assume that it takes the value "0" for males and the value "1" for females age which is continuous and takes the values, 40 (denominator), 20 and 60 for a male in (a) and (b) and 20 and 60 for a male in (c) and (d).  Also, a male aged 40 is the reference life so it will go to the denominator of the ratio of the hazards.  We want to compute the hazard ratio: where 0 = (0:5; 0:01); z0 = (0; age) for a male, and z0 = (1; age) for a female, where we consider that z0  z1 in the numerator, (cases (a), (b) and (c)) and z0  z2 in the denominator (reference life.)  Note that the denominator in each case will be exp( 0z2) = = exp(0:5  0 + 40  0:01) = exp(40  0:01) = 1:491825: (a) exp(0:5  0 + 20  0:01)= exp(40  0:01) = exp(0:2) = 0:82 (b) exp(0:5  0 + 60  0:01)= exp(40  0:01) = exp(0:2) = 1:22 (c) exp(0:5  1 + 20  0:01)= exp(40  0:01) = exp(0:7) = 0:50 (d) exp(0:5  1 + 60  0:01)= exp(40  0:01) = exp(0:3) = 0:74 Before giving our next example we give a reminder of the deÖnition of the score function and deÖne the observed information function. DeÖnition: We deÖne the score function U( ) by U( )= @l (6) @ where l = l( ) = Log(L( )) is the log-likelihood function. DeÖnition: We deÖne the observed information function Inf( ) by Inf( )=@ 2 (7) Reminder: I( ) = E  @2l  is the Fisher information function. @2 Example 2: The table gives the data on times to claim for permanent health insurance (PHI) policies for two groups of lives. Male 2 5+ 9 11+ 14 Female 4 7+ 10+ 12+ 15 where the + indicates that the observation was censored.  We want to compute the partial likelihood in the Cox proportional hazards model which is given by exp( z(j)) where, as was previously mentioned, R(t(j)) is the set of lives who are alive and uncensored just prior to the r ordered death (or failure) times t(1) < t(2) < ::: < t(r) and where z0 ( i.e. z(j) in the numerator and zl in the denominator) is the vector of covariate information and are the regression coe¢ cients.  Herez0 =sexandweassumez0 =0foramale,andz0 =1forafemale. This means that each individualís multiplier (in the numerator and the denominator) is given by e z =( 1; ifz=0,i.e. males e ; ifz=1,i.e. females Step by Step approach for calculating L( ) We have data on 10 individuals, 5 males and 5 females. Once more Male 2 5+ 9 11+ 14 Female 4 7+ 10+ 12+ 15 1. Order the non-censored death times t(j). Here, there are 5 non- censored (i.e. without a "+" sing) death times in total, i.e. j = 1; ::; 5. Of those 5 deaths, d1 = 3 come from the Örst group (males) and d2 = 2 come from the second group (females). Hence, we get t(1) = 2 < t(2) = 4 < t(3) = 9 < t(4) = 14 < t(5) = 15: 2. Next we focus on R(t(j)): Let r(1) be the number of lives from the Örst group (males) who are at risk just before the death times t(j) and r(2) be the number of lives from the second group (females) who j are at risk just before the death times t(j), j = 1; ::; 5:  To calculate r(1) and r(2) we need to Önd how many values form the Table above are greater than or equal to t(j), j = 1; ::; 5: Once more, the Table is Male 2 5+ 9 11+ 14 Female 4 7+ 10+ 12+ 15  The values of j; t(j), r(1) and r(2) are provided in a new Table below: jj j t(j) r(1) r(2) jj 1255 2445 3933 4 14 1 1 5 15 0 1 3. The partial likelihood is given by e 0d1 e 1d2 L( ) = X exp( zl) = Y5  (1) (2)  Y5 exp( z(j)) j=1 l2R(t(j)) rj e 0+rj e 1 j=1 e 03e 12 2 Y5 r(1)e 0 + r(2)e 01 Y5 r(1) + r(2)e  jjjj z0 = 0 and d1 = 3 for a male, and z0 = 1 and d2 = 2 for a female. Hence using the values of r(1) and r(2) from the above Table, for j = jj 1;::;5, weget e2 L()= Y5r(1)+r(2)e 5+5e 4+5e 3+3e 1+e 0+e  = =ee 5+5e 4+5e 3+3e 1+e 0+e =e1111111 5 1+e 4+5e 3 1+e 1+e / e (keep all the terms which involve ) (1+e )3(4+5e ) ) l( ) = 3log(1+e )log(4+5e ) (log-likelihood) Estimation: The partial likelihood estimate of satisÖes =0: (8) @ 1 + e 4 + 5e Trick 1: Putting x = eb we Önd 1b5x2 +8x4 = 0 so x = (p764)=15 = 0:3145 (since x > 0) and hence = 1:1567.
What does b mean? We have
f(t) 0(t)e 1 0(t)e 1:1567
m(t) = 0(t)e 0 = 0(t) =e =e =0:3145 (9)

and so f (t)  0:3m(t) for all t: Standard error of b: We use
@2l! Var( )  1=I( )=1= @ 2 ]
= 1= @U ]=b (10)
Trick 2: It is convenient to use the chain rule with  = e ; to evaluate the
Reminder: In calculus, the chain rule is a formula for computing the derivative of the composition f g = f (g (x)) ;of two functions say f and g.

It holds that
(f  g)= = f0  g  g= = f0 (g (x))  g=(x) d = de=ed=e=
=) dU( )= @2l =dU( )=dU( ); wherefrom(15): d @2 d= d
In our case

5e : @ 1 + e 4 + 5e
= 1 3 5
1 +  4 + 5
Hence to calculate  @2l  we need to replace e with  in (15 or above), @2
then take the derivative of all terms w.r.t  and multiply this by ; we get:

= @U!=d1+ 3 + 5  @2 @ d 1+4+5
= d 3 + 5  d 1+ 4+5
 (same but more conveniently written)  0:7486; since  = e = 0:3145
=  d 3 3 +1 ! 4 d 1 +  4 + 5
= 3 + 20  (1 + )2 (4 + 5)2
V ar(b)  1=0:7486 =) StErr(b) = 1:1558: (11)

Hypothesis testing: We want to test H0 : = 0 vs H1 : 6= 0, for e.g. at a 5% threshold. In particular, we often want to test H0 : = 0 vs H1 : 6= 0, the hypothesis of no di§erence b

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com