Chapter 5 Statistical Methods in Survival Analysis
The previous chapter introduced probability models that are frequently used in survival analysis. This chapter introduces the associated statistical methods.
The focus in this chapter is the use of maximum likelihood for parameter estimation and inference. Likelihood theory is illustrated in the first section. The matrix of the expected values of the opposite of the second partial derivatives of the log likelihood function is known as the Fisher information matrix and its statistical analog, the observed information matrix, is useful for determining confidence intervals for parameters. Asymptotic properties of the likelihood function, which are associated with large sample sizes, are reviewed in the second section. One distinctive feature of lifetime data is the presence of censoring, which occurs when only an upper or lower bound on the lifetime is known. Statistical methods for handling censored data values are introduced in the third section. The focus is on right censoring, where only a lower bound on the failure time is known. These methods are applied to the exponential distribution and the Weibull distribution in the next two sections. Finally, the last section indicates how to fit the proportional hazards model to a data set consisting of lifetimes with associated covariates.
5.1 Likelihood Theory
There are always merits in obtaining raw data (that is, exact individual failure times), as opposed to grouped data (counts of the number of failures over prescribed time intervals). Given raw data, we can always construct grouped data, but the converse is typically not true; therefore, we limit discussion in this chapter to the raw data case.
The random variable T has denoted a random lifetime in the previous chapter. So it is natural to use [latex]T_1, \, T_2, \, \ldots, \, T_n[/latex] to denote a random sample of n such lifetimes, where n is the number of items on test. When specific values are given for realizations of such lifetimes, which is typically the case from this point forward, they are denoted by [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex]. In other words, [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] are the experimental values of the mutually independent and identically distributed random variables [latex]T_1, \, T_2, \, \ldots, \, T_n[/latex]. The associated ordered observations, or order statistics, are denoted by [latex]t_{(1)}, \, t_{(2)}, \, \ldots, \, t_{(n)}[/latex].
The Greek letter θ is often used to denote a generic unknown parameter. We will refer to [latex]\hat \theta[/latex] in the abstract as a point estimator; when [latex]\hat \theta[/latex] assumes a specific numeric value, it will be referred to as a point estimate. The probability distribution of a statistic is referred to as a sampling distribution.
Assume that there is a single unknown parameter θ in the probability model for T. Assume further that the data values [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] are realizations of mutually independent and identically distributed random variables. The joint probability density function of the data values is the product of the marginal probability density functions of the individual observations:

[latex]L(\theta, \, \boldsymbol{t}) = \prod_{i\,=\,1}^{n} f(t_i, \, \theta).[/latex]

This function is the likelihood function. In order to simplify the notation, the likelihood function is often written as simply [latex]L(\theta)[/latex].
The maximum likelihood estimator of θ, which is denoted by [latex]\hat \theta[/latex], is the value of θ that maximizes [latex]L(\theta)[/latex].
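As an illustrative sketch, the maximization can be carried out numerically and checked against a closed-form answer. The sample values, the exponential working model [latex]f(t, \, \theta) = \theta e^{-\theta t}[/latex], and the search bracket below are all assumptions made for illustration, not part of the text:

```python
import math

# Hypothetical sample of n = 5 lifetimes (illustrative values only).
t = [1.2, 0.8, 2.5, 0.4, 1.6]
n = len(t)

def log_likelihood(lam):
    """Exponential log likelihood: ln L(lambda) = n ln(lambda) - lambda * sum(t)."""
    return n * math.log(lam) - lam * sum(t)

# Maximize by golden-section search over an assumed bracketing interval.
lo, hi = 1e-6, 10.0
phi = (math.sqrt(5) - 1) / 2
for _ in range(200):
    a = hi - phi * (hi - lo)
    b = lo + phi * (hi - lo)
    if log_likelihood(a) < log_likelihood(b):
        lo = a          # the maximizer lies in [a, hi]
    else:
        hi = b          # the maximizer lies in [lo, b]
lam_hat = (lo + hi) / 2

# For the exponential distribution the maximizer is known in closed form:
closed_form = n / sum(t)
```

For most lifetime models no closed form exists, and a numerical search of this kind is the practical route to the maximum likelihood estimate.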
The next example reviews the associated notions of the log likelihood function, score vector, maximum likelihood estimator, Fisher information matrix, and observed information matrix for a two-parameter lifetime model. We assume for now that there are no censored observations in the data set; all of the failure times are observed.
In some cases, it is possible to find the exact distribution of a pivotal quantity which results in exact statistical inference (that is, constructing exact confidence intervals and performing exact hypothesis tests). It is more often the case that exact statistical inference is not possible, and asymptotic properties associated with the likelihood function must be relied on for approximate inference. The next section reviews some asymptotic properties that arise in likelihood theory. When a large data set of lifetimes is available, these properties often lead to approximate statistical methods of inference.
5.2 Asymptotic Properties
When the number of items on test n is large, there are some asymptotic results concerning the likelihood function that are useful for constructing confidence intervals and performing hypothesis tests associated with a vector of p unknown parameters [latex]\boldsymbol{\theta} = \left( \theta_1 , \, \theta_2 , \, \ldots , \, \theta_p \right) ^ \prime[/latex]. As indicated in the example in the last section, the [latex]p \times 1[/latex] score vector [latex]{\boldsymbol U} (\boldsymbol{\theta})[/latex] has elements

[latex]U_i(\boldsymbol{\theta}) = \frac{\partial \ln L(\boldsymbol{\theta})}{\partial \theta_i} = \sum_{j\,=\,1}^{n} \frac{\partial \ln f(t_j, \, \boldsymbol{\theta})}{\partial \theta_i}[/latex]
for [latex]i = 1, \, 2, \, \ldots, \, p[/latex]. Therefore, each element of the score vector is a sum of mutually independent random variables, and, when n is large, the elements of [latex]\boldsymbol{U}( \boldsymbol{\theta})[/latex] are asymptotically normally distributed by the central limit theorem. More specifically, the score vector [latex]\boldsymbol{U} (\boldsymbol{\theta})[/latex] is asymptotically normal with population mean [latex]{\bf 0}[/latex] and variance–covariance matrix [latex]I(\boldsymbol{\theta})[/latex], where [latex]I(\boldsymbol{\theta})[/latex] is the Fisher information matrix. This means that when the true value for the parameter vector is [latex]\boldsymbol{\theta}_0[/latex] then

[latex]\boldsymbol{U}(\boldsymbol{\theta}_0) ^ \prime \, I(\boldsymbol{\theta}_0)^{-1} \, \boldsymbol{U}(\boldsymbol{\theta}_0)[/latex]
is asymptotically chi-square with p degrees of freedom. This can be used to determine confidence intervals and perform hypothesis tests with respect to [latex]\boldsymbol{\theta}[/latex].
The maximum likelihood estimator for the parameter vector [latex]\hat{\boldsymbol{\theta}}[/latex] can also be used for confidence intervals and hypothesis testing. Since [latex]\hat{\boldsymbol{\theta}}[/latex] is asymptotically normal with population mean [latex]\boldsymbol{\theta}[/latex] and variance–covariance matrix [latex]I^{-1} (\boldsymbol{\theta})[/latex], when [latex]\boldsymbol{\theta} = \boldsymbol{\theta}_0[/latex],

[latex](\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0) ^ \prime \, I(\boldsymbol{\theta}_0) \, (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0)[/latex]
is also asymptotically chi-square with p degrees of freedom. Two statistics that are asymptotically equivalent to this statistic, and that can be used to estimate the value of the chi-square random variable, are

[latex](\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0) ^ \prime \, I(\hat{\boldsymbol{\theta}}) \, (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0)[/latex]

and

[latex](\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0) ^ \prime \, O(\hat{\boldsymbol{\theta}}) \, (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_0),[/latex]

where [latex]O(\hat{\boldsymbol{\theta}})[/latex] denotes the observed information matrix.
A third asymptotic result involves the likelihood ratio statistic

[latex]2 \left[ \ln L(\hat{\boldsymbol{\theta}}) - \ln L(\boldsymbol{\theta}_0) \right],[/latex]
which is asymptotically chi-square with p degrees of freedom. The conditions necessary for these asymptotic properties to apply are cited at the end of the chapter.
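As a minimal numerical sketch of the likelihood ratio statistic with [latex]p = 1[/latex], the snippet below uses an exponential working model with hypothetical data and a hypothesized rate chosen only for illustration:

```python
import math

# Hypothetical sample of n exponential lifetimes.
t = [1.2, 0.8, 2.5, 0.4, 1.6, 0.9, 1.1, 2.0]
n, total = len(t), sum(t)

def log_lik(lam):
    # Exponential log likelihood: n * ln(lambda) - lambda * sum(t).
    return n * math.log(lam) - lam * total

lam_hat = n / total                 # maximum likelihood estimate
lam_0 = 2.0                         # hypothesized value, H0: lambda = 2

# Likelihood ratio statistic 2[ln L(lam_hat) - ln L(lam_0)], asymptotically
# chi-square with p = 1 degree of freedom when H0 is true.
lrt = 2 * (log_lik(lam_hat) - log_lik(lam_0))

# Compare to the 0.95 fractile of chi-square(1), approximately 3.841.
reject = lrt > 3.841
```

A value of the statistic exceeding the fractile leads to rejecting the hypothesized parameter value at approximate level 0.05.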
These three asymptotic results are summarized in the result below, where the a above the ∼ is shorthand for “asymptotically distributed.”
All of the statistical methods developed thus far have assumed that we are able to observe all n of the items on test fail. The associated lifetimes are denoted by [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex]. Although this is ideal and might be the case in some settings, a short testing time or items with long lifetimes might result in some items that survive the test. The lifetimes of the items which do not fail during the test are known as right-censored observations. The lifetimes of these items are not observed, but are known to exceed the time at which the test is concluded. If a decision concerning the acceptability of the items must be made with some of the items still operating at the end of the test, then a statistical model must be formulated to account for the unobserved lifetimes of these items. The next section introduces the important topic of censoring, which is pervasive in survival analysis.
5.3 Censoring
Censoring occurs in lifetime data sets when only an upper or lower bound on the lifetime is known; it arises frequently because it is often impossible or impractical to observe the lifetimes of all the items on test. A data set for which all failure times are known is called a complete data set. Figure 5.1 illustrates a complete data set of [latex]n = 5[/latex] items placed on test simultaneously at time [latex]t = 0[/latex], where the [latex]\times[/latex]'s denote failure times. Consider the two endpoints of each of the horizontal segments: the left endpoint is the time origin and the right endpoint is the failure time. It is critical to provide an unambiguous definition of the time origin (for example, the time a product is purchased or the time a cancer is diagnosed). Likewise, failure must be defined in an unambiguous fashion. This is easier to define for a light bulb or a fuse than for a ball bearing or a sock. Outside of a reliability setting, a data set of lifetimes is often generically referred to as time-to-event data, corresponding to the time between the time origin and the event of interest. A censored observation occurs when only a bound is known on the time of failure. If a data set contains one or more censored observations, it is called a censored data set.
Long Description for Figure 5.1
The horizontal axis measures time t and has the initial time t equals 0. The vertical axis represents the number of items 1 through 5 from top to bottom. The time-to-event data is shown as horizontal line segments of different lengths drawn from the vertical axis with X marks at the other end. The X marks indicate the failure times. All line segments are drawn from t equals 0. Item 4 has the longest failure time, followed by items 3, 1, 2, and 5, in that order.
The most common type of censoring is known as right censoring. In a right-censored data set, one or more items have only a lower bound known on their lifetime. The term sample size is now ambiguous. From this point forward, we use n to denote the number of items on test and use r to denote the number of observed failures. In an industrial life testing situation, for example, [latex]n = 12[/latex] cell phones are put on a continuous, rigorous life test on January 1, and [latex]r = 3[/latex] of the cell phones have failed by December 31. These failed cell phones are discarded upon failure. The remaining [latex]n - r = 12 - 3 = 9[/latex] cell phones that are still operating on December 31 have lifetimes that exceed 365 days, and are therefore right-censored observations. Right censoring is not limited to just reliability applications. In a medical study in which T is the survival time after the diagnosis of a particular type of cancer, for example, a patient can (a) still be alive at the end of the study, (b) die of a cause other than the particular type of cancer, or (c) lose contact with the study (for example, if they leave town); each of these three cases constitutes a right-censored observation.
Three special cases of right censoring are common in survival analysis. The first is Type II or order statistic censoring. As shown in Figure 5.2, this corresponds to terminating a study upon one of the ordered failures. The diagram corresponds to a set of [latex]n = 5[/latex] items placed on a test simultaneously at time [latex]t = 0[/latex]. The test is terminated when [latex]r = 3[/latex] failures are observed. Time advances from left to right in Figure 5.2 and the failure of the first item (corresponding to the third ordered observed failure) terminates the test. The lifetimes of the third and fourth items are right censored. Observed failure times are indicated by an [latex]{\large{\times}}[/latex] and right-censoring times are indicated by a [latex]\, \Large{\circ}[/latex]. In Type II censoring, the time to complete the test is random.
Long Description for Figure 5.2
The horizontal axis measures time t and has the initial time t equals 0. The vertical axis represents the number of items 1 through 5 from top to bottom. All horizontal line segments are drawn from t equals 0. Line segments of different lengths end with either an X mark or a circle. The X marks indicate the failure times and the circles denote censoring times. The segments for items 1, 3, and 4 end at the same time, and item 5 has a short failure time. The lifetimes of the third and fourth items are right censored. The termination times of items 1, 3, and 4 are connected by a dotted vertical line.
The second special case is Type I or time censoring. As shown in Figure 5.3, this corresponds to terminating the study at a particular time. The diagram shows a set of [latex]n = 5[/latex] items placed on a test simultaneously at [latex]t = 0[/latex] that is terminated at the time indicated by the dotted vertical line. For the realization illustrated in Figure 5.3, there are [latex]r = 4[/latex] observed failures. In Type I censoring, the number of failures r is random.
Long Description for Figure 5.3
The horizontal axis measures time t and has the initial time t equals 0. The vertical axis represents the number of items 1 through 5, from top to bottom. All horizontal line segments are drawn from t equals 0. Line segments of different lengths are marked with X marks and circles at the other end. The X marks indicate the failure times and the circles indicate censoring times. Item 4 has the longest lifetime, followed by items 3, 1, 2, and 5. Among these, item 4 is right censored. A dotted vertical line is drawn at the termination time of the test, where item 4 is censored.
Finally, random censoring occurs when individual items are withdrawn from the test at any time during the study. Figure 5.4 illustrates a realization of a randomly right-censored life test with [latex]n = 5[/latex] items on test and [latex]r = 2[/latex] observed failures. It is usually assumed that the failure times and the censoring times are mutually independent random variables and that the probability distribution of the censoring times does not involve any unknown parameters from the failure time distribution. In other words, in a randomly censored data set, items cannot be more or less likely to be censored because they are at unusually high or low risk of failure.
Long Description for Figure 5.4
The horizontal axis measures time t and has the initial time t equals 0. The vertical axis represents the number of items 1 through 5 from top to bottom. All horizontal line segments begin at the same time t equals 0. Horizontal line segments of different lengths are marked with X marks or circles at the other end. The X marks indicate the failure times and the circles denote censoring times. Item 4 has the longest lifetime, followed by items 3, 1, 2, and 5. Among these, items 2, 3, and 5 are right censored.
Although other types of censoring exist, such as left censoring and interval censoring, the focus of this chapter will be on right censoring because it is the most common type of censoring. In the case of right censoring, the ratio [latex]r / n[/latex] is the fraction of items which are observed to fail. When [latex]r / n[/latex] is close to one, the data set is referred to as a lightly censored data set; when [latex]r / n[/latex] is close to zero, the data set is referred to as a heavily censored data set. In the reliability setting, many data sets are heavily censored because the items have long lifetimes. In the biomedical setting, certain cancers have long remission times, resulting in heavily censored data sets.
Of the following three approaches to handling the problem of censoring, only one is both valid and practical. The first approach is to ignore all the censored values and to perform analysis only on those items that were observed to fail. Although this simplifies the mathematics involved, it is not a valid approach. If, for example, this approach is used on a right-censored data set, the analyst is discarding the right-censored values, and these are typically the items that have survived the longest. In this case, the analyst arrives at an overly pessimistic result concerning the lifetime distribution because the best items (that is, the right-censored observations) have been excluded from the analysis. A second approach is to wait for all the right-censored observations to fail. Although this approach is valid statistically, it is not practical. In an industrial setting, waiting for the last light bulb to burn out or the last machine to fail may take so long that the product being tested will not get to market in time. In a medical setting, waiting for the last patient to die from a particular disease may take decades. For these reasons, the proper approach is to handle censored observations probabilistically, including the censored values in the likelihood function.
The likelihood function for a censored data set can be written in several different equivalent forms. Let [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] be mutually independent observations denoting lifetimes sampled randomly from a population. The corresponding right-censoring times are denoted by [latex]c_1, \, c_2, \, \ldots, \, c_n[/latex]. The [latex]t_i[/latex] and [latex]c_i[/latex] values are assumed to be independent, for [latex]i = 1, \, 2, \, \ldots, \, n[/latex]. In the case of Type I right censoring, [latex]c_1 = c_2 = \cdots = c_n = c[/latex]. The set U contains the indexes of the items that are observed to fail during the test (that is, the uncensored observations):

[latex]U = \left\{ i \, | \, t_i \le c_i \right\}.[/latex]
The set C contains the indexes of the items whose failure time exceeds the corresponding censoring time (that is, those that are right censored):

[latex]C = \left\{ i \, | \, t_i > c_i \right\}.[/latex]
This notation, along with an important notion known as alignment, are illustrated in the next example.
The usual form for right-censored lifetime data is given by the pairs [latex](x_i, \, \delta_i)[/latex], where [latex]x_i = \min \{t_i, \, c_i\}[/latex] and [latex]\delta_i[/latex] is a censoring indicator variable: [latex]\delta_i = 1[/latex] if [latex]t_i \le c_i[/latex] and [latex]\delta_i = 0[/latex] if [latex]t_i > c_i[/latex], for [latex]i = 1, \, 2, \, \ldots, \, n[/latex]. The [latex](x_i, \, \delta_i)[/latex] pairs can be constructed from the [latex](t_i, \, c_i)[/latex] pairs. Hence, [latex]\delta_i[/latex] is 1 if the failure of item i is observed and 0 if the failure of item i is right censored, and [latex]x_i[/latex] is the failure time (when [latex]\delta_i = 1[/latex]) or the censoring time (when [latex]\delta_i = 0[/latex]). For the vector of unknown parameters [latex]\boldsymbol{\theta} = (\theta_1, \, \theta_2, \, \ldots, \, \theta_p) ^ \prime[/latex], ignoring a constant factor, the likelihood function is

[latex]L(\boldsymbol{\theta}) = \prod_{i \, \in \, U} f(x_i, \, \boldsymbol{\theta}) \prod_{i \, \in \, C} S(c_i, \, \boldsymbol{\theta}),[/latex]
where [latex]S (c_i, \, \boldsymbol{\theta})[/latex] is the survivor function of the population distribution with parameters [latex]\boldsymbol{\theta}[/latex] evaluated at censoring time [latex]c_i[/latex], [latex]i \, \in \, C[/latex]. The reason that the survivor function is the appropriate term in the likelihood function for a right-censored observation is that [latex]S(c_i, \, \boldsymbol{\theta})[/latex] is the probability that item i survives to [latex]c_i[/latex]. The log likelihood function is

[latex]\ln L(\boldsymbol{\theta}) = \ln \left[ \, \prod_{i \, \in \, U} f(x_i, \, \boldsymbol{\theta}) \prod_{i \, \in \, C} S(c_i, \, \boldsymbol{\theta}) \right][/latex]

or

[latex]\ln L(\boldsymbol{\theta}) = \sum_{i \, \in \, U} \ln f(x_i, \, \boldsymbol{\theta}) + \sum_{i \, \in \, C} \ln S(c_i, \, \boldsymbol{\theta}).[/latex]

Since the probability density function is the product of the hazard function and the survivor function, the log likelihood function can be simplified to

[latex]\ln L(\boldsymbol{\theta}) = \sum_{i \, \in \, U} \ln h(x_i, \, \boldsymbol{\theta}) + \sum_{i \, \in \, U} \ln S(x_i, \, \boldsymbol{\theta}) + \sum_{i \, \in \, C} \ln S(c_i, \, \boldsymbol{\theta})[/latex]

or

[latex]\ln L(\boldsymbol{\theta}) = \sum_{i \, \in \, U} \ln h(x_i, \, \boldsymbol{\theta}) + \sum_{i\,=\,1}^{n} \ln S(x_i, \, \boldsymbol{\theta}),[/latex]

where the second summation now includes all n items on test. Finally, to write the log likelihood in terms of the hazard and cumulative hazard functions only,

[latex]\ln L(\boldsymbol{\theta}) = \sum_{i \, \in \, U} \ln h(x_i, \, \boldsymbol{\theta}) - \sum_{i\,=\,1}^{n} H(x_i, \, \boldsymbol{\theta}),[/latex]
since [latex]H(t) = -\ln \, S(t)[/latex]. The choice of which of these three expressions for the log likelihood to use for a particular distribution depends on the particular forms of [latex]S(t)[/latex], [latex]f(t)[/latex], [latex]h(t)[/latex], and [latex]H(t)[/latex]. In other words, one of the distribution representations may possess a mathematical form that is advantageous over the others.
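This bookkeeping can be checked numerically. The sketch below, using hypothetical [latex](t_i, \, c_i)[/latex] pairs and an exponential working model with [latex]h(t) = \lambda[/latex] and [latex]H(t) = \lambda t[/latex] (all values assumed for illustration), builds the [latex](x_i, \, \delta_i)[/latex] pairs and verifies that three forms of the log likelihood agree:

```python
import math

# Hypothetical (t_i, c_i) pairs: latent failure times and censoring times.
t = [2.3, 5.1, 1.7, 4.0, 3.2]
c = [3.0, 3.0, 3.0, 3.0, 3.0]           # Type I censoring at c = 3.0

x = [min(ti, ci) for ti, ci in zip(t, c)]
delta = [1 if ti <= ci else 0 for ti, ci in zip(t, c)]
r = sum(delta)                           # number of observed failures

lam = 0.9                                # trial value of the exponential rate

# Form 1: sum of ln f over U plus sum of ln S over C, with
# f(t) = lam * exp(-lam * t) and S(t) = exp(-lam * t).
form1 = sum(math.log(lam * math.exp(-lam * xi)) for xi, d in zip(x, delta) if d == 1) \
      + sum(-lam * xi for xi, d in zip(x, delta) if d == 0)

# Form 2: sum of ln h over U plus sum of ln S over all n items, h(t) = lam.
form2 = r * math.log(lam) + sum(-lam * xi for xi in x)

# Form 3: sum of ln h over U minus sum of H over all n items, H(t) = lam * t.
form3 = r * math.log(lam) - lam * sum(x)
```

All three forms evaluate to the same number for any trial rate, which is the point of the algebraic simplifications above.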
The next example will use the last version of the log likelihood function to find a maximum likelihood estimator and an approximate confidence interval for an unknown parameter.
To summarize the material introduced so far in this chapter, point estimators are statistics calculated from a data set to estimate an unknown parameter. Confidence intervals reflect the precision of a point estimator. The most common technique for determining a point estimator for an unknown parameter is maximum likelihood estimation, which involves finding the parameter value(s) that make the observed data values the most likely. The maximum likelihood estimators are usually found by using calculus to maximize the log likelihood function. Most population lifetime distributions do not have exact confidence intervals for unknown parameters, so the asymptotic properties of the likelihood function can be used to generate approximate confidence intervals for unknown parameters. Finally, many data sets in reliability are censored, which means that only a bound is known on the lifetime for one or more of the data values. The most common censoring mechanism is known as right censoring, where only a lower bound on the lifetime is known. The number of items on test is denoted by n and the number of observed failures is denoted by r.
The next section applies the techniques developed so far in this chapter to the exponential distribution.
5.4 Exponential Distribution
The exponential distribution is popular due to its tractability for parameter estimation and inference. The exponential distribution can be parameterized by either its population rate λ or its population mean [latex]\mu = 1 / \lambda[/latex]. Using the rate to parameterize the distribution, the survivor, density, hazard, and cumulative hazard functions are

[latex]S(t, \, \lambda) = e^{-\lambda t}, \qquad f(t, \, \lambda) = \lambda e^{-\lambda t}, \qquad h(t, \, \lambda) = \lambda, \qquad H(t, \, \lambda) = \lambda t[/latex]
for [latex]t \ge 0[/latex]. Note that the unknown parameter λ has been added as an argument in these lifetime distribution representations because it is now also an argument in the likelihood function and is estimated from data.
All the analysis in this and subsequent sections assumes that a random sample of n items from a population has been placed on a test and subjected to typical environmental conditions. Equivalently, [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] are independent and identically distributed random lifetimes from a particular population distribution (exponential in this section). As with all statistical inference, care must be taken to ensure that a random sample of lifetimes is collected. Consequently, random numbers should be used to determine which n items to place on test. In a reliability setting, laboratory conditions should adequately mimic field conditions. Only representative items should be placed on test because items manufactured using a previous design may have a different failure pattern than those with the current design. This is more difficult in a biomedical setting because of inherent differences between patients.
Four classes of data sets (complete, Type II right censored, Type I right censored, and randomly right censored) are considered in separate subsections. In all cases, n is the number of items placed on test and r is the number of observed failures.
5.4.1 Complete Data Sets
A complete data set is typically the easiest to analyze because extensive analytical work exists for finding point and interval estimators for parameters. Also, by testing each item to failure, we have equal confidence in the fitted model in both the left-hand and right-hand tails of the distribution. A heavily right-censored data set, on the other hand, might fit well in the left-hand tail of the distribution where failures were observed, but we have less confidence in the right-hand tail of the distribution where there were few or no failures.
A complete data set consists of failure times [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex]. Although lowercase letters are used to denote the failure times here to be consistent with the notation for censoring times, the failure times are nonnegative random variables. The likelihood function can be written as a product of the probability density functions evaluated at the failure times:

[latex]L(\lambda) = \prod_{i\,=\,1}^{n} f(t_i, \, \lambda).[/latex]
Note that the [latex]{\boldsymbol t}[/latex] argument has been left out of the likelihood expression for compactness. Using the last expression for the log likelihood function (adapted for a complete data set) from Section 5.3,

[latex]\ln L(\lambda) = \sum_{i\,=\,1}^{n} \ln h(t_i, \, \lambda) - \sum_{i\,=\,1}^{n} H(t_i, \, \lambda).[/latex]
For the exponential distribution, this is

[latex]\ln L(\lambda) = n \ln \lambda - \lambda \sum_{i\,=\,1}^{n} t_i.[/latex]
To determine the maximum likelihood estimator for λ, the single-element score vector

[latex]U(\lambda) = \frac{\partial \ln L(\lambda)}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i\,=\,1}^{n} t_i,[/latex]

also known as the score statistic, is equated to zero and solved for λ, yielding

[latex]\hat{\lambda} = \frac{n}{\sum_{i\,=\,1}^{n} t_i},[/latex]
where the denominator is often referred to as the total time on test. Not surprisingly, the maximum likelihood estimator [latex]\hat \lambda[/latex] is the reciprocal of the sample mean.
Information matrices. To find the information matrix associated with a complete data set from an exponential(λ) population, the derivative of the score statistic is required:

[latex]\frac{\partial^2 \ln L(\lambda)}{\partial \lambda^2} = -\frac{n}{\lambda^2}.[/latex]
Taking the expected value of the negative of this quantity yields the [latex]1 \times 1[/latex] Fisher information matrix

[latex]I(\lambda) = \frac{n}{\lambda^2}.[/latex]
If the maximum likelihood estimator [latex]\hat{\lambda}[/latex] is used as an argument in the negative of the second partial derivative of the log likelihood function, the [latex]1 \times 1[/latex] observed information matrix is obtained:

[latex]O(\hat{\lambda}) = \frac{n}{\hat{\lambda}^{\,2}}.[/latex]
Confidence interval for λ. Asymptotic confidence intervals for λ based on the likelihood ratio statistic or the observed information matrix are unnecessary for a complete data set because the sampling distribution of [latex]\sum_{\,i\,=\,1}^{n} t_i[/latex] is tractable. In particular, from Theorem 4.5,

[latex]2 \lambda \sum_{\,i\,=\,1}^{n} t_i[/latex]

has the chi-square distribution with 2n degrees of freedom. Therefore, with probability [latex]1-\alpha[/latex],

[latex]\chi_{2n, \, 1 - \alpha / 2}^2 \le 2 \lambda \sum_{\,i\,=\,1}^{n} t_i \le \chi_{2n, \, \alpha / 2}^2,[/latex]
where [latex]\chi_{2n, \, p}^2[/latex] is the ([latex]1 - p[/latex])th fractile of the chi-square distribution with 2n degrees of freedom. Performing the algebra required to isolate λ in the middle of the inequality yields an exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence interval for λ:

[latex]\frac{\chi_{2n, \, 1 - \alpha / 2}^2}{2 \sum_{\,i\,=\,1}^{n} t_i} < \lambda < \frac{\chi_{2n, \, \alpha / 2}^2}{2 \sum_{\,i\,=\,1}^{n} t_i}.[/latex]
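As a sketch with hypothetical data, the exact interval can be computed numerically. The chi-square fractiles below come from the Wilson-Hilferty approximation so that only the standard library is needed (`scipy.stats.chi2.ppf` would give essentially exact fractiles); the data values are invented for illustration:

```python
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile function.
    The book's fractile chi^2_{df, q} corresponds to chi2_quantile(1 - q, df)."""
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * (2 / (9 * df)) ** 0.5) ** 3

# Hypothetical complete data set of n exponential lifetimes.
t = [1.2, 0.8, 2.5, 0.4, 1.6, 0.9, 1.1, 2.0]
n, total = len(t), sum(t)
alpha = 0.05

lam_hat = n / total          # MLE: reciprocal of the sample mean

# Since 2 * lambda * total ~ chi-square(2n), an exact two-sided
# 100(1 - alpha)% confidence interval for lambda follows by isolating lambda.
lam_lo = chi2_quantile(alpha / 2, 2 * n) / (2 * total)
lam_hi = chi2_quantile(1 - alpha / 2, 2 * n) / (2 * total)
```

The interval is not symmetric about the point estimate because the chi-square distribution is skewed.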
A well-known example of a randomly right-censored data set is drawn from the biostatistical literature. The focus here is on determining an estimate of the remission rate from the complete data set of remission times for patients in a control group.
The importance of assessing model adequacy applies to all fitted distributions—not just the exponential distribution. Furthermore, if a modeler knows the failure physics (for example, fatigue crack growth) underlying a process, then an appropriate probability model that is consistent with the failure physics should be chosen.
So far we have fitted the exponential distribution to two complete data sets: the ball bearing failure times from Example 5.5 and the 6–MP remission times for the control group from Example 5.6. We visually assessed the two fits in Figures 5.7 and 5.10 by comparing the empirical survivor function, which takes a downward step of [latex]1 / n[/latex] at each data value, with the fitted survivor function [latex]S(t) = e ^ {- \hat \lambda \kern 0.02em t}[/latex] and concluded that the exponential distribution did a very poor job of approximating the ball bearing failure times and a (barely) adequate job of approximating the remission times of the patients in the control group of the 6–MP clinical trial. This visual assessment was subjective and was followed by a formal goodness-of-fit test in order to draw these conclusions for the 6–MP remission times.
Confidence intervals for measures other than λ. It is possible to find point and interval estimators for measures other than λ by using the invariance property for maximum likelihood estimators and by rearranging the confidence interval formula. Define

[latex]\lambda_L = \frac{\chi_{2n, \, 1 - \alpha / 2}^2}{2 \sum_{\,i\,=\,1}^{n} t_i} \qquad \textrm{and} \qquad \lambda_U = \frac{\chi_{2n, \, \alpha / 2}^2}{2 \sum_{\,i\,=\,1}^{n} t_i}[/latex]

as the lower and upper bounds on the exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence interval for λ. If the measure of interest is [latex]\mu = 1 / \lambda[/latex], for example, then the point estimator is the sample mean [latex]{\hat{\mu} = \frac{1}{n} \sum_{i\,=\,1}^{n} t_i}[/latex]. Rearranging the confidence interval

[latex]\lambda_L < \lambda < \lambda_U[/latex]

by taking reciprocals yields the exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence interval for μ:

[latex]\frac{1}{\lambda_U} < \mu < \frac{1}{\lambda_L}.[/latex]
As a second example, consider the probability of survival to a fixed time t, [latex]S(t)= e^{-\lambda t}[/latex]. By the invariance property of maximum likelihood estimators, the maximum likelihood estimator for the survivor function at time t is

[latex]\hat{S}(t) = e^{-\hat{\lambda} t}.[/latex]

A confidence interval for [latex]S(t)[/latex], on the other hand, can be found by rearranging the confidence interval

[latex]\lambda_L < \lambda < \lambda_U[/latex]

in the following fashion:

[latex]e^{-\lambda_U t} < e^{-\lambda t} < e^{-\lambda_L t}, \qquad \textrm{that is,} \qquad e^{-\lambda_U t} < S(t) < e^{-\lambda_L t}.[/latex]
These formulas for point and interval estimates for quantities other than λ are illustrated next for a complete data set that is assumed to be drawn from an exponential population.
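A minimal numerical sketch of these transformations, again with invented data and stdlib chi-square fractiles via the Wilson-Hilferty approximation (both assumptions, not part of the text):

```python
import math
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile function."""
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * (2 / (9 * df)) ** 0.5) ** 3

# Hypothetical complete data set of n exponential lifetimes.
t = [1.2, 0.8, 2.5, 0.4, 1.6, 0.9, 1.1, 2.0]
n, total = len(t), sum(t)
alpha = 0.05

# Endpoints of the exact two-sided interval for lambda.
lam_lo = chi2_quantile(alpha / 2, 2 * n) / (2 * total)
lam_hi = chi2_quantile(1 - alpha / 2, 2 * n) / (2 * total)

# mu = 1 / lambda is decreasing in lambda, so taking reciprocals
# reverses the endpoints.
mu_hat = total / n
mu_lo, mu_hi = 1 / lam_hi, 1 / lam_lo

# S(t0) = exp(-lambda * t0) is also decreasing in lambda, so the
# endpoints reverse here as well.
t0 = 1.0
S_hat = math.exp(-(n / total) * t0)
S_lo, S_hi = math.exp(-lam_hi * t0), math.exp(-lam_lo * t0)
```

The endpoint reversal under a decreasing transformation is the only subtlety; the interval's coverage probability is unchanged.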
Although the manipulation of the confidence interval for λ is performed here in the case of a complete data set, these techniques may also be applied to any of the right-censoring mechanisms to be described in the next three subsections.
5.4.2 Type II Censored Data Sets
A life test of n items that is terminated when r failures have occurred produces a Type II right-censored data set. The previous subsection is a special case of Type II censoring when [latex]r = n[/latex]. As before, assume that the failure times are [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex], the test is terminated upon the rth ordered failure, the censoring times are [latex]c_1 = c_2 = \cdots = c_n = t_{(r)}[/latex] for all items, and [latex]x_i = \min \{t_i, \, c_i \}[/latex] for [latex]i = 1, \, 2, \, \ldots, \, n[/latex].
Since [latex]h(t, \, \lambda ) = \lambda[/latex] and [latex]H(t, \, \lambda ) = \lambda t[/latex] for [latex]t \ge 0[/latex], the log likelihood function is

[latex]\ln L(\lambda) = r \ln \lambda - \lambda \sum_{\,i\,=\,1}^{n} x_i[/latex]

because there are r observed failures. The expression

[latex]\sum_{\,i\,=\,1}^{n} x_i = \sum_{\,i\,=\,1}^{r} t_{(i)} + (n - r) \, t_{(r)},[/latex]
where [latex]t_{(1)} < t_{(2)} < \cdots < t_{(r)}[/latex] are the order statistics of the observed failure times, is the total time on test. It represents the total accumulated time that the n items accrue while on test.
To determine the maximum likelihood estimator, the log likelihood function is differentiated with respect to λ,

[latex]\frac{\partial \ln L(\lambda)}{\partial \lambda} = \frac{r}{\lambda} - \sum_{\,i\,=\,1}^{n} x_i,[/latex]

and is equated to zero, yielding the maximum likelihood estimator

[latex]\hat{\lambda} = \frac{r}{\sum_{\,i\,=\,1}^{n} x_i}.[/latex]
So the maximum likelihood estimator of the failure rate is the ratio of the number of observed failures to the total time on test. The second partial derivative of the log likelihood function is

[latex]\frac{\partial^2 \ln L(\lambda)}{\partial \lambda^2} = -\frac{r}{\lambda^2},[/latex]

so the information matrix is

[latex]I(\lambda) = \frac{r}{\lambda^2}[/latex]

and the observed information matrix is

[latex]O(\hat{\lambda}) = \frac{r}{\hat{\lambda}^{\,2}}.[/latex]
Exact confidence intervals and hypothesis tests concerning λ can be derived by using the result

[latex]2 \lambda \sum_{\,i\,=\,1}^{n} x_i \sim \chi^2 (2r),[/latex]

where [latex]\chi^2 (2r)[/latex] is the chi-square distribution with 2r degrees of freedom. This result can be proved in a similar fashion to Theorem 4.5 of the exponential distribution from Section 4.2. Using this fact, an exact two-sided confidence interval for λ can be constructed in a similar fashion to that for a complete data set. It can be stated with probability [latex]1 - \alpha[/latex] that

[latex]\chi_{2r, \, 1 - \alpha / 2}^2 \le 2 \lambda \sum_{\,i\,=\,1}^{n} x_i \le \chi_{2r, \, \alpha / 2}^2.[/latex]
Rearranging terms yields an exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence interval for the failure rate λ.
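The Type II calculation can be sketched numerically. The failure times and test parameters below are hypothetical, and the chi-square fractiles again use a stdlib Wilson-Hilferty approximation rather than exact tables:

```python
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile function."""
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * (2 / (9 * df)) ** 0.5) ** 3

# Type II censoring: n items on test, terminated at the r-th ordered failure.
n, r = 10, 4
failures = [0.7, 1.1, 1.9, 2.4]     # hypothetical ordered failure times t_(1)..t_(r)

# Total time on test: observed failures plus (n - r) items censored at t_(r).
ttt = sum(failures) + (n - r) * failures[-1]

lam_hat = r / ttt                   # observed failures over total time on test

# Exact CI from 2 * lambda * ttt ~ chi-square(2r).
alpha = 0.05
lam_lo = chi2_quantile(alpha / 2, 2 * r) / (2 * ttt)
lam_hi = chi2_quantile(1 - alpha / 2, 2 * r) / (2 * ttt)
```

Note that the degrees of freedom depend on r alone, which is the basis for the precision observation in the next paragraph.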
Hypothesis testing, which is the rough equivalent of interval estimation, is also possible in the case of Type II censoring because the sampling distribution of [latex]2 \lambda \sum_{\,i\,=\,1}^{n} x_i[/latex] is tractable. Some aspects of hypothesis testing in the setting of Type II censoring, such as the alternative hypothesis, one- and two-tailed tests, and p-values are illustrated in the next example. The example shows how a life test can be used to check a manufacturer’s claimed mean time to failure.
The fact that the distribution of [latex]2 \lambda \sum_{\,i\,=\,1}^{n} x_i = {2r \lambda} / {\hat{\lambda}}[/latex] is independent of n implies that [latex]\hat{\lambda}[/latex] has the same precision in a test of r items tested until all have failed as that for a test of n items tested until r items have failed. So the justification for obtaining a Type II censored data set over a complete data set is time savings. The additional costs associated with this time savings are the additional [latex]n-r[/latex] test stands and the additional [latex]n-r[/latex] items to place on test.
If a limited number of test stands are available for testing, the only way to speed up the test is to perform a test with replacement in which failed items are immediately replaced with new items. This will decrease the expected time to complete the test, which is terminated when r of the items fail. The sequence of failures in this case is a Poisson process with rate [latex]n \lambda[/latex].
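The Poisson-process observation can be checked by simulation. In a test with replacement, the gaps between successive failures are exponential with rate [latex]n \lambda[/latex], so the expected time to observe r failures is [latex]r / (n \lambda)[/latex]. The values of n, λ, and r below are hypothetical:

```python
import random

# Test with replacement: each failed item is immediately replaced, so
# failures occur according to a Poisson process with rate n * lambda.
random.seed(1)
n, lam, r = 10, 0.5, 4
expected = r / (n * lam)            # expected time to the r-th failure

def time_to_r_failures():
    # Gaps between successive failures are exponential with rate n * lambda.
    return sum(random.expovariate(n * lam) for _ in range(r))

reps = 20000
avg = sum(time_to_r_failures() for _ in range(reps)) / reps
```

The simulated average settles near the theoretical value, confirming the time savings relative to a test without replacement.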
Although the inference for Type II censoring is tractable, the unfortunate consequence is that the time to complete the test is a random variable. Constraints on the time to run a life test may make a Type I censored data set more practical.
5.4.3 Type I Censored Data Sets
The analysis for Type I censored data sets is similar to that for the Type II censoring case. The test is terminated at time c. The censoring times for each item on test are the same: [latex]c_1 = c_2 = \cdots= c_n= c[/latex]. The number of observed failures, r, is a random variable. The total time on test in this case is
As before, the log likelihood function is
and the score statistic is
The maximum likelihood estimator for [latex]r > 0[/latex] is
the information matrix is
and the observed information matrix is
The functional form of the maximum likelihood estimator is identical to the Type II censoring case. For identical values of r, Type I censoring has a larger total time on test [latex]\sum_{\,i\,=\,1}^{n} x_i[/latex] than the corresponding Type II censoring case because a Type I test ends between failures r and [latex]r + 1[/latex]. Thus the expected value of [latex]\hat{\lambda}[/latex] is smaller for Type I censoring than for Type II censoring. One problem that arises with Type I censoring is that the sampling distribution of [latex]\sum_{\,i\,=\,1}^{n} x_i[/latex] is no longer tractable, so an exact confidence interval for λ has not been established. Although many more complicated methods exist, one of the best approximation methods is to assume that [latex]2 \lambda \sum_{\,i\,=\,1}^{n} x_i[/latex] has the chi-square distribution with [latex]2r+1[/latex] degrees of freedom. This approximation, illustrated in Figure 5.13, is based on the fact that if [latex]c = t_{(r)}[/latex], then [latex]2 \lambda \sum_{\,i\,=\,1}^{n} x_i \sim \chi^2 (2r)[/latex], and if [latex]c = t_{(r+1)}[/latex], then [latex]2 \lambda \sum_{\,i\,=\,1}^{n} x_i \sim \chi^2 (2r+2)[/latex]. Since c is between [latex]t_{(r)}[/latex] and [latex]t_{(r+1)}[/latex], [latex]2 \lambda \sum_{\,i\,=\,1}^{n} x_i[/latex] will be approximately chi-square with [latex]2r+1[/latex] degrees of freedom. This constitutes a proof, after a little algebra, of the following result.
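A minimal sketch of this approximate interval in Python follows. The function names are hypothetical, and the chi-square quantile again uses the Wilson–Hilferty approximation; for illustration the function takes the underlying lifetimes and applies the censoring threshold c itself.

```python
from statistics import NormalDist

def chi2_quantile(p, df):
    # Wilson-Hilferty approximation to the chi-square quantile function
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * (2 / (9 * df)) ** 0.5) ** 3

def type1_confidence_interval(lifetimes, c, alpha):
    """Approximate two-sided CI for lambda from a Type I censored
    exponential sample in which every item is censored at time c.
    Assumes 2 * lambda * sum(x_i) ~ chi-square(2r + 1), approximately."""
    x = [min(t, c) for t in lifetimes]       # x_i = min(t_i, c)
    r = sum(1 for t in lifetimes if t <= c)  # number of observed failures
    total = sum(x)                           # total time on test
    lower = chi2_quantile(alpha / 2, 2 * r + 1) / (2 * total)
    upper = chi2_quantile(1 - alpha / 2, 2 * r + 1) / (2 * total)
    return r, lower, upper
```

The maximum likelihood estimate r / total always falls inside the resulting interval for moderate α.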
5.4.4 Randomly Censored Data Sets
Many examples of the random censoring mechanism, in which the failure times [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] and the censoring times [latex]c_1, \, c_2, \, \ldots, \, c_n[/latex] are independent random variables, come from biostatistics. Random censoring occurs frequently in biostatistics because it is not always possible to control the times at which patients enter and exit the study. The log likelihood function, score statistic, information matrix, and observed information matrix are the same as in the Type I censoring case. The total time on test is now simply
The sampling distribution of [latex]\sum_{\,i\,=\,1}^{n} x_i[/latex] is more complicated in this case, so asymptotic properties must be relied on to determine approximate confidence intervals for λ. In the example that follows, three different approximation procedures for determining a confidence interval for λ are illustrated.
The first technique is based on an approximation to a result that holds exactly in the Type II censoring case: [latex]2 \lambda \sum_{\,i\,=\,1}^{n} x_i \sim \chi^2 (2r)[/latex]. The second technique is based on the likelihood ratio statistic, where [latex]-2 [\ln \, L(\lambda) - \ln \, L( \hat \lambda )][/latex] is asymptotically chi-square with 1 degree of freedom. The third technique is based on the fact that the maximum likelihood estimator [latex]\hat{\lambda}[/latex] is asymptotically normal with population mean λ and a population variance that is the inverse of the observed information matrix. Since this third technique results in a symmetric confidence interval, it should only be used with large sample sizes.
| Basis for confidence interval | Confidence interval for λ | Confidence interval for μ |
|---|---|---|
| Type II censoring approximate result | [latex]0.0115 < \lambda < 0.0439[/latex] | [latex]22.8 < \mu < 87.2[/latex] |
| Likelihood ratio statistic | [latex]0.0120 < \lambda < 0.0452[/latex] | [latex]22.1 < \mu < 83.0[/latex] |
| Asymptotic normality of the MLE | [latex]0.0087 < \lambda < 0.0414[/latex] | [latex]24.1 < \mu < 115.1[/latex] |
To summarize, the maximum likelihood estimator for the failure rate λ in the random censoring case is the same as in the complete, Type II, and Type I censoring cases:
Three approximate confidence intervals for λ are based on (a) an exact result from Type II censoring, (b) the asymptotic distribution of the likelihood ratio statistic, and (c) the asymptotic normality of the maximum likelihood estimator. The confidence interval based on the asymptotic normality of the maximum likelihood estimator is symmetric and is therefore recommended only in the case of a large number of items on test.
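The three approximation procedures can be sketched as follows, assuming the data are summarized by r observed failures and the total time on test. The function names are hypothetical; the chi-square quantile uses the exact normal-square relationship for 1 degree of freedom and the Wilson–Hilferty approximation otherwise.

```python
import math
from statistics import NormalDist

def chi2_quantile(p, df):
    # chi-square(1) is the square of a standard normal, so its quantile
    # is exact; for other df use the Wilson-Hilferty approximation
    if df == 1:
        return NormalDist().inv_cdf((1 + p) / 2) ** 2
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * (2 / (9 * df)) ** 0.5) ** 3

def three_intervals(r, total, alpha):
    """Three approximate CIs for the exponential failure rate lambda
    from a randomly censored sample with r observed failures and
    total time on test `total`."""
    lam = r / total                          # maximum likelihood estimate

    def loglik(lm):                          # r ln(lambda) - lambda * total
        return r * math.log(lm) - lm * total

    # (a) Type II approximate result: 2 * lambda * total ~ chi-square(2r)
    a = (chi2_quantile(alpha / 2, 2 * r) / (2 * total),
         chi2_quantile(1 - alpha / 2, 2 * r) / (2 * total))

    # (b) likelihood ratio: 2[ln L(lam) - ln L(lambda)] ~ chi-square(1)
    cut = loglik(lam) - chi2_quantile(1 - alpha, 1) / 2

    def crossing(lo, hi):                    # bisection for loglik = cut
        for _ in range(100):
            mid = (lo + hi) / 2
            if (loglik(mid) - cut) * (loglik(lo) - cut) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    b = (crossing(lam * 1e-6, lam), crossing(lam, lam * 50))

    # (c) asymptotic normality: lam +/- z * lam / sqrt(r), since the
    # observed information is r / lam ** 2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    c = (lam - z * lam / math.sqrt(r), lam + z * lam / math.sqrt(r))
    return a, b, c
```

Only interval (c) is symmetric about the maximum likelihood estimate, which is why it is recommended only for large sample sizes.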
5.5 Weibull Distribution
The Weibull distribution is typically more appropriate for modeling the lifetimes of items with a strictly increasing or decreasing hazard function, such as mechanical items. Rather than looking at each censoring mechanism (for example, no censoring, Type II censoring, Type I censoring) individually, we proceed directly to the general case of random censoring.
Maximum likelihood estimators. As before, let [latex]t_1 , \, t_2 , \, \ldots , \, t_n[/latex] be the failure times, [latex]c_1 , \, c_2 , \, \ldots , \, c_n[/latex] be the associated censoring times, and [latex]{x_i = \min \{t_i, \, c_i \}}[/latex] for [latex]i= 1, \, 2, \, \ldots, \, n[/latex]. The Weibull distribution has hazard and cumulative hazard functions
and
When there are r observed failures, the log likelihood function is
and the [latex]2 \times 1[/latex] score vector has elements
and
When these equations are set equal to zero, the simultaneous equations have no closed-form solution for [latex]\hat{\lambda}[/latex] and [latex]\hat \kappa[/latex]:
Fortunately, solving a [latex]2 \times 2[/latex] system of nonlinear equations can be avoided because the first equation can be solved for λ in terms of [latex]\kappa[/latex] as
Notice that λ reduces to the maximum likelihood estimator for the exponential distribution when [latex]\kappa = 1[/latex]. Using this expression for λ in terms of [latex]\kappa[/latex] in the second element of the score vector yields a single, albeit more complicated, expression with [latex]\kappa[/latex] as the only unknown. After applying some algebra, this equation reduces to
which must be solved iteratively. One technique that can be used to solve this equation is the Newton–Raphson procedure, which uses
where [latex]\kappa_0[/latex] is an initial estimator. The iterative procedure can be repeated until the desired accuracy for [latex]\kappa[/latex] is achieved; that is, [latex]| \kappa_{j+1} - \kappa_j | < \epsilon[/latex], for some small positive real number ϵ. When the accuracy is achieved, the maximum likelihood estimator [latex]\hat{ \kappa}[/latex] is used to calculate [latex]\hat{\lambda} = \big(r / \sum_{\,i\,=\,1}^{n} x_i^{\hat{\kappa}}\big)^{1 / {\hat \kappa}}[/latex]. The derivative of [latex]g(\kappa)[/latex] reduces to
Determining an initial estimator [latex]\kappa_0[/latex] is not trivial. When there are no censored observations, Menon’s initial estimator for [latex]\kappa_0[/latex] is
Least squares estimation can be used to determine an initial estimate in the case of a right-censored data set. The Newton–Raphson procedure can fail to converge to the maximum likelihood estimators; a bisection algorithm or fixed-point algorithm often provides more reliable convergence.
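A sketch of the iteration in Python follows. The function names are hypothetical, and the code assumes the standard profile form of [latex]g(\kappa)[/latex] consistent with the parameterization [latex]H(t) = (\lambda t)^\kappa[/latex] and with the expression [latex]\hat{\lambda} = \big(r / \sum_{\,i\,=\,1}^{n} x_i^{\hat{\kappa}}\big)^{1 / \hat{\kappa}}[/latex] given above.

```python
import math

def weibull_mle(x, delta, kappa0=1.0, eps=1e-10):
    """Newton-Raphson for the Weibull shape parameter kappa from a
    right-censored sample (x_i = min(t_i, c_i), delta_i the censoring
    indicator), using the profile equation
        g(kappa) = r / kappa + sum_{delta_i = 1} ln x_i
                   - r * sum(x_i^kappa * ln x_i) / sum(x_i^kappa) = 0,
    then lambda_hat = (r / sum(x_i^kappa_hat)) ** (1 / kappa_hat).
    Assumes a reasonable initial estimate kappa0."""
    r = sum(delta)
    logs = sum(math.log(xi) for xi, d in zip(x, delta) if d == 1)

    def g_and_gprime(k):
        s0 = sum(xi ** k for xi in x)
        s1 = sum(xi ** k * math.log(xi) for xi in x)
        s2 = sum(xi ** k * math.log(xi) ** 2 for xi in x)
        g = r / k + logs - r * s1 / s0
        # g'(k) < 0 everywhere, since (s2*s0 - s1**2)/s0**2 is a
        # weighted variance of the ln x_i values
        gp = -r / k ** 2 - r * (s2 * s0 - s1 ** 2) / s0 ** 2
        return g, gp

    k = kappa0
    for _ in range(100):
        g, gp = g_and_gprime(k)
        step = g / gp
        k -= step                       # Newton-Raphson update
        if abs(step) < eps:
            break
    lam = (r / sum(xi ** k for xi in x)) ** (1 / k)
    return lam, k
```

Because [latex]g[/latex] is monotone decreasing in [latex]\kappa[/latex], the iteration is well behaved for reasonable starting values such as [latex]\kappa_0 = 1[/latex].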
Fisher and observed information matrices. The [latex]2 \times 2[/latex] Fisher and observed information matrices are based on the following partial derivatives:
The expected values of these quantities are not tractable, so the Fisher information matrix does not have closed-form elements. The observed information matrix, however, can be determined by using [latex]\hat{\lambda}[/latex] and [latex]\hat{\kappa}[/latex] as arguments in these expressions.
5.6 Proportional Hazards Model
Parameter estimation for the proportional hazards model, which was introduced in Section 4.6, is considered in this section. Since there is now a vector of covariates in addition to a failure or censoring time for each item on test, special notation must be established to accommodate the covariates. The proportional hazards model has the unique feature that the baseline distribution need not be defined in order to estimate the regression coefficients associated with the covariates.
A lifetime model that incorporates a vector of covariates [latex]{\boldsymbol z} = ( z_1, \, z_2, \, \ldots, \, z_q) ^ \prime[/latex] models the impact of the covariates on survival. The reason for including this vector may be to determine which covariates significantly affect survival, to determine the distribution of the lifetime for a particular setting of the covariates, or to fit a more complicated distribution from a small data set, as opposed to fitting separate distributions for each level of the covariates.
The proportional hazards model was defined in Section 4.6 by
for [latex]t \ge 0[/latex], where [latex]h_0 (t)[/latex] is a baseline hazard function. The covariates increase the hazard function when [latex]\psi ({\boldsymbol z}) > 1[/latex] or decrease the hazard function when [latex]\psi ({\boldsymbol z}) < 1[/latex]. The goal of this section is to develop techniques for estimating the [latex]q \times 1[/latex] vector of regression coefficients [latex]\boldsymbol \beta[/latex] from a data set consisting of n items on test and r observed failure times.
The notation used to describe a data set in a lifetime model involving covariates will borrow some notation established earlier in this chapter, but also establish some new notation. As before, n is the number of items on test and r is the number of observed failures. The failure time of the ith item on test, [latex]t_i[/latex], is either observed or right censored at time [latex]c_i[/latex], for [latex]i = 1, \, 2, \, \ldots, \, n[/latex]. As before, let [latex]x_i = \min \{t_i, \, c_i\}[/latex] and [latex]\delta_i[/latex] be a censoring indicator variable (1 for an observed failure and 0 for a right-censored value), for [latex]i = 1, \, 2, \, \ldots, \, n[/latex]. In addition, a [latex]q \times 1[/latex] vector of covariates [latex]{\boldsymbol z}_i = (z_{i1}, \, z_{i2}, \, \ldots, \, z_{iq}) ^ \prime[/latex] is collected for each item on test, for [latex]i = 1, \, 2, \, \ldots, \, n[/latex]. Thus, [latex]z_{ij}[/latex] is the value of covariate j for item i, for [latex]i = 1, \, 2, \, \ldots, \, n[/latex] and [latex]j = 1, \, 2, \, \ldots, \, q[/latex]. This formulation of the problem can be stated in matrix form as
Each row in the [latex]{\boldsymbol Z}[/latex] matrix consists of the values of the q covariates collected on a particular item. The matrix approach is useful because complicated systems of equations can be expressed compactly and operations on data sets can be performed efficiently by a computer. For parameter estimation, the survivor, density, hazard, and cumulative hazard functions now have the extra arguments [latex]{\boldsymbol z}[/latex] and [latex]\boldsymbol \beta[/latex] associated with them:
$$
S(t, \, {\boldsymbol z} , \, \boldsymbol \theta , \, \boldsymbol \beta) \qquad \qquad
f(t, \, {\boldsymbol z} , \, \boldsymbol \theta , \, \boldsymbol \beta) \qquad \qquad
h(t, \, {\boldsymbol z} , \, \boldsymbol \theta , \, \boldsymbol \beta) \qquad \qquad
H(t, \, {\boldsymbol z} , \, \boldsymbol \theta , \, \boldsymbol \beta) ,
$$
for [latex]t \ge 0[/latex], where the vector [latex]\boldsymbol \theta = (\theta_1, \, \theta_2, \, \ldots, \, \theta_p) ^ \prime[/latex] consists of the p unknown parameters associated with the baseline distribution, which must be estimated along with the regression coefficients [latex]\boldsymbol \beta[/latex].
Parameter estimation for the proportional hazards model can be divided into two cases. The first case is when the baseline distribution is known. This case applies when previous test results have indicated that a particular functional form of the baseline distribution is appropriate. The second case is when the baseline distribution is unknown. This is almost certainly the case when looking at a data set of lifetimes and covariates for the first time without any guidance with respect to an appropriate baseline distribution.
5.6.1 Known Baseline Distribution
When the baseline distribution is known, the parameter estimation procedure follows along the same lines as in the previous sections. The hazard function and cumulative hazard function in the proportional hazards model are
and
for [latex]t \ge 0[/latex], where [latex]\boldsymbol \theta[/latex] is a [latex]p \times 1[/latex] vector of unknown parameters associated with the baseline distribution. For simplicity and mathematical tractability, only the log linear form of the link function, which is [latex]\psi ( {\boldsymbol z} ) = e^{\kern 0.07em \boldsymbol \beta ^ \prime {\boldsymbol z}}[/latex], is considered here. This assumption is not necessary for some of the derivations, so many of the results apply to a wider range of link functions. When the log linear form of the link function is assumed, the hazard function and cumulative hazard function become
and
for [latex]t \ge 0[/latex], where [latex]{\boldsymbol \theta}[/latex] is a [latex]p \times 1[/latex] vector of unknown parameters associated with the baseline distribution. The log likelihood function is
This expression can be differentiated with respect to all the unknown parameters to arrive at the score vector, which is then equated to zero and solved numerically to arrive at the maximum likelihood estimates.
Two observations with respect to this model formulation are important. First, the maximum likelihood estimates for [latex]\boldsymbol \theta[/latex] and [latex]\boldsymbol \beta[/latex] for most of the models in this section cannot be expressed in closed form (as was the case for the exponential distribution in Section 5.4), so numerical methods typically need to be used to find the values of the estimates. Second, the choice of whether to use a model of dependence or to examine each population separately depends on the number of unique covariate vectors [latex]\boldsymbol z[/latex] and the number of items on test, n. If, for example, n is large and there is only a single binary covariate (that is, only two unique covariate vectors, [latex]z_1 = 0[/latex] and [latex]z_1 = 1[/latex]), it is probably wiser to analyze each of the two populations separately by the techniques described earlier.
Although numerical methods are required to find [latex]\hat {\boldsymbol \theta}[/latex] and [latex]\hat {\boldsymbol \beta}[/latex] in general, there are closed-form expressions in a very narrow case that satisfies the following conditions.
- The log linear link function [latex]\psi ( {\boldsymbol z} ) = e^{\kern 0.07em {\boldsymbol \beta} ^ \prime {\boldsymbol z}}[/latex] is used to incorporate the vector of covariates [latex]{\boldsymbol z}[/latex] into the lifetime model.
- The baseline distribution is exponential(λ), which means that the baseline hazard function is [latex]h_0(t) = \lambda[/latex] and the baseline cumulative hazard function is [latex]H_0(t) = \lambda t[/latex] for [latex]t \ge 0[/latex].
Under these assumptions, the general form for the hazard function in the proportional hazards model
reduces to the special case
for [latex]t \ge 0[/latex]. It is often more convenient notationally to define an additional covariate, [latex]z_0 = 1[/latex], for all n items on test. This allows the baseline parameter [latex]\lambda = e^{{\beta}_0 z_0}[/latex] to be included in the vector of regression coefficients, rather than being considered separately. The baseline hazard function is effectively absorbed into the link function. In this case, the hazard function can be expressed as
for [latex]t \ge 0[/latex], where [latex]{\boldsymbol \beta} = (\beta_0, \, \beta_1, \, \ldots , \, \beta_q) ^ \prime[/latex] and [latex]{\boldsymbol z} = (z_0, \, z_1, \, \ldots , \, z_q) ^ \prime[/latex]. The corresponding cumulative hazard function is
for [latex]t \ge 0[/latex]. Using this parameterization, the log likelihood function is
Differentiating this expression with respect to βj yields the elements of the score vector
for [latex]j = 0, \, 1, \, \ldots, \, q[/latex]. When the elements of the score vector are equated to zero, the resulting set of [latex]q + 1[/latex] nonlinear equations in [latex]\boldsymbol \beta[/latex] must be solved numerically in the general case. There is a closed-form solution for this set of simultaneous equations when there is a single binary covariate, often referred to as the two-sample case.
To find the observed information matrix and the Fisher information matrix, a second partial derivative of the log likelihood function is required:
for [latex]j = 0, \, 1, \, \ldots, \, q[/latex] and [latex]k = 0, \, 1, \, \ldots, \, q[/latex]. The observed information matrix can be determined by using the maximum likelihood estimate [latex]\hat {\boldsymbol \beta}[/latex] as an argument in this second partial derivative. Thus, the [latex](j, \, k)[/latex] element of the observed information matrix is
for [latex]j = 0, \, 1, \, \ldots, \, q[/latex] and [latex]k = 0, \, 1, \, \ldots, \, q[/latex]. For computational purposes, this can be expressed in matrix form as
where [latex]\hat{{\boldsymbol B}}[/latex] is an [latex]n \times n[/latex] diagonal matrix whose elements are [latex]x_1 e^{{\kern 0.07em \hat {\boldsymbol \beta}} ^ \prime {\boldsymbol z}_1}, \, x_2 e^{{\kern 0.07em \hat {\boldsymbol \beta}} ^ \prime {\boldsymbol z}_2}, \, \ldots, \, x_n e^{{\kern 0.07em \hat {\boldsymbol \beta}} ^ \prime {\boldsymbol z}_n}[/latex]. The Fisher information matrix is more difficult to calculate because it involves the expected value of the second partial derivative:
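A minimal sketch of this matrix computation in pure Python follows; the function name is hypothetical, and each row of Z is one item's covariate vector (including [latex]z_0 = 1[/latex]).

```python
import math

def observed_information(Z, x, beta_hat):
    """Observed information matrix Z' B Z for the exponential
    proportional hazards model with log linear link, where B is the
    n x n diagonal matrix with entries x_i * exp(beta_hat' z_i)."""
    n, p = len(Z), len(Z[0])
    # diagonal entries of B
    b = [x[i] * math.exp(sum(beta_hat[j] * Z[i][j] for j in range(p)))
         for i in range(n)]
    # (j, k) element of Z' B Z: sum_i z_ij * z_ik * b_i
    return [[sum(Z[i][j] * Z[i][k] * b[i] for i in range(n))
             for k in range(p)] for j in range(p)]
```

Inverting this matrix and reading off the diagonal gives the usual asymptotic variance estimates for the elements of [latex]\hat{\boldsymbol \beta}[/latex].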
$$
E \left[
-{\partial^2 \ln \, L({\boldsymbol \beta}) \over
\partial \beta_j \partial \beta_k} \right] =
\sum_{i\,=\,1}^n
z_{ij} z_{ik} e^{\kern 0.07em {\boldsymbol \beta} ^ \prime {\boldsymbol z}_i} E[x_i]
$$
for [latex]j = 0, \, 1, \, \ldots, \, q[/latex] and [latex]k = 0, \, 1, \, \ldots, \, q[/latex]. Determining the value of [latex]E[x_i][/latex] will be considered separately in the paragraphs that follow for uncensored ([latex]r = n[/latex]) and censored ([latex]r < n[/latex]) data sets.
For a complete data set, [latex]E[x_i] = E[t_i][/latex], for [latex]i = 1, \, 2, \, \ldots, \, n[/latex], because there is no censoring. Since the population mean of the exponential distribution is the reciprocal of the failure rate and the ith item on test has failure rate [latex]e^{\kern 0.07em {\boldsymbol \beta} ^ \prime {\boldsymbol z}_i}[/latex], [latex]E[x_i] = e^{-\kern 0.07em {\boldsymbol \beta} ^ \prime {\boldsymbol z}_i}[/latex]. Returning to the Fisher information matrix, the [latex](j, \, k)[/latex] element is
for [latex]j = 0, \, 1, \, \ldots, \, q[/latex] and [latex]k = 0, \, 1, \, \ldots, \, q[/latex]. This result for the Fisher information matrix has a particularly tractable matrix representation
which is a function of the matrix of covariates only.
For a censored data set, the expression for [latex]E[x_i][/latex] is a bit more complicated. Since the failure rate for the ith item on test is [latex]e^{\kern 0.07em {\boldsymbol \beta} ^ \prime {\boldsymbol z}_i}[/latex],
for [latex]i = 1, \, 2, \, \ldots, \, n[/latex], by using integration by parts. This means that the [latex](j, \, k)[/latex] element of the Fisher information matrix is
where [latex]\gamma_i = e^{-e^{\kern 0.07em {\boldsymbol \beta} ^ {\kern 0.07em \prime} {\boldsymbol z}_i} c_i}[/latex] is the probability that the ith item on test is censored, for [latex]i = 1, \, 2, \, \ldots, \, n[/latex]. The potential censoring time for the ith item on test, ci, must be known for each item in order to compute the Fisher information matrix, which is not always the case in practice. Letting [latex]{\boldsymbol \Gamma}[/latex] be a diagonal matrix with elements [latex]\gamma_1, \, \gamma_2, \, \ldots, \, \gamma_n[/latex], the Fisher information matrix can be written in matrix form as
which is independent of the failure times.
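The censored-data Fisher information matrix can be sketched in the same style; the function name is hypothetical, and the known potential censoring times [latex]c_i[/latex] are required as inputs.

```python
import math

def fisher_information(Z, c, beta):
    """Fisher information matrix Z'(I - Gamma)Z for the exponential
    proportional hazards model with log linear link, where Gamma is
    diagonal with gamma_i = exp(-exp(beta' z_i) * c_i), the probability
    that item i is censored at its known censoring time c_i."""
    n, p = len(Z), len(Z[0])
    # diagonal entries of I - Gamma: the probability of an observed failure
    w = [1.0 - math.exp(-math.exp(sum(beta[j] * Z[i][j]
                                      for j in range(p))) * c[i])
         for i in range(n)]
    return [[sum(Z[i][j] * Z[i][k] * w[i] for i in range(n))
             for k in range(p)] for j in range(p)]
```

As the censoring times grow large, each [latex]\gamma_i \rightarrow 0[/latex] and the matrix approaches the complete-data Fisher information.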
Before ending the discussion on the exponential baseline distribution, the two-sample case, where a binary covariate [latex]z_1[/latex] is used to differentiate between the control ([latex]z_1 = 0[/latex]) and treatment ([latex]z_1 = 1[/latex]) cases, is considered. This case is of interest because the maximum likelihood estimates can be expressed in closed form. The notation for the two-sample case is summarized in Table 5.2. As before, [latex]z_0 = 1[/latex] is included in the vector of covariates to account for the baseline distribution. Setting the score vector equal to zero yields the following set of two nonlinear equations for the estimates of [latex]{\boldsymbol \beta} = (\beta_0, \beta_1) ^ \prime[/latex]:
Let [latex]r_0 > 0[/latex] be the number of observed failures in the control group ([latex]z_1 = 0[/latex]), and let [latex]r_1 > 0[/latex] be the number of observed failures in the treatment group ([latex]z_1 = 1[/latex]). Since [latex]z_0 = 1[/latex] for all items on test, the equations reduce to
These equations can be further simplified by partitioning the summations based on the value of z1:
$$\begin{align*}
r_0 + r_1 - \sum_{i \, | \, z_{i1} = \kern 0.07em 0} x_i e^{\beta_0} - \sum_{i \, | \, z_{i1} = 1} x_i e^{\beta_0 + \beta_1} & = 0, \\
r_1 - \sum_{i \, | \, z_{i1} = 1} x_i e^{\beta_0 + \beta_1} & = 0.
\end{align*}$$
Letting [latex]\lambda_0 = e^{{\beta}_0}[/latex] be the failure rate in the control group ([latex]z_1 = 0[/latex]) and letting [latex]\lambda_1 = e^{{\beta}_0 + \beta_1}[/latex] be the failure rate in the treatment group ([latex]z_1 = 1[/latex]), the equations become
$$\begin{align*}
r_0 + r_1 - \lambda_0 \sum_{i \, | \, z_{i1} = \kern 0.07em 0} x_i - \lambda_1 \sum_{i \, | \, z_{i1} = 1} x_i & = 0, \\
r_1 - \lambda_1 \sum_{i \, | \, z_{i1} = 1} x_i & = 0.
\end{align*}$$
When these equations are solved simultaneously, the maximum likelihood estimates for λ0 and λ1 are the same as those for the exponential distribution with two separate populations:
These estimators are the ratio of the number of observed failures to the total time on test within the two groups.
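A sketch of these closed-form estimates in Python follows. The function name is hypothetical; the regression coefficients are recovered via [latex]\hat{\beta}_0 = \ln \hat{\lambda}_0[/latex] and [latex]\hat{\beta}_1 = \ln (\hat{\lambda}_1 / \hat{\lambda}_0)[/latex].

```python
import math

def two_sample_estimates(x, delta, z1):
    """Closed-form MLEs for the two-sample exponential proportional
    hazards model: lambda_0 and lambda_1 are the numbers of observed
    failures over the total times on test within the control (z1 = 0)
    and treatment (z1 = 1) groups."""
    r0 = sum(d for d, z in zip(delta, z1) if z == 0)  # control failures
    r1 = sum(d for d, z in zip(delta, z1) if z == 1)  # treatment failures
    t0 = sum(xi for xi, z in zip(x, z1) if z == 0)    # control time on test
    t1 = sum(xi for xi, z in zip(x, z1) if z == 1)    # treatment time on test
    lam0, lam1 = r0 / t0, r1 / t1
    return lam0, lam1, math.log(lam0), math.log(lam1 / lam0)
```

A value of [latex]\hat{\beta}_1[/latex] far from zero relative to its standard error suggests that the treatment significantly alters the hazard.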
Parameter estimation for a single binary covariate is ideal in the sense that the parameter estimates can be expressed in closed form. The next subsection considers the more common situation in which the baseline distribution is unknown.
5.6.2 Unknown Baseline Distribution
In many applications, the baseline distribution is not known. Furthermore, the modeler may not be interested in the baseline distribution, but only in the influence of the covariates on survival. A technique has been developed for the proportional hazards model that allows the coefficient vector [latex]{\boldsymbol \beta}[/latex] to be estimated without knowledge of the parametric form of the baseline distribution. This type of analysis might be appropriate when the modeler wants to detect which covariates are significant, to determine which covariate is the most significant, or to analyze interactions among covariates. This technique is characteristic of nonparametric methods because it is impossible to misspecify the baseline distribution.
The focus of this estimation technique is on the indexes of the components on test, as will be seen in the derivation to follow. Since this procedure is very different from all previous point estimation derivations, an example will be carried through the derivation to illustrate the notation and the method. The purpose of this small example is to determine whether light bulb wattage influences light bulb survival. This introduction to parameter estimation will alternate between the small example and the general case. In this example and the derivation, it is initially assumed that there is no censoring and there are no tied observations.
The notation defined in the example is easily extended from three items on test with a single binary covariate to the general case. Let [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] be n distinct lifetimes. Each lifetime ti has an associated [latex]q \times 1[/latex] vector of covariates [latex]{\boldsymbol z}_i[/latex], for [latex]i = 1, \, 2, \, \ldots, \, n[/latex]. The ith order statistic is given by [latex]t_{(i)}[/latex], and the risk set [latex]R(t_{(i)})[/latex] is the set of indexes of all items that are at risk just prior to [latex]t_{(i)}[/latex], for [latex]i = 1, \, 2, \, \ldots, \, n[/latex]. The ith element of the rank vector [latex]{\boldsymbol r} = (r_1, \, r_2, \, \ldots , r_n ) ^ \prime[/latex] is the index of the item that fails at time [latex]t_{(i)}[/latex], for [latex]i = 1, \, 2, \, \ldots , \, n[/latex]. The observed failure times and their associated indexes are equivalent to the observed order statistics and the associated rank vector. Now that the new notation has been defined, the emphasis transitions to determining the probability that a particular permutation of the indexes appears in the rank vector.
In the example, as well as in the general case, the conditional probability expression does not involve the failure times, making it possible to shorten [latex]P( r_j = i \,|\, t_{(1)}, \, t_{(2)}, \, \ldots, \, t_{(j)}, \, r_1, \, r_2, \, \ldots, \, r_{{j - 1}})[/latex] to just [latex]P( r_j = i \,|\, r_1, \, r_2, \, \ldots , \, r_{{j - 1}} )[/latex]. The probability that the jth element of the rank vector will be equal to i, given [latex]t_{(j)}[/latex] and the failure history up to [latex]t_{(j)}[/latex], is
The procedure for estimating β1 can be generalized from the example without any significant difficulties. The probability mass function for the indexes, or the likelihood function for [latex]\boldsymbol {\beta}[/latex], is now
The log likelihood is
The score vector has sth component
for [latex]s = 1, \, 2, \, \ldots, \, q[/latex]. The vector of maximum likelihood estimators [latex]\hat {\boldsymbol \beta}[/latex] is obtained when the elements of the score vector are equated to zero and solved via numerical methods. To determine an estimate for the variance of [latex]\hat {\boldsymbol \beta}[/latex], the score vector must be differentiated to calculate the observed information matrix. The diagonal elements of the inverse of the observed information matrix are asymptotically valid estimates of the variance of [latex]\hat {\boldsymbol \beta}[/latex].
There are two approaches to handle right censoring that do not significantly complicate the derivation presented thus far. The first approach is to assume that right censoring occurs immediately after a failure occurs when a failure time and right-censoring time coincide. This assumption is valid for a Type II censored data set, but will involve an approximation for more general right-censoring schemes. In this case the rank vector is shortened to only r elements, corresponding to the indexes of the observed failure times [latex]t_{(1)}, \, t_{(2)}, \, \ldots, \, t_{(r)}[/latex]. The likelihood function is
The log likelihood function is
The score vector has sth component
for [latex]s = 1, \, 2, \, \ldots, \, q[/latex]. Using the quotient rule, the derivative of the score vector is
for [latex]s = 1, \, 2, \, \ldots, \, q[/latex] and [latex]t = 1, \, 2, \, \ldots, \, q[/latex]. The elements of the observed information matrix are obtained by using the maximum likelihood estimates as arguments in the negative of this expression.
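A numerical sketch of the first approach for a single covariate with distinct failure times follows. The function names are hypothetical; a golden-section search is used for the maximization, which is adequate here because the partial log likelihood is concave in β.

```python
import math

def cox_log_partial_likelihood(x, delta, z, beta):
    """Partial log likelihood for a single covariate with distinct
    times: the sum over observed failures of
    beta * z_i - ln(sum over the risk set of exp(beta * z_j))."""
    n = len(x)
    ll = 0.0
    for i in range(n):
        if delta[i] == 1:
            # risk set: all items still on test just prior to x_i
            risk = sum(math.exp(beta * z[j])
                       for j in range(n) if x[j] >= x[i])
            ll += beta * z[i] - math.log(risk)
    return ll

def cox_mle(x, delta, z, lo=-10.0, hi=10.0, tol=1e-10):
    """Golden-section search for the maximizing beta."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c1, c2 = hi - g * (hi - lo), lo + g * (hi - lo)
        if cox_log_partial_likelihood(x, delta, z, c1) < \
           cox_log_partial_likelihood(x, delta, z, c2):
            lo = c1
        else:
            hi = c2
    return (lo + hi) / 2.0
```

With several covariates, the same partial log likelihood is maximized over the vector [latex]\boldsymbol \beta[/latex] by Newton-type methods using the score vector and observed information matrix derived above.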
The second approach to right censoring is to write the likelihood function as the sum of all likelihoods for complete data sets that are consistent with the censoring pattern. Fortunately, this second approach yields the same likelihood function as the first approach, as illustrated by the following example.
Tied lifetimes are typically handled by an approximation. When there are several failures at the same time value, each is assumed to contribute the same term to the likelihood function. Consequently, all the items with tied failure times are included in the risk set at the time of the tied observation. This approximation works well when there are not many tied observations in the data set and has been implemented in many software packages that estimate the vector of regression coefficients [latex]\boldsymbol {\beta}[/latex].
The last example moves from the single binary covariate case to the case in which there are [latex]q > 1[/latex] covariates which can assume discrete and continuous values. The survival analysis application comes from sociology, and the analyst is attempting to determine which of the covariates significantly influences survival.
This chapter has contained a brief introduction to some of the statistical methods that are used in survival analysis. The key modeling features that indicate the use of survival analysis are (a) a population lifetime distribution with nonnegative support, (b) appreciable dispersion, (c) possibly right-censored data values, and (d) possibly a vector of covariates that might influence the lifetime distribution. The exponential distribution, the Weibull distribution, and the Cox proportional hazards model were fitted to complete and right-censored data sets in this chapter.
5.7 Exercises
5.1 Consider a large batch of light bulbs whose lifetimes are known to be exponential(1) random variables. Gina knows that the population distribution is exponential, but she does not know the value of the population mean. She estimates the population mean lifetime of the light bulbs by averaging n observed lifetimes from bulbs chosen at random from the batch. Find the smallest value of n that assures, with probability of at least 0.95, that the sample mean is within 0.2 of the population mean
- exactly,
- approximately, using the central limit theorem.
5.2 Libby is a statistician for a light bulb company. She knows that the lifetimes of the 60-watt bulbs that her company manufactures are exponentially distributed with population mean 1500 hours. She conducts a life test in which 39 of their 60-watt bulbs are placed on life test until they fail and the average of the failure times is recorded. Find the probability that the sample mean exceeds 1600 hours using
- the central limit theorem,
- the exact distribution of the sample mean.
5.3 Let [latex]t_1, \, t_2, \, \ldots, \, t_{n}[/latex] be a random sample from an exponential[latex](\lambda)[/latex] population, where λ is a positive unknown failure rate parameter. Find an unbiasing constant [latex]c_n[/latex] so that [latex]c_n t_{(1)}[/latex] is an unbiased estimator of [latex]1 / \lambda[/latex], where [latex]t_{(1)} = \min \left\{ t_1, \, t_2, \, \ldots, \, t_n \right\}[/latex] is the first order statistic. Hint: the unbiasing constant [latex]c_n[/latex] is a function of the number of items on test, n.
5.4 Debbie purchases a laptop computer with a random lifetime T whose probability distribution is a special case of the log logistic distribution with survivor function
where λ is a positive unknown scale parameter. From just a single observation of the lifetime of her laptop computer, find an exact two-sided 90% confidence interval for λ.
5.5 Let [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] be a random sample from an exponential population with mean θ, where θ is a positive unknown parameter. An exact two-sided 90% confidence interval for θ is
Carol is not concerned about large values of θ. Only small values of θ are of concern. What is an exact one-sided 95% confidence interval of the form [latex]\theta > k[/latex], for some constant k?
5.6 If [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] are n mutually independent observations from a log normal distribution with probability density function
for [latex]\sigma > 0[/latex] and [latex]-\infty < \mu < \infty[/latex], find the maximum likelihood estimators of μ and σ and exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence intervals for μ and σ in terms of [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex].
-
5.7 Let [latex]t_1, \, t_2, \, \ldots , \, t_7[/latex] be a random sample of the lifetimes of [latex]n = 7[/latex] items on test drawn from an exponential population with positive unknown mean θ.
- Find an exact two-sided 90% confidence interval for the median by finding a pivotal quantity based on the sample median [latex]t_{(4)}[/latex].
- Give an exact two-sided 90% confidence interval for the median for the [latex]n = 7[/latex] rat survival times in the treatment group from Efron and Tibshirani (1993, page 11):
- Conduct a Monte Carlo simulation experiment to provide convincing numerical evidence that the exact two-sided 90% confidence interval for the median is indeed an exact two-sided 90% confidence interval for an exponential population when θ is arbitrarily set to 1.
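One possible shape for the Monte Carlo harness in part (c), assuming NumPy (the interval form and variable names are mine, not the book's): estimate the quantiles of the pivotal quantity [latex]t_{(4)}[/latex] divided by the population median in one run, then check coverage on an independent run.

```python
import numpy as np

rng = np.random.default_rng(12345)
n, reps = 7, 100_000
median = np.log(2.0)  # population median when theta = 1

# The sample median t_(4) divided by the population median is pivotal.
t4 = np.sort(rng.exponential(1.0, size=(reps, n)), axis=1)[:, 3]
lo_q, hi_q = np.quantile(t4 / median, [0.05, 0.95])

# Fresh replications: the interval [t_(4)/hi_q, t_(4)/lo_q] should cover
# the population median about 90% of the time.
t4_new = np.sort(rng.exponential(1.0, size=(reps, n)), axis=1)[:, 3]
covered = (t4_new / hi_q <= median) & (median <= t4_new / lo_q)
coverage = covered.mean()
print(coverage)
```

Estimating the pivot's quantiles by simulation is a stand-in for the closed-form quantiles the exercise asks for; with those in hand, only the second loop is needed.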
-
5.8 This chapter has emphasized confidence intervals. Another type of statistical interval is known as a prediction interval, which contains a future value of an observation with a prescribed probability. Let [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] be a random sample from an exponential population with a positive unknown mean θ. Conduct a Monte Carlo simulation experiment that provides convincing numerical evidence that the [latex]{100(1 - \alpha)}\%[/latex] prediction interval for [latex]t_{n+1}[/latex]
is an exact prediction interval for the arbitrary parameter settings [latex]n = 11[/latex], [latex]\alpha = 0.05[/latex], and [latex]\theta = 19[/latex].
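The displayed prediction interval is not reproduced in this excerpt; one exact form follows from the fact that [latex]2T_{n+1}/\theta[/latex] is chi-square(2), [latex]2n\bar t/\theta[/latex] is chi-square(2n), and the two are independent, so [latex]T_{n+1}/\bar t[/latex] has an F(2, 2n) distribution. A simulation check under that assumed form, with the exercise's settings [latex]n = 11[/latex], [latex]\alpha = 0.05[/latex], [latex]\theta = 19[/latex]:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(54321)
n, alpha, theta, reps = 11, 0.05, 19.0, 100_000

# T_{n+1} / tbar ~ F(2, 2n) gives an exact prediction interval
lo_f = f_dist.ppf(alpha / 2, 2, 2 * n)
hi_f = f_dist.ppf(1 - alpha / 2, 2, 2 * n)

tbar = rng.exponential(theta, size=(reps, n)).mean(axis=1)
t_next = rng.exponential(theta, size=reps)
coverage = ((tbar * lo_f <= t_next) & (t_next <= tbar * hi_f)).mean()
print(coverage)
```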
-
5.9 Let [latex]T_1, \, T_2, \, T_3[/latex] be mutually independent random variables such that Ti is exponentially distributed with mean [latex]i \kern 0.02em \theta[/latex], for [latex]i = 1, \, 2, \, 3[/latex], where θ is a positive unknown parameter.
- Find the maximum likelihood estimator [latex]\hat \theta[/latex].
- Find the probability density function of the maximum likelihood estimator [latex]\hat \theta[/latex].
- Is [latex]\hat \theta[/latex] an unbiased estimator of θ?
- Find an exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence interval for θ.
- Perform a Monte Carlo simulation experiment to evaluate the coverage of the confidence interval for [latex]\theta = 10[/latex] and [latex]\alpha = 0.1[/latex].
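A sketch of the Monte Carlo experiment in part (e), assuming NumPy and SciPy. It uses one candidate interval based on the pivotal quantity [latex]2(T_1 + T_2/2 + T_3/3)/\theta \sim \chi^2(6)[/latex], which the reader should verify against the answers to parts (a) through (d):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2024)
theta, alpha, reps = 10.0, 0.1, 100_000

# T_i is exponential with mean i * theta, so T_i / (i theta) ~ exponential(1)
# and S = T_1 + T_2/2 + T_3/3 satisfies 2S / theta ~ chi-square(6).
t = rng.exponential(theta * np.arange(1, 4), size=(reps, 3))
s = (t / np.arange(1, 4)).sum(axis=1)

lo = 2 * s / chi2.ppf(1 - alpha / 2, 6)
hi = 2 * s / chi2.ppf(alpha / 2, 6)
coverage = ((lo <= theta) & (theta <= hi)).mean()
print(coverage)
```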
-
5.10 Let [latex]t_1, \, t_2, \, \ldots , \, t_n[/latex] be a random sample from a population with probability density function
where θ is a positive unknown parameter.
- Find the maximum likelihood estimator of θ.
- Use the invariance property to find the maximum likelihood estimator of the median of the distribution.
-
5.11 Let [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] be a random sample from a population with probability density function
where λ is a positive unknown parameter. This distribution is known as the standard Wald distribution, which is a special case of the inverse Gaussian distribution. Find the maximum likelihood estimator of λ.
-
5.12 Let [latex]T_1, \, T_2, \, \ldots , \, T_n[/latex] be mutually independent and identically distributed random variables from a population having probability density function
Find the limiting distribution of [latex]n \left( T_{(1)} - \theta \right)[/latex]. Support this limiting distribution by conducting a Monte Carlo simulation experiment.
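The density is not reproduced in this excerpt. Assuming, purely for illustration, the shifted exponential density [latex]f(t) = e^{-(t - \theta)}[/latex] for [latex]t > \theta[/latex], the minimum satisfies [latex]n(T_{(1)} - \theta) \sim[/latex] exponential(1) exactly, which a simulation can confirm with a Kolmogorov–Smirnov comparison (the settings below are mine):

```python
import numpy as np
from scipy.stats import expon, kstest

rng = np.random.default_rng(7)
n, theta, reps = 50, 3.0, 20_000

# Shifted exponential: T = theta + exponential(1)
t_min = theta + rng.exponential(1.0, size=(reps, n)).min(axis=1)
z = n * (t_min - theta)

# Compare the simulated values of n(T_(1) - theta) with exponential(1)
stat, pvalue = kstest(z, expon.cdf)
print(stat, pvalue)
```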
-
5.13 Let [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] be a random sample from a population with probability density function
where θ is a positive unknown parameter. Calculate an asymptotically exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence interval for θ based on the asymptotic normality of the maximum likelihood estimator.
-
5.14 Let [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] be a random sample from a population with probability density function
where θ is a positive unknown parameter. This population distribution is a special case of the inverse Gaussian distribution. Calculate an asymptotically exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence interval for θ based on the asymptotic normality of the maximum likelihood estimator. Hint: the expected value of T is [latex]E[T] = \theta[/latex].
-
5.15 Let [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] be a random sample from a population with probability density function
where θ is a positive unknown parameter. This is a special case of the log logistic distribution.
- Find the maximum likelihood estimator of θ. Hint: The maximum likelihood estimator cannot be expressed in closed form.
- Find the maximum likelihood estimate of θ for the [latex]n = 7[/latex] rat survival times (in days) of the treatment group from Efron and Tibshirani (1993, page 11):
- Find an asymptotically exact two-sided 95% confidence interval for θ based on the likelihood ratio statistic for the rat survival times from part (b).
-
5.16 If n items from an exponential population with failure rate λ are placed on a life test that is terminated after r failures have occurred, show that
where [latex]x_i = \min \{t_i, \, c_i \}[/latex], ti is the time to failure of the ith item, and ci is the right-censoring time for the ith item, [latex]i = 1, \, 2, \, \ldots, \, n[/latex].
-
5.17 If n items from an exponential population with failure rate λ are placed on a life test that is terminated after r failures have occurred, find the expected time to complete the test if
- failed items are not replaced,
- failed items are immediately replaced with new items.
-
5.18 Find the score, maximum likelihood estimator, and Fisher information matrix for a Type II censored random sample from a population with
where θ is a positive unknown parameter.
-
5.19 The lifetimes of studio light bulbs, measured in days, are exponentially distributed with an unknown failure rate λ. James places n studio light bulbs on test at noon on one day and subsequently checks for failed bulbs at noon on subsequent days until all bulbs have failed. Let [latex]r_1, \, r_2, \, \ldots, \, r_k[/latex] be the number of observed bulb failures, some of which may be zero, on the k days that the bulbs are inspected. Find the maximum likelihood estimator for λ. Also, give the maximum likelihood estimate for the data values [latex]r_1 = 8[/latex], [latex]r_2 = 5[/latex], [latex]r_3 = 2[/latex], [latex]r_5 = 1[/latex], and all other [latex]r_i[/latex] values equal zero.
-
5.20 James’s friend Alexandra decides to simplify matters from the previous question by assuming that all failures that occur during any interval occur at midnight. What is Alexandra’s maximum likelihood estimator for λ as a function of n and [latex]r_1, \, r_2, \, \ldots, \, r_k[/latex]?
-
5.21 Dre conducts a life test on n items from an exponential population with mean θ. He observes only the value of a single order statistic [latex]t_{(k)}[/latex], where k is known. So [latex]k - 1[/latex] lifetimes are left censored at [latex]t_{(k)}[/latex], one lifetime is observed at [latex]t_{(k)}[/latex], and [latex]n - k[/latex] lifetimes are right censored at [latex]t_{(k)}[/latex].
- What is the score statistic for estimating θ?
- What is the maximum likelihood estimator for θ when [latex]n = 30[/latex], [latex]k = 11[/latex], and [latex]t_{(11)} = 15.5[/latex]?
-
5.22 Consider a Type II right-censored life test with n items on test in which [latex]r = 1[/latex] failure is observed at time [latex]t_{(1)}[/latex]. Assume that the items placed on the life test have lifetimes that are well described by a Rayleigh(λ) population.
- What is the maximum likelihood estimator for λ?
- What is an exact confidence interval for λ?
- What is the expected width of the confidence interval from part (b)?
- Verify the coverage and expected width of the exact confidence interval for [latex]\lambda = 2[/latex], [latex]n = 7[/latex], and [latex]\alpha = 0.05[/latex] via Monte Carlo simulation.
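A sketch of the simulation in part (d), assuming the Rayleigh(λ) survivor function [latex]S(t) = e^{-(\lambda t)^2}[/latex] (check against the book's parameterization). Under that form, [latex](\lambda T)^2[/latex] is exponential(1), so [latex]n(\lambda t_{(1)})^2[/latex] is exponential(1), which yields a candidate exact interval; verify it against your answer to part (b):

```python
import numpy as np

rng = np.random.default_rng(99)
lam, n, alpha, reps = 2.0, 7, 0.05, 100_000

# Exponential(1) quantiles for the pivot n (lambda t_(1))^2
a = -np.log(1 - alpha / 2)   # lower quantile
b = -np.log(alpha / 2)       # upper quantile

# Simulate t_(1) directly via n (lambda T_(1))^2 ~ exponential(1)
t1 = np.sqrt(rng.exponential(1.0, size=reps) / n) / lam
lo = np.sqrt(a / n) / t1
hi = np.sqrt(b / n) / t1
coverage = ((lo <= lam) & (lam <= hi)).mean()
width = (hi - lo).mean()
print(coverage, width)
```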
-
5.23 A randomly right-censored data set is collected from a population with hazard function
where θ is a positive parameter.
- Find the maximum likelihood estimator [latex]\hat \theta[/latex].
- Give an expression for the observed information matrix.
- Give an asymptotically exact confidence interval for θ based on the observed information matrix.
-
5.24 Candice conducts a life test in which n items are simultaneously placed on test at time 0. The test is concluded at time [latex]c > 0[/latex]. Assuming that the lifetimes of the items are from an exponential population with mean θ, find the distribution of the number of failures that occur by time c.
-
5.25 Show that when a random sample is drawn from an exponential(λ) population with Type II right censoring
where [latex]\chi^2 (2r)[/latex] is the chi-square distribution with 2r degrees of freedom.
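The displayed quantity is not reproduced in this excerpt; the standard result is that 2λ times the total time on test, [latex]2\lambda \big( \sum_{i=1}^{r} t_{(i)} + (n - r) t_{(r)} \big)[/latex], has the chi-square distribution with 2r degrees of freedom. A simulation check of that claim, with parameter settings of my choosing:

```python
import numpy as np
from scipy.stats import chi2, kstest

rng = np.random.default_rng(11)
lam, n, r, reps = 0.5, 10, 4, 20_000

# Total time on test for Type II right censoring: the first r order
# statistics plus (n - r) copies of the rth order statistic.
t = np.sort(rng.exponential(1.0 / lam, size=(reps, n)), axis=1)
ttt = t[:, :r].sum(axis=1) + (n - r) * t[:, r - 1]

# Compare 2 * lambda * (total time on test) with chi-square(2r)
stat, pvalue = kstest(2 * lam * ttt, chi2(2 * r).cdf)
print(stat, pvalue)
```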
-
5.26 Consider a Type II right censored sample of n items on test and r observed failures drawn from an exponential population with mean θ. Show that the maximum likelihood estimator [latex]\hat{\theta}[/latex] is unbiased.
-
5.27 Assume that a life test without replacement is conducted on n items from an exponential population with failure rate λ. The exact failure times are not known, but the test is terminated upon the rth ordered failure at time [latex]t_{(r)}[/latex]. Find a point estimator for λ.
-
5.28 Consider a population of items with exponential(λ) lifetimes. A life test with replacement is terminated when r failures occur or at time c, whichever occurs first. This is a combination of Type I and Type II right censoring. Find the expected number of items that fail during the test as a function of λ.
-
5.29 For a life test of n items with exponential(λ) lifetimes (items are not replaced upon failure) which is continued until all items fail, show that
where λ is the population failure rate and [latex]\hat{\lambda}[/latex] is the maximum likelihood estimator for λ. Thus, an unbiasing constant for [latex]\hat{\lambda}[/latex] is [latex]u_n = {(n - 1)} / n[/latex]. Equivalently,
Find an unbiasing constant for the case of Type II right censoring.
-
5.30 Give a point and 95% interval estimator for the median lifetime of the 6–MP treatment group assuming that the data have been drawn from an exponential(λ) population.
-
5.31 Consider the following Type II right censored data set for the lifetime of a product ([latex]n = 5[/latex] and [latex]r = 3[/latex]) drawn from an exponential population with failure rate λ:
- Find the maximum likelihood estimator for the mean of the population.
- Find the maximum likelihood estimator for [latex]S(5)[/latex].
- Find an exact two-sided 80% confidence interval for [latex]E\left[T ^ 3 \right][/latex].
- Find an exact one-sided 95% lower confidence interval for [latex]S(5)[/latex].
- Find the p-value for the test [latex]H_0: \lambda = 0.04[/latex] versus [latex]H_1: \lambda > 0.04[/latex].
- Find the value of the log likelihood function at the maximum likelihood estimate.
- Find the value of the observed information matrix.
- Assume the data values
constitute a complete data set for a different product. Find an exact two-sided 90% confidence interval for the ratio of the failure rates of the two products if both are assumed to come from exponential populations.
-
5.32 Sara observes a single observed lifetime T from an exponential(λ) population, where λ is a positive unknown rate parameter. Find an exact two-sided 95% confidence interval for λ.
-
5.33 Justin places a single item on test ([latex]n = 1[/latex]). The only information that is available is that the item failed between times a and b, where [latex]a < b[/latex]. In other words, the single item’s lifetime is interval censored. Assuming that the population time to failure is exponential(λ), what is the maximum likelihood estimator of λ?
-
5.34 Natalie conducts a life test with [latex]n = 19[/latex] items on test and random right censoring. Let [latex]t_1, \, t_2, \, \ldots , \, t_{19}[/latex] be the independent exponential(2) times to failure. Let [latex]c_1, \, c_2, \, \ldots , \, c_{19}[/latex] be the independent exponential(1) censoring times, which are independent of the times to failure. Use Monte Carlo simulation to estimate the actual coverage of the following approximate confidence interval procedures for the population failure rate λ for [latex]\alpha = 0.05[/latex].
- The confidence interval consisting of all λ satisfying
- The confidence interval consisting of all λ satisfying
- The confidence interval consisting of all λ satisfying
Replicate the experiment so as to estimate the actual coverages to three digits of accuracy.
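The three intervals in parts (a) through (c) are not reproduced in this excerpt, so the sketch below substitutes an illustrative Wald interval [latex]\hat\lambda \pm z_{\alpha/2} \hat\lambda / \sqrt{r}[/latex] to show the shape of the simulation harness; the intervals from the text drop into the marked lines. It assumes exponential(2) means failure rate 2 and exponential(1) means censoring rate 1:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, lam, alpha, reps = 19, 2.0, 0.05, 100_000
z = norm.ppf(1 - alpha / 2)

t = rng.exponential(1.0 / lam, size=(reps, n))   # failure times, rate 2
c = rng.exponential(1.0, size=(reps, n))         # censoring times, rate 1
x = np.minimum(t, c)                             # observed times
delta = t <= c                                   # censoring indicators
r = delta.sum(axis=1)                            # observed failure counts
lam_hat = r / x.sum(axis=1)                      # MLE of the failure rate

# Illustrative stand-in interval; replace with each interval from the text
half = z * lam_hat / np.sqrt(np.maximum(r, 1))
coverage = ((lam_hat - half <= lam) & (lam <= lam_hat + half)).mean()
print(coverage)
```

More replications (or replicated runs) are needed to pin the coverages down to three digits, as the exercise requests.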
-
5.35 Sixty-watt light bulb lifetimes are known to be exponentially distributed with unknown positive population mean θ from previous test results. The company that produces these light bulbs would like to estimate θ by testing n bulbs to failure at one facility and m bulbs to failure at a second facility. Let [latex]X_1, \, X_2, \, \ldots , \, X_n[/latex] be the independent lifetimes of the bulbs tested at the first facility; let [latex]Y_1, \, Y_2, \, \ldots , \, Y_m[/latex] be the independent lifetimes of the bulbs tested at the second facility. An unbiased estimator of θ is the convex combination
where [latex]0 < p < 1[/latex], [latex]\hat \theta_X = \bar X[/latex] is the maximum likelihood estimator of θ for the data from the first facility, and [latex]\hat \theta_Y = \bar Y[/latex] is the maximum likelihood estimator of θ for the data from the second facility. Find the value of p that minimizes [latex]V\big[ \kern 0.01em \hat \theta \kern 0.04em \big][/latex].
-
5.36 Ash would like to test the hypothesis
versus
using a single value T from an exponential(λ) population, where λ is a positive unknown population failure rate. The null hypothesis is rejected if [latex]T < 0.01[/latex]. Find the significance level α for the test.
-
5.37 Let T be an observation from an exponential population with positive unknown population mean θ. This observation is used to test
versus
- Find the critical value for the test for a fixed significance level α.
- Find β for a fixed significance level α.
-
5.38 Paul collects a random sample [latex]t_1, \, t_2, \, \ldots , \, t_n[/latex] from an exponential population with positive unknown mean θ. Show that the sample mean, [latex]\bar{t}[/latex], and n times the first order statistic, [latex]n t_{(1)}[/latex], are both unbiased estimators of θ.
-
5.39 Jessica and Mary collect a random sample [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] of light bulb lifetimes drawn from an exponential(λ) population, where λ is a positive unknown failure rate. The bulbs are stamped with “1000 hour MTTF,” indicating that the mean time to failure equals 1000 hours. They would like to determine whether there is statistical evidence in the sample that indicates the bulbs last longer than 1000 hours.
- State the appropriate H0 and H1.
- Jessica uses the test statistic [latex]\bar{t}[/latex] and Mary uses the test statistic [latex]nt_{(1)}[/latex] to test the hypothesis. Find the critical values for their test statistics when [latex]\alpha = 0.05[/latex] and [latex]n = 10[/latex].
- Draw the power curves associated with each of the test statistics from part (b) on the same set of axes using a computer. Again assume that [latex]\alpha = 0.05[/latex] and [latex]n = 10[/latex].
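A numerical sketch for parts (b) and (c), assuming SciPy and two standard distributional facts: [latex]2n\bar t/\theta[/latex] is chi-square(2n), and [latex]nT_{(1)}[/latex] is exponential with mean θ. Both tests reject for large values of their statistics; verify the critical values against your own derivation before relying on them:

```python
import numpy as np
from scipy.stats import chi2

n, alpha, theta0 = 10, 0.05, 1000.0

# Critical values under H0: theta = theta0
c_tbar = theta0 * chi2.ppf(1 - alpha, 2 * n) / (2 * n)
c_min = -theta0 * np.log(alpha)

def power_tbar(theta):
    # P(tbar > c_tbar) when 2 n tbar / theta ~ chi-square(2n)
    return chi2.sf(2 * n * c_tbar / theta, 2 * n)

def power_min(theta):
    # P(n t_(1) > c_min) when n T_(1) ~ exponential(mean theta)
    return np.exp(-c_min / theta)

for theta in (1000.0, 1500.0, 2000.0):
    print(theta, power_tbar(theta), power_min(theta))
```

Plotting both power functions over a grid of θ values produces the curves requested in part (c); at [latex]\theta = \theta_0[/latex] both equal α.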
-
5.40 Camille observes a single lifetime T from an exponential population with a positive unknown population mean θ. She would like to test
versus
at [latex]\alpha = 0.07[/latex] using T as a test statistic.
- Find the critical value c for this test.
- Plot the power function for this test.
-
5.41 Ellen collects a random sample [latex]t_1, \, t_2, \, \ldots, \, t_{10}[/latex] of light bulb lifetimes from an exponential(λ) population, where λ is a positive unknown failure rate. Ellen is a reliability engineer. She is confident from previous test results that the time to failure for these light bulbs is exponentially distributed. She is interested in testing a manufacturer’s claim that the population mean time to failure for the bulbs is 1000 hours. So she would like to test
versus
She is in a hurry. She places ten bulbs on test and only observes the first bulb fail at [latex]t_{(1)} = 14[/latex] hours, and would like to draw a conclusion at 14 hours. Give the p-value for the test based on the value of this single order statistic.
-
5.42 Liz collects a random sample of lifetimes [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] from an exponential(λ) distribution, where λ is a positive unknown failure rate parameter. She conducts a significance test of
versus
which achieves a p-value of [latex]p = 0.07[/latex] for a particular data set. If she then computes an exact two-sided 95% confidence interval for λ for this particular data set, will the confidence interval contain 1?
-
5.43 Karen fits the Weibull distribution to the ball bearing data set using the parameterization
yielding maximum likelihood estimates [latex]\hat{\lambda} = 0.0122[/latex] and [latex]\hat{\kappa} = 2.10[/latex]. Ute also wants to fit the same data set to the Weibull distribution, but she uses the parameterization
What will be the maximum likelihood estimates [latex]\hat{\rho}[/latex] and [latex]\hat{\beta}[/latex] that Ute obtains for the ball bearing data set?
-
5.44 Jay conducts a life test with [latex]n = 5[/latex] items on test which is terminated when [latex]r = 3[/latex] items have failed. Failed items are not replaced in this traditional Type II right-censored data set. Assuming that the time to failure of an item in the population has a Weibull([latex]\lambda, \, \kappa[/latex]) distribution with known, positive parameters λ and [latex]\kappa[/latex], what is the probability density function of the time to complete the life test?
-
5.45 Jennie collects a random sample [latex]t_1, \, t_2, \, \ldots, \, t_7[/latex] from a Rayleigh population with probability density function
where θ is a positive unknown parameter. She would like to test
versus
using the test statistic [latex]t_{(1)} = \min \left\{ t_1, \, t_2, \, \ldots, \, t_7 \right\}[/latex], which assumes the value [latex]t_{(1)} = 6[/latex]. Find the p-value for her test.
-
5.46 Mildred collects a random sample [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] from a Rayleigh[latex](\lambda)[/latex] population with survivor function
where λ is a positive unknown parameter.
- Find the maximum likelihood estimator of λ.
- Show that the log likelihood function is maximized at the maximum likelihood estimator [latex]\hat{\lambda}[/latex].
- Given that the expected value of T is [latex]E[T] = {\sqrt{\pi}} / (2 \lambda)[/latex], find the method of moments estimator of λ.
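A numerical check of parts (a) and (c), assuming the survivor function [latex]S(t) = e^{-(\lambda t)^2}[/latex], which is consistent with the stated mean [latex]E[T] = \sqrt{\pi}/(2\lambda)[/latex]. The candidate estimators below, [latex]\hat\lambda = \sqrt{n / \sum t_i^2}[/latex] and [latex]\tilde\lambda = \sqrt{\pi}/(2\bar t)[/latex], should be verified against your own derivations; both should land near the true λ for a large simulated sample:

```python
import numpy as np

rng = np.random.default_rng(42)
lam_true, n = 1.5, 100_000

# Simulate from S(t) = exp(-(lambda t)^2) by inversion: T = sqrt(-ln U)/lambda
t = np.sqrt(-np.log(rng.uniform(size=n))) / lam_true

lam_mle = np.sqrt(n / np.sum(t ** 2))          # candidate MLE
lam_mom = np.sqrt(np.pi) / (2 * np.mean(t))    # method of moments estimate
print(lam_mle, lam_mom)
```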
-
5.47 Find the elements of the score vector for the log logistic distribution for a randomly right-censored data set.
-
5.48 Bryan places n items on test and observes r failures. Assuming that the failure times of the items follow the log logistic distribution and censoring is random, set up an expression for the boundary of a 95% confidence region for the shape parameter [latex]\kappa[/latex] and scale parameter λ of the log logistic distribution based on the likelihood ratio statistic. Assume that the survivor function for the log logistic distribution is
for [latex]\lambda > 0[/latex] and [latex]\kappa > 0[/latex]. It is not necessary to solve for the maximum likelihood estimators.
-
5.49 Consider a proportional hazards model with [latex]n = 3[/latex] items on test and distinct failure times [latex]t_1, \, t_2, \, t_3[/latex]. Compute the joint probability mass function values for the [latex]3! = 6[/latex] possible rank vectors, and show that they sum to 1.
-
5.50 Give the equations that must be solved in order to find the maximum likelihood estimators [latex]\hat{\lambda}[/latex], [latex]\hat{\kappa}[/latex], and [latex]\hat{\boldsymbol \beta}[/latex] for a proportional hazards model with log logistic baseline distribution and log linear link function. A random right-censoring scheme is used.
-
5.51 Joyce fits the Cox proportional hazards model with unknown baseline distribution given in Examples 5.14, 5.15, and 5.16 to the [latex]n = 3[/latex] light bulb failure times. The purpose of the study was to determine the effect of wattage on survival for 60-watt and 100-watt light bulbs.
- What is the value of the regression coefficient for wattage if it were coded as [latex]z = 60[/latex] and [latex]z = 100[/latex] rather than as a binary covariate?
- Write a short paragraph indicating whether or not these two approaches are fundamentally equivalent ways of coding the covariate. If they differ, is one method of coding the covariate superior to the other for the purpose of the study?
-
5.52 Survival times (in weeks) for two groups of leukemia patients (AG positive and AG negative blood types), along with an additional covariate, white blood cell count, are given in Feigl, P. and Zelen, M., “Estimation of Exponential Survival Probabilities with Concomitant Information,” Biometrics, Vol. 21, No. 4, pp. 826–838, 1965, and are displayed below.
AG positive group                  AG negative group
Survival time  White blood count   Survival time  White blood count
 65     2300                        56     4400
156      750                        65     3000
100     4300                        17     4000
134     2600                         7     1500
 16     6000                        16     9000
108    10500                        22     5300
121    10000                         3    10000
  4    17000                         4    19000
 39     5400                         2    27000
143     7000                         3    28000
 56     9400                         8    31000
 26    32000                         4    26000
 22    35000                         3    21000
  1   100000                        30    79000
  1   100000                         4   100000
  5    52000                        43   100000
 65   100000
- Fit the Cox proportional hazards model to the survival times. Code the blood type as the indicator variable z1, using 1 for AG positive and 0 for AG negative. The second covariate z2 is the natural logarithm of the white blood cell counts minus the sample mean of the natural logarithms of the white blood cell counts. Include the interaction term [latex](z_1 - \bar z_1) z_2[/latex] in the model. Use the Breslow method for handling tied survival times.
- Write a sentence interpreting the sign of [latex]\hat \beta_1[/latex], [latex]\hat \beta_2[/latex], and [latex]\hat \beta_3[/latex] in terms of risk to the patient.
- Give a 95% confidence interval for β1.
- If covariates associated with p-values that are less than 0.10 are considered statistically significant, what is the fitted hazard function for a leukemia patient with baseline hazard function [latex]h_0(t)[/latex], a white blood cell count of 9000, and AG positive blood type? Hint: The sample mean of the natural logarithms of the white blood cell counts is 9.52 and the mean of the blood types coded as an indicator variable is [latex]17 / 33 = 0.515[/latex].
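The covariate coding in part (a) can be set up directly from the table; a sketch assuming NumPy (the variable names are mine). The printed means can be checked against the hint in part (d):

```python
import numpy as np

# White blood cell counts, read down each column of the table above
wbc_pos = [2300, 750, 4300, 2600, 6000, 10500, 10000, 17000, 5400, 7000,
           9400, 32000, 35000, 100000, 100000, 52000, 100000]
wbc_neg = [4400, 3000, 4000, 1500, 9000, 5300, 10000, 19000, 27000, 28000,
           31000, 26000, 21000, 79000, 100000, 100000]

z1 = np.array([1] * len(wbc_pos) + [0] * len(wbc_neg))   # AG indicator
log_wbc = np.log(np.array(wbc_pos + wbc_neg, dtype=float))
z2 = log_wbc - log_wbc.mean()                            # centered log WBC
z3 = (z1 - z1.mean()) * z2                               # interaction term

print(log_wbc.mean(), z1.mean())
```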
-
5.53 Consider the Cox proportional hazards model with a single ([latex]q = 1[/latex]) binary covariate z1, an exponential(λ) baseline distribution, and a log linear link function. The baseline distribution can be absorbed into the link function by creating an artificial covariate [latex]z_0 = 1[/latex] and setting [latex]\lambda = e ^ {\beta_0 z_0}[/latex].
- For a randomly right-censored data set, find the score vector.
- For a randomly right-censored data set, find closed-form expressions for the maximum likelihood estimators [latex]\hat \beta_0[/latex] and [latex]\hat \beta_1[/latex].
- For the [latex]n = 3[/latex] observations given in vector form below, calculate the maximum likelihood estimates [latex]\hat \beta_0[/latex] and [latex]\hat \beta_1[/latex].
- What is the hazard function of the fitted model for the data from part (c)?
- Use the observed information matrix to give approximate two-sided 95% confidence intervals for β0 and β1 for the data from part (c).
- Give the p-values for testing the hypotheses
for [latex]i = 0, \, 1[/latex], for the data from part (c).
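The observation vectors in part (c) are not reproduced in this excerpt, so the sketch below uses made-up data purely to illustrate the closed-form estimators of part (b): with a binary covariate, the likelihood factors into the two covariate groups, and each group's hazard is estimated by its number of deaths divided by its total time on test. Verify the algebra against your own derivation:

```python
import numpy as np

# Hypothetical (x_i, delta_i, z_i) triples: observed time, censoring
# indicator (1 = failure), and binary covariate; NOT the data of part (c)
x = np.array([2.0, 3.5, 1.2, 4.0, 0.8, 2.6])
delta = np.array([1, 0, 1, 1, 1, 0])
z = np.array([0, 0, 0, 1, 1, 1])

# Group-by-group deaths and total time on test
d0, t0 = delta[z == 0].sum(), x[z == 0].sum()
d1, t1 = delta[z == 1].sum(), x[z == 1].sum()

# exp(beta0) is the z = 0 hazard; exp(beta0 + beta1) is the z = 1 hazard
beta0_hat = np.log(d0 / t0)
beta1_hat = np.log(d1 / t1) - beta0_hat
print(beta0_hat, beta1_hat)
```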
-
5.54 The wattage of the [latex]n = 3[/latex] light bulbs in Example 5.16 was coded as the covariate [latex]z_1 = 0[/latex] for a 60-watt bulb and [latex]z_1 = 1[/latex] for a 100-watt bulb. When the Cox proportional hazards model with an unspecified baseline hazard function was fit to the data set, the point estimate for the regression parameter was [latex]\hat \beta_1 = -0.347[/latex]. Without doing the derivation from scratch, what is the point estimate for the regression parameter if the wattage (that is, 60 watts or 100 watts) of the bulb were used as the covariate?
-
5.55 Mark fits a Cox proportional hazards model with unknown baseline distribution to a data set of drill bit failure times (measured in number of items drilled) with [latex]q = 2[/latex], for which the covariates denote the turning speed (revolutions per minute, rpm) and the hardness of the material (Brinell hardness number, BHN) being drilled. The turning speeds range from 2400 to 4800 rpm and the hardness of the materials ranges from 250 to 440 BHN. Interactions are not considered and the variables are not centered. The fitted model has estimated regression vector [latex]\hat {\boldsymbol \beta} = (0.014, \, 0.45) ^ \prime[/latex], and the inverse of the observed information matrix is
Write a paragraph interpreting these results.