LaTeX and MathML
Pressbooks gives us options for rendering mathematical notation, including MathJax and WP QuickLaTeX. We can even work directly from your LaTeX files.
About this Page
Here are two excerpts from Statistical Modeling: Regression, Survival Analysis, and Time Series Analysis, by Lawrence M. Leemis, published by W&M Press. We worked from the PDF and LaTeX forms of his manuscript and used MathML and LaTeX to render the mathematical notation. Users can right-click on the math to magnify or adjust the display and to access options for screen readers and other assistive technologies. This approach is supported across browsers, operating systems, and screen readers.
5.1 Likelihood Theory
There are always merits in obtaining raw data (that is, exact individual failure times), as opposed to grouped data (counts of the number of failures over prescribed time intervals). Given raw data, we can always construct grouped data, but the converse is typically not true; therefore, we limit discussion in this chapter to the raw data case.
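Here is a minimal R sketch of the raw-versus-grouped distinction. The eight failure times and the interval endpoints are hypothetical values chosen for illustration, not data from the book. Base R's cut and table functions turn the raw times into counts per prescribed interval.

```r
# Hypothetical raw failure times (illustrative values, not from the book)
times <- c(1.2, 3.7, 4.1, 6.8, 7.5, 9.9, 12.3, 14.0)

# Group the raw data: count failures in the hypothetical intervals
# (0, 5], (5, 10], and (10, 15]
grouped <- table(cut(times, breaks = c(0, 5, 10, 15)))
print(grouped)
```

The counts 3, 3, and 2 follow directly from the raw times. Nothing in those counts recovers individual values such as 3.7 or 9.9, which is why the grouping cannot be reversed.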
The random variable T has denoted a random lifetime in previous chapters, so it is natural to use [latex]T_1, \, T_2, \, \ldots, \, T_n[/latex] to denote a random sample of n such lifetimes, where n is the number of items on test. When specific values are given for realizations of such lifetimes, which is typically the case from this point forward, they are denoted by [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex]. In other words, [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] are the experimental values of the mutually independent and identically distributed random variables [latex]T_1, \, T_2, \, \ldots, \, T_n[/latex]. The associated ordered observations, or order statistics, are denoted by [latex]t_{(1)}, \, t_{(2)}, \, \ldots, \, t_{(n)}[/latex].
The Greek letter θ is often used to denote a generic unknown parameter. We will refer to [latex]\hat \theta[/latex] in the abstract as a point estimator; when [latex]\hat \theta[/latex] assumes a specific numeric value, it will be referred to as a point estimate. The probability distribution of a statistic is referred to as a sampling distribution.
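A sampling distribution can be approximated by simulation. The R sketch below assumes, purely for illustration, an exponential lifetime model with failure rate θ = 1 and samples of size n = 20. It uses the exponential maximum likelihood estimator [latex]\hat \theta = n / \sum_{i=1}^{n} t_i[/latex] as the point estimator. Each replication yields one point estimate, and the collection of estimates approximates the sampling distribution of [latex]\hat \theta[/latex].

```r
set.seed(1)      # reproducible simulation
theta <- 1       # assumed true failure rate (hypothetical)
n <- 20          # assumed sample size (hypothetical)
reps <- 10000    # number of simulated samples

# Each replication: draw n exponential lifetimes and compute the point estimate
estimates <- replicate(reps, {
  lifetimes <- rexp(n, rate = theta)
  n / sum(lifetimes)
})

# Numerical summaries of the simulated sampling distribution
summary(estimates)
sd(estimates)
```

For this model the estimator has mean nθ/(n − 1), which is about 1.053 here. The simulated mean should therefore sit slightly above the true value 1, a reminder that the estimator is biased in small samples.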
Assume that there is a single unknown parameter θ in the probability model for T. Assume further that the data values [latex]t_1, \, t_2, \, \ldots, \, t_n[/latex] are the experimental values of mutually independent and identically distributed random variables. The joint probability density function of the data values is the product of the marginal probability density functions of the individual observations:
[latex]L(t_1, \, t_2, \, \ldots, \, t_n, \, \theta) = \prod_{i=1}^{n} f(t_i, \, \theta).[/latex]
This function is the likelihood function. To simplify the notation, the likelihood function is often written simply as
[latex]L(\theta) = \prod_{i=1}^{n} f(t_i, \, \theta).[/latex]
The maximum likelihood estimator of θ, which is denoted by [latex]\hat \theta[/latex], is the value of θ that maximizes [latex]L(\theta)[/latex].
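Here is a minimal R sketch of maximizing [latex]L(\theta)[/latex]. It assumes, for illustration only, an exponential lifetime model with failure rate θ and simulated, uncensored data. The sketch works with the log likelihood, which has the same maximizer as [latex]L(\theta)[/latex] and avoids numerical underflow in the product. For this model the maximizer also has the closed form [latex]\hat \theta = n / \sum_{i=1}^{n} t_i[/latex], which provides a check on the numerical answer. The name loglik is a hypothetical helper.

```r
set.seed(2)
# Simulated, uncensored lifetimes from an assumed exponential model
lifetimes <- rexp(50, rate = 0.2)

# Hypothetical helper: log L(theta) = sum of log f(t_i, theta)
loglik <- function(theta) sum(dexp(lifetimes, rate = theta, log = TRUE))

# Numerical maximization over a bracketing interval
fit <- optimize(loglik, interval = c(1e-6, 10), maximum = TRUE)
theta_hat_numeric <- fit$maximum

# Closed-form maximum likelihood estimator for the exponential model
theta_hat_closed <- length(lifetimes) / sum(lifetimes)

c(numeric = theta_hat_numeric, closed_form = theta_hat_closed)
```

The two values agree to within the default tolerance of optimize. For models without a closed form, only the numerical route is available.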
The next example reviews the associated notions of the log likelihood function, score vector, maximum likelihood estimator, Fisher information matrix, and observed information matrix for a two-parameter lifetime model. We assume for now that there are no censored observations in the data set; all of the failure times are observed.
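The quantities named above can also be computed numerically. The R sketch below assumes a two-parameter Weibull model with simulated complete (uncensored) data. The book's own example may use a different model and data set, and the helpers loglik and score are hypothetical names. The score vector is approximated by central differences. The observed information matrix is the Hessian of the negative log likelihood at the maximum likelihood estimate, as returned by optim with hessian = TRUE. The Fisher information matrix is the expected value of this matrix and is not computed here.

```r
set.seed(3)
# Simulated complete data from an assumed Weibull model (shape 2, scale 10)
lifetimes <- rweibull(100, shape = 2, scale = 10)

# Hypothetical helper: log likelihood for par = c(shape, scale)
loglik <- function(par) {
  sum(dweibull(lifetimes, shape = par[1], scale = par[2], log = TRUE))
}

# Hypothetical helper: score vector by central differences (step h is arbitrary)
score <- function(par, h = 1e-5) {
  sapply(seq_along(par), function(j) {
    e <- replace(numeric(length(par)), j, h)
    (loglik(par + e) - loglik(par - e)) / (2 * h)
  })
}

# Maximize by minimizing the negative log likelihood; both parameters are positive
fit <- optim(c(1, 5), function(par) -loglik(par), method = "L-BFGS-B",
             lower = c(1e-4, 1e-4), hessian = TRUE)

mle <- fit$par                          # maximum likelihood estimates
obs_info <- fit$hessian                 # observed information matrix
std_err <- sqrt(diag(solve(obs_info)))  # asymptotic standard errors
score(mle)                              # should be close to c(0, 0)
```

Inverting the observed information matrix gives the usual estimate of the asymptotic variance–covariance matrix of the estimators. Its diagonal supplies the standard errors.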
In some cases, it is possible to find the exact distribution of a pivotal quantity which results in exact statistical inference (that is, constructing exact confidence intervals and performing exact hypothesis tests). It is more often the case that exact statistical inference is not possible, and asymptotic properties associated with the likelihood function must be relied on for approximate inference. The next section reviews some asymptotic properties that arise in likelihood theory. When a large data set of lifetimes is available, these properties often lead to approximate statistical methods of inference.
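A small R sketch can contrast exact and approximate inference. It assumes, for illustration, an exponential model with failure rate θ and simulated complete data. For this model the quantity [latex]2 \theta \sum_{i=1}^{n} T_i[/latex] is pivotal, with a chi-square distribution with 2n degrees of freedom, and this yields an exact confidence interval. The approximate interval relies instead on the asymptotic normality of the maximum likelihood estimator, with standard error estimated by [latex]\hat \theta / \sqrt{n}[/latex].

```r
set.seed(4)
n <- 30
# Simulated complete data from an assumed exponential model (rate 0.5)
lifetimes <- rexp(n, rate = 0.5)
alpha <- 0.05
total <- sum(lifetimes)
theta_hat <- n / total  # maximum likelihood estimate

# Exact interval: 2 * theta * sum(T) has a chi-square distribution with 2n df
exact_ci <- qchisq(c(alpha / 2, 1 - alpha / 2), df = 2 * n) / (2 * total)

# Approximate interval from asymptotic normality of the MLE
se_hat <- theta_hat / sqrt(n)
wald_ci <- theta_hat + c(-1, 1) * qnorm(1 - alpha / 2) * se_hat

rbind(exact = exact_ci, approximate = wald_ci)
```

With n = 30 the two intervals have nearly the same width. The exact interval, however, is shifted to the right, reflecting the right skew of the sampling distribution of [latex]\hat \theta[/latex]. The difference between the two intervals typically grows as n shrinks.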
The product–limit survivor function estimate for all t values is plotted in Figure 6.3. Downward steps occur at the [latex]k = 7[/latex] observed failure times. Some software packages place a vertical hash mark on the Kaplan–Meier estimate to highlight censored values that occur between observed failure times; these occur at times 9, 11, 17, 19, 20, 25, 32, and 34 in Figure 6.3. The effect of a censored observation on the survivor function estimate is a larger downward step at the next observed failure time. If there is a tie between an observed failure time and a censoring time (as there is at times 6 and 10 in this example), the standard convention is to include the censored value(s) in the risk set when computing the number of items at risk. This means that there will be a larger downward step in the survivor function estimate following the tied value. Since the last observed data value, 35*, corresponds to a right-censored observation, the survivor function estimate is truncated at time 35 and is assumed to be undefined for [latex]t > 35[/latex].

Figure 6.3. Product–limit survivor function estimate for the 6–MP treatment group.
Long Description for Figure 6.3
The horizontal axis t ranges from 0 to 35 in increments of 5 units. The vertical axis S of t ranges from 0.0 to 1.0 in increments of 0.2 units. The step function has 8 horizontal segments separated by 7 downward steps. It decreases from 1.0 to about 0.45 as t increases from 0 to 35. The first segment runs from t equals 0 to 6, and the last segment, from t equals 23 to 35, is the longest. The vertical hash marks are placed on the Kaplan–Meier estimate at the values of t equals 9, 11, 17, 19, 20, 25, 32, and 34.
The R code to generate this plot uses the survfit function from the survival package. The failure and censoring times [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] are held in the vector named time. The indicator variables [latex]\delta_1, \, \delta_2, \, \ldots, \, \delta_n[/latex] are held in the vector named status, with 1 for an observed failure and 0 for a right-censored value. The Surv function creates a survival object, which is used on the left-hand side of the formula argument passed to survfit. The right-hand side of the formula contains just 1, indicating that no covariates are considered when computing the product–limit estimator for the remission times in the treatment group. The summary function displays the calculations behind the product–limit estimate, and the plot function graphs the estimate, which is given in Figure 6.3.
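library(survival)  # provides Surv() and survfit()
# Failure and censoring times for the 6-MP treatment group
time = c(6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22,
         23, 25, 32, 32, 34, 35)
# Indicators: 1 = observed failure, 0 = right-censored
status = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1,
           1, 0, 0, 0, 0, 0)
# Product-limit estimate with no covariates (~ 1) and no confidence limits
kmest = survfit(Surv(time, status) ~ 1, conf.type = "none")
summary(kmest)  # risk set sizes, failure counts, and estimates at each failure time
plot(kmest)     # step-function plot of the estimate (Figure 6.3)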
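The quantities that summary(kmest) reports can be reproduced in base R, which makes the tie convention explicit. The sketch below repeats the time and status vectors from the listing. At each distinct observed failure time it counts the items at risk and the number of failures, then multiplies the conditional survival probabilities. The items at risk are all items whose failure or censoring time is at least that large, so a censored value tied with a failure stays in the risk set. The variable names tj, nj, and dj are illustrative choices.

```r
# 6-MP treatment group data, as in the survfit listing
time <- c(6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22,
          23, 25, 32, 32, 34, 35)
status <- c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1,
            1, 0, 0, 0, 0, 0)

# Distinct observed failure times
tj <- sort(unique(time[status == 1]))
# Number at risk: items with time >= tj (tied censored values included)
nj <- sapply(tj, function(s) sum(time >= s))
# Number of observed failures at each tj
dj <- sapply(tj, function(s) sum(time == s & status == 1))
# Product-limit estimate: running product of (1 - d_j / n_j)
surv <- cumprod(1 - dj / nj)

data.frame(time = tj, n.risk = nj, n.event = dj, survival = round(surv, 4))
```

The survivor estimates are 0.8571, 0.8067, 0.7529, 0.6902, 0.6275, 0.5378, and 0.4482 at times 6, 7, 10, 13, 16, 22, and 23. These should agree with the survival column of summary(kmest).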
There is a second and perhaps more intuitive way of deriving the product–limit estimator, often referred to as the “redistribute-to-the-right” algorithm. This technique begins by defining an initial probability mass function that apportions equal probability to each of the n data values. In subsequent passes through the data, this probability mass function estimate is modified as the probability is redistributed to the right, with special treatment given to right-censored observations. The algorithm is illustrated next on the 6–MP treatment group data set from Example 5.6.
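Here is a minimal base R sketch of the redistribute-to-the-right idea on the same data. It is an illustrative implementation, not necessarily the presentation used in the book. It assumes the usual convention that, at tied times, observed failures are placed before censored values. Each censored observation, taken from left to right, passes its current probability mass equally to all observations to its right. The final censored value, 35, has no observations to its right, so it keeps its mass, consistent with the estimate being undefined beyond time 35.

```r
# 6-MP treatment group data, as in the survfit listing
time <- c(6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22,
          23, 25, 32, 32, 34, 35)
status <- c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1,
            1, 0, 0, 0, 0, 0)

# Sort by time, placing failures before censored values at ties (assumed convention)
ord <- order(time, -status)
x <- time[ord]
d <- status[ord]
n <- length(x)

# Initial probability mass function: 1/n on each data value
p <- rep(1 / n, n)

# Left-to-right pass: each censored value (except the last value)
# redistributes its mass equally among all values to its right
for (i in seq_len(n - 1)) {
  if (d[i] == 0) {
    p[(i + 1):n] <- p[(i + 1):n] + p[i] / (n - i)
    p[i] <- 0
  }
}

# Mass at each distinct failure time, then the survivor function estimate
jumps <- tapply(p[d == 1], x[d == 1], sum)
surv_rr <- 1 - cumsum(jumps)
round(surv_rr, 4)
```

The printed values should match the product–limit estimates at times 6, 7, 10, 13, 16, 22, and 23, ending at about 0.4482. The mass left on the final censored value equals that same last estimate.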