Chapter 9 Topics in Time Series Analysis
This chapter presents several topics in time series analysis. These include several of the popular time series models that are special cases of the ARMA(p, q) model, along with the software required for fitting these models. The first section surveys the probability models and statistical methods associated with autoregressive models, more specifically the AR(1), AR(2), and AR(p) models. The second section surveys the probability models and statistical methods associated with moving average models, more specifically the MA(1), MA(2), and MA(q) models. It is important to know the properties of these special cases of the ARMA(p, q) model in order to successfully fit such a model to a realization of a time series. This will allow us to build an inventory of population autocorrelation and partial autocorrelation functions for these models that can be matched to their statistical counterparts when building a time series model. Time series analysts tend to use the smallest possible p and q values that adequately describe a time series. For this reason, separate subsections are devoted to the AR(1), AR(2), MA(1), and MA(2) time series models.
9.1 Autoregressive Models
Autoregressive models for a time series [latex]\left\{ X_t \right\}[/latex] will be considered in this section. An autoregressive model of order p is a special case of an ARMA(p, q) model with no moving average terms (that is, [latex]q = 0[/latex]), specified as
where [latex]\phi_1, \, \phi_2 , \, \ldots , \, \phi_p[/latex] are real-valued parameters and [latex]\left\{ Z_t \right\}[/latex] is a time series of white noise with population mean zero and population variance [latex]\sigma _ Z ^ {\, 2}[/latex]. The formulation of the AR(p) time series model looks quite similar to that of a multiple linear regression model with p independent variables. These independent variables are also known as predictors, regressors, or covariates in regression analysis. That is the genesis of the term autoregressive to describe this model. The prefix auto means self, indicating that this model has the current value of the time series [latex]\left\{ X_t \right\}[/latex] written as a linear function of the p previous versions of itself plus a white noise term Zt. The white noise term is critical to the model because without it, there would be no randomness in the model.
Rather than diving right into an AR(p) model, we first introduce the AR(1) and AR(2) models in separate sections because the mathematics are somewhat easier than the general case and some important geometry and intuition can be developed in these restricted models. In addition, an AR(1) or AR(2) model is often an adequate time series model in a particular setting. We always want a model with the fewest possible number of parameters that adequately approximates the underlying time series probability model. In the sections that follow, we will
- define the time series model for [latex]\left\{ X_t \right\}[/latex],
- determine the values of the parameters associated with a stationary model,
- derive the population autocorrelation and partial autocorrelation functions,
- develop algorithms for simulating observations from the time series,
- inspect simulated realizations to establish patterns,
- estimate the parameters from a time series realization [latex]\left\{ x_t \right\}[/latex],
- assess the adequacy of the time series model, and
- forecast future values of the time series using both point and interval estimates.
The purpose of deriving the population autocorrelation and partial autocorrelation functions is to build an inventory of shapes and patterns for these functions that can be used to identify tentative time series models from their sample counterparts by making a visual comparison between population and sample versions. This inventory of shapes and patterns plays an analogous role to knowing the shapes of various probability density functions (for example, the bell-shaped normal probability density function or the rectangular-shaped uniform distribution) in the analysis of univariate data in which the shape of the histogram is visually compared to the inventory of probability density function shapes.
In each section that follows, a single example of a time series will be carried through the various statistical procedures given in the list above. Stationarity plays a critical role in time series analysis because we are not able to forecast future values of the time series without knowing that the probability model is stable over time. This is why the visual assessment of a plot of the time series is always a critical first step in the analysis of a time series.
9.1.1 The AR(1) Model
The autoregressive model of order 1 is defined next. It has a closed-form expression for the population autocorrelation function and is frequently used in applications.
No subscript is necessary on the [latex]\phi[/latex] parameter because there is only one [latex]\phi[/latex] parameter in the AR(1) model. So there are two parameters that define an AR(1) model: the coefficient [latex]\phi[/latex] and the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex].
The current value in the time series, Xt, is given by the parameter [latex]\phi[/latex] multiplied by the previous observed value in the time series, [latex]\phi X_{t - 1}[/latex], plus the current white noise term Zt. This model has the form of a simple linear regression model forced through the origin in which Xt is being predicted by the previous value of the time series [latex]X_{t-1}[/latex]. The parameter [latex]\phi[/latex] plays the role of the slope of the regression line. Thinking about an AR(1) model as a simple linear regression model suggests a statistical graphic that can be helpful in determining whether it is an appropriate model for a particular time series. A plot of xt on the vertical axis against [latex]x_{t - 1}[/latex] on the horizontal axis should be approximately linear if the AR(1) model is appropriate. The slope of the regression line on this plot corresponds to [latex]\phi[/latex], and the magnitude of the variability of the points about the regression line is determined by the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex].
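As a minimal sketch, the following R fragment draws this lag-1 scatterplot; it assumes the observed time series is stored in a numeric vector named x (the vector name is an assumption for illustration).

```r
# Lag-1 scatterplot for visually assessing an AR(1) model; assumes the observed
# time series is stored in the numeric vector x
n <- length(x)
plot(x[1:(n - 1)], x[2:n], xlab = "previous value", ylab = "current value")
abline(lm(x[2:n] ~ x[1:(n - 1)]))    # fitted line whose slope approximates phi
```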
Some authors prefer to parameterize the AR(1) model as
where [latex]\phi_0[/latex] and [latex]\phi_1[/latex] are real-valued parameters. We avoid this parameterization because the [latex]\phi_0[/latex] parameter is redundant in the sense that the population mean of the time series is absorbed into the [latex]\phi_0[/latex] parameter. Also, some authors use a – rather than a + between the terms on the right-hand side of the model.
To illustrate the thinking behind the AR(1) model in a specific context, let Xt represent the closing price of a particular stock on day t. The AR(1) model indicates that today’s closing price, denoted by Xt, equals [latex]\phi[/latex] multiplied by yesterday’s closing price ([latex]\phi X_{t - 1}[/latex]), plus today’s random white noise term Zt.
Stationarity
One initial important question concerning the AR(1) model is whether or not the model is stationary. Consider a thought experiment that determines whether an AR(1) model is stationary for specific values of [latex]\phi[/latex]. For one particular instance, consider [latex]\phi = 0[/latex]. In this case the AR(1) time series model reduces to
which is a time series model consisting solely of white noise. We know from Example 7.15 that a time series model of white noise terms is stationary. Now consider another instance, [latex]\phi = 1[/latex]. In this case the AR(1) time series model reduces to
which indicates that each value in the time series is the previous value plus the current white noise. In this case the population variance of the process is increasing with time because the white noise terms accumulate over time (see Example 7.8), so the AR(1) model with [latex]\phi = 1[/latex] violates one of the stationarity conditions given in Definition 7.6. The AR(1) model with [latex]\phi = 1[/latex] can be recognized as a random walk model from Example 7.4, and it was determined to be nonstationary in Example 7.17. So we have established that the AR(1) time series model is stationary for [latex]\phi = 0[/latex] and nonstationary for [latex]\phi = 1[/latex]. We now try to determine general restrictions on [latex]\phi[/latex] associated with a stationary AR(1) time series model. We take four different approaches to establishing the values of the coefficient [latex]\phi[/latex] that lead to a stationary model. The four approaches provide a review of several concepts defined previously.
Approach 1: Causality. In the derivations concerning the AR(1) time series model that follow, it will be beneficial to write the time series value Xt as a linear combination of the current and previous white noise values. This will allow us to use the definition of causality in Definition 8.2 to determine the values of [latex]\phi[/latex] associated with a stationary AR(1) model. To begin, recall that the AR(1) model given by
can be shifted in time and is equally valid for other t values, for example,
Using successive substitutions into the AR(1) model results in
This can be recognized as an MA(∞) time series model. Representing an AR(1) model as an MA(∞) model is known as duality. This is the form required for causality in Definition 8.2, and we now use it to determine the constraints on the parameter [latex]\phi[/latex] that are required for stationarity. The coefficients [latex]\psi_1, \, \psi_2, \, \ldots[/latex] for the AR(1) model from Definition 8.2 are
or in general, [latex]\psi_j = \phi ^ j[/latex], for [latex]j = 1, \, 2, \, \ldots[/latex]. Definition 8.2 requires that
for the time series model to be written in causal form. This summation is a geometric series that converges when [latex]|\phi| < 1[/latex], or equivalently, when [latex]-1 < \phi < 1[/latex], so these are the values of [latex]\phi[/latex] for which the AR(1) model is causal, which also implies that the model is stationary. Expressing the AR(1) model as an MA(∞) model will also be helpful in the subsequent derivation of the population autocovariance and autocorrelation functions.
Approach 2: Backshift operator. Although the purely algebraic derivation of the causal form of the AR(1) time series model using standard algebra techniques from Approach 1 works fine for establishing stationarity, there is an alternative approach which is slightly more elegant that exploits the backshift operator B. The AR(1) model
can be rewritten as
which can be expressed using the backshift operator as
The first-order polynomial [latex]\phi(B) = 1 - \phi B[/latex] generalizes to a polynomial in B of order p for an AR(p) model. Dividing both sides of this equation by [latex]1 - \phi B[/latex] gives
For values of [latex]\phi[/latex] satisfying [latex]-1 < \phi < 1[/latex], this can be recognized as a geometric series in B:
Executing the B operator converts this to the form
which is the same form that we encountered using the successive substitutions in the causality approach.
Approach 3: Unit roots analysis. Theorem 8.3 indicates that all AR(1) models are invertible and they are stationary when the root of
lies outside of the unit circle in the complex plane. The solution to this equation is
This root falls on the real axis in the complex plane and lies outside of the unit circle when
which is consistent with Approaches 1 and 2.
Approach 4: Definition of stationarity. We can also return to first principles to establish the values of [latex]\phi[/latex] associated with a stationary AR(1) model. This approach also results in the derivation of the population autocorrelation function. Recall from Definition 7.6 that a time series model is stationary if (a) the expected value of Xt is constant for all t, and (b) the population covariance between Xs and Xt depends only on the lag [latex]|t - s|[/latex]. Using the causal formula for the AR(1) time series model expressed as an MA(∞) time series model from Approach 1, the expected value of Xt is
for all values of the parameters [latex]\phi[/latex] and [latex]\sigma _ Z ^ {\, 2}[/latex], and all values of t. Again using the causal formula for the AR(1) time series model expressed as an MA(∞) time series model,
for [latex]-1 < \phi < 1[/latex]. Since [latex]E \left[ X_t \right] = 0[/latex] for all values of t and the population autocovariance function depends only on the lag [latex]|t - s|[/latex], we conclude that the AR(1) process is stationary when [latex]-1 < \phi < 1[/latex]. So the population autocovariance function can be expressed in terms of the lag k as
Dividing the population autocovariance function by
gives the population autocorrelation function
Based on the four approaches, we now know beyond a shadow of a doubt that an AR(1) model is stationary for values of the parameter [latex]\phi[/latex] satisfying [latex]-1 < \phi < 1[/latex]. This derivation constitutes a proof of the following result, which will be stated for just the nonnegative lags. Many authors list the lags as [latex]k = \pm 1, \, \pm 2, \, \ldots \,[/latex], but we appeal to Theorem 7.1 to cover the negative lags and only report the nonnegative lags in all of the population autocorrelation functions given in this chapter.
The derivation of [latex]\rho(k) = \phi ^ k[/latex] for [latex]k = 0, \, 1, \, 2, \, \ldots[/latex] provides still further evidence of the restriction that [latex]-1 < \phi < 1[/latex]. If [latex]\phi[/latex] were equal to a value outside of this range, say [latex]\phi = 2[/latex], this would result in population correlation values outside of the range [latex]-1 \le \rho(k) \le 1[/latex].
For all admissible values of [latex]\phi[/latex] on the interval [latex]-1 < \phi < 1[/latex], we see from the formula [latex]\rho(k) = \phi ^ k[/latex] for [latex]k = 0, \, 1, \, 2, \, \ldots[/latex] that there will be a geometric decline in the magnitude of the values in the population autocorrelation function as the lag k increases. There are two distinct cases for [latex]\phi[/latex], however, which will result in population autocorrelation functions with distinctly different shapes. The first case is [latex]0 < \phi < 1[/latex], which gives positive population autocorrelation values at all lags. This is associated with a time series that lingers on one side of the mean. How long it lingers depends on the magnitude of [latex]\phi[/latex]. Larger values of [latex]\phi[/latex] indicate that nearby observations will tend to be more likely to be on the same side of the mean, and therefore the time series will tend to linger longer on one side of the mean. The second case is [latex]-1 < \phi < 0[/latex], which gives population autocorrelation function values which alternate in sign and is associated with a time series that is likely to jump from one side of the mean to the other for adjacent observations. These two cases are illustrated in Figure 9.1 for the first 8 lags of the population autocorrelation function for [latex]\phi = 0.8[/latex] and [latex]\phi = -0.8[/latex].
Long Description for Figure 9.1
In both graphs, the horizontal axis k ranges from 0 to 8 in increments of 1 unit. The vertical axis rho of k ranges from negative 1 to 1 in increments of 1 unit. In the first graph for phi equals 0.8, a horizontal line is drawn at 0. The values of rho of k decrease progressively from 1 to 0.2 for k values 0 to 8. The second graph for phi equals negative 0.8 exhibits a damped sinusoidal fashion. The values of rho for even k values are positive, decreasing from 1 to 0.2. The values of rho for odd k values are negative, increasing from negative 0.8 to negative 0.2. All data are estimated.
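The spikes in Figure 9.1 follow directly from [latex]\rho(k) = \phi ^ k[/latex]; a brief sketch of the corresponding R calculation is given below, with illustrative plotting details.

```r
# Population autocorrelation function rho(k) = phi ^ k of an AR(1) model for the
# two parameter values used in Figure 9.1
k <- 0:8
for (phi in c(0.8, -0.8)) {
  plot(k, phi ^ k, type = "h", ylim = c(-1, 1), xlab = "k", ylab = "rho(k)")
  abline(h = 0)
}
```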
Population Partial Autocorrelation Function
We now determine the population partial autocorrelation function for an AR(1) model. By Definition 7.4, the population lag 0 partial autocorrelation value is [latex]\rho ^ * (0) = 1[/latex]. The population lag 1 partial autocorrelation value is [latex]\rho ^ * (1) = \rho (1) = \phi[/latex]. The population lag 2 partial autocorrelation is
This is consistent with the result derived from first principles in Example 7.22. Notice that the second column of the matrix in the numerator is a multiple of the first column, which is why the determinant of the numerator is zero. The population lag 3 partial autocorrelation is
Again, the determinant in the numerator is zero because the third column is a multiple of the first column. This pattern continues for the lag k population partial autocorrelation function, which has a first column of the numerator matrix [latex]\left[ 1, \, \phi , \, \phi ^ 2 , \, \ldots , \, \phi ^ {k - 1} \right] ^ \prime[/latex] and last column [latex]\left[ \phi, \, \phi ^ 2 , \, \phi ^ 3 , \, \ldots , \, \phi ^ k \right] ^ \prime[/latex]. Since the last column of the numerator matrix is a multiple of the first column of the numerator matrix, the determinant of the numerator matrix is zero. This constitutes a proof of the following result.
Figure 9.2 shows the first 8 lags of the population partial autocorrelation function for [latex]\phi = 0.8[/latex] and [latex]\phi = -0.8[/latex]. These are the same parameter settings as in Figure 9.1. Unlike the population autocorrelation function which tails off in magnitude for increasing lags, the population partial autocorrelation cuts off after lag 1. When plotting the corresponding sample analogs, it is typically easier to visually assess a function cutting off rather than tailing off, particularly if there is significant random sampling variability in the observed time series.
Long Description for Figure 9.2
In both graphs, the horizontal axis k ranges from 0 to 8 in increments of 1 unit. The vertical axis rho star of k ranges from negative 1 to 1 in increments of 1 unit. In the first graph for phi equals 0.8, a horizontal line is drawn at 0. The values of the rho star of k for k values 0 and 1 are 1 and 0.8, respectively. The second graph for phi equals negative 0.8 has rho star of k values 1 and negative 0.8 for k values 0 and 1, respectively. All data are estimated.
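As a sketch, the spikes in Figure 9.2 can be reproduced with the built-in ARMAacf function; note that with pacf = TRUE the function returns values beginning at lag 1, so the lag 0 value of 1 is prepended manually.

```r
# Population partial autocorrelation function of an AR(1) model with phi = 0.8;
# ARMAacf(..., pacf = TRUE) returns lags 1, 2, ..., lag.max
rho.star <- c(1, ARMAacf(ar = 0.8, lag.max = 8, pacf = TRUE))
plot(0:8, rho.star, type = "h", ylim = c(-1, 1),
     xlab = "k", ylab = "partial autocorrelation")
abline(h = 0)
```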
The Shifted AR(1) Model
The population mean function for the AR(1) model is [latex]E \left[ X_t \right] = 0[/latex]. This model is not of much use in practice because most real-world time series are not centered around zero. Adding a third parameter μ overcomes this shortcoming. Since population variance and covariance are unaffected by a shift, the associated population autocorrelation and partial autocorrelation functions remain the same as those given in Theorems 9.1 and 9.2. Likewise, the condition for stationarity is unchanged.
Simulation
An AR(1) time series can be simulated by appealing to the defining formula for the AR(1) model. Iteratively applying the defining formula for an AR(1) model
results in the simulated values [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex]. The difficult aspect of this algorithm is how to generate the first value X1 because there is no X0 available. For simplicity, assume that the white noise terms are Gaussian white noise. Since the expected value of Xt is [latex]E \left[ X_t \right] = 0[/latex], the population variance of Xt is
and linear combinations of mutually independent normally distributed random variables are normally distributed, the first simulated observation
The algorithm given as pseudocode below generates an initial time series observation X1 as indicated above, and then uses an additional [latex]n - 1[/latex] Gaussian white noise terms [latex]Z_2, \, Z_3, \, \ldots, \, Z_n[/latex] to generate the remaining time series values [latex]X_2, \, X_3, \, \ldots, \, X_n[/latex] using the AR(1) defining formula from Definition 9.1. Indentation denotes nesting in the algorithm.
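A minimal R rendering of this pseudocode is sketched below; it assumes Gaussian white noise and uses illustrative parameter values, and it is not the implementation from the example that follows.

```r
# Sketch of the AR(1) simulation algorithm with a stationary start;
# phi, sigma.z, and n are illustrative values
phi     <- 0.8
sigma.z <- 1
n       <- 100
x       <- numeric(n)
x[1]    <- rnorm(1, 0, sigma.z / sqrt(1 - phi ^ 2))   # X_1 from its stationary distribution
for (t in 2:n) x[t] <- phi * x[t - 1] + rnorm(1, 0, sigma.z)
plot.ts(x)
```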
The three-parameter shifted AR(1) time series model which includes a population mean parameter μ can be simulated by simply adding μ to each time series observation generated by this algorithm. The next example implements this algorithm in R.
Long Description for Figure 9.3
In the time series plot, the horizontal axis t ranges from 1 to 100. The vertical axis x subscript t ranges from negative 10 to 10 in increments of 5 units. The graph behaves in a spike pattern. A horizontal line is drawn at x subscript t equals 0. The graph increases from negative 5 to 2 as t increases from 0 to 10. It then progressively decreases to negative 6 at t equals 13, then increases to 0 at t equals 20, and again decreases to negative 8 at t equals 26. It then increases to 8 and decreases to a minimum of negative 11 at t equals 55, again increases to 9 at t equals 85, and decreases to negative 7 at t equals 93. In the correlogram, the horizontal axis k ranges from 0 to 15 in increments of 5 units. The vertical axis r subscript k ranges from negative 1.0 to 1.0 in increments of 0.5 units. A horizontal line is drawn at 0.0. The dashed horizontal lines are drawn at negative 0.2 and 0.2. The values of r subscript k decrease from 1.0 to 0.1 for k values 0 through 10. The values then decrease from negative 0.01 to negative 0.2 as k increases from 11 to 15. For the partial autocorrelation function, the horizontal axis k ranges from 0 to 15 in increments of 5 units. The vertical axis r star subscript k ranges from negative 1.0 to 1.0 in increments of 0.5 units. A horizontal line is drawn at 0.0. The values of r star subscript k for k values 0, 1, 3, and 8 are 1.0, 0.7, 0.01, and 0.15, respectively. For the remaining k values, the r star subscript k values are negative, ranging between negative 0.2 and 0.0. All data are estimated.
We recommend running the simulation code from the previous example several dozen times in a loop and viewing the associated plots of xt, rk, and [latex]r_k^*[/latex] in search of patterns. A call to the R function Sys.sleep between the displays of the trio of plots can be used to include an artificial time delay to allow you to inspect the plots. This will allow you to see how various realizations of a simulated AR(1) time series model vary from one realization to the next. So when you then view a single realization of a real-life time series, you will have a better sense of how far these plots might deviate from their expected patterns.
There is a second way to simulate observations from an AR(1) time series. This second technique starts the time series at an initial arbitrary value, and then allows the time series to “warm up” or “burn in” for several time periods before producing the first observation X1. A reasonable initial arbitrary value for the standard AR(1) model is 0; a reasonable initial arbitrary value for the shifted AR(1) model is μ. This is the approach taken by the built-in R function named arima.sim (for autoregressive moving average simulation), which simulates a realization of a time series. Using the arima.sim function saves a few keystrokes over the approach taken in the previous example, as illustrated next.
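A brief sketch of this approach is given below; the parameter values and series length are illustrative.

```r
# Simulating an AR(1) realization with arima.sim, which uses a burn-in period
# to initialize the series before the first reported observation
set.seed(1)
x <- arima.sim(model = list(ar = 0.8), n = 100, sd = 1)
plot.ts(x)
acf(x)       # sample autocorrelation function
pacf(x)      # sample partial autocorrelation function
```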
The remaining topics associated with the AR(1) time series model are statistical in nature: parameter estimation, model assessment, model selection, and forecasting. A sample time series that will be revisited throughout these topics will be introduced next.
Parameter Estimation
There are two parameters, [latex]\phi[/latex] and [latex]\sigma _ Z ^ {\, 2}[/latex], to estimate in the standard AR(1) model
There are three parameters, μ, [latex]\phi[/latex], and [latex]\sigma _ Z ^ {\, 2}[/latex], to estimate in the shifted AR(1) model
The three parameter estimation techniques outlined in Section 8.2.1 are applied to the shifted AR(1) time series model next.
Approach 1: Method of moments. In the case of the shifted AR(1) model, we match the population and sample (a) first-order moments, (b) second-order moments, and (c) lag 1 autocorrelation. Placing the population moments on the left-hand side of the equation and the associated sample moments on the right-hand side of the equation results in three equations in three unknowns:
or
These equations can be solved in closed form for the three unknown parameters μ, [latex]\phi[/latex], and [latex]\sigma _ Z ^ {\, 2}[/latex] yielding the method of moments estimators
This constitutes a proof of the following result.
These point estimators are random variables and have been written as a function of the random time series values [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex]. For observed time series values [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex], the lowercase versions of the formulas will be used.
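As a sketch, the method of moments estimates can be computed directly from an observed realization; the code below assumes the time series is stored in a numeric vector x.

```r
# Method of moments estimates for the shifted AR(1) model; x holds the realization
n       <- length(x)
mu.hat  <- mean(x)
c0      <- mean((x - mu.hat) ^ 2)                                      # sample variance (divisor n)
r1      <- sum((x[1:(n - 1)] - mu.hat) * (x[2:n] - mu.hat)) / (n * c0) # lag 1 sample autocorrelation
phi.hat <- r1                                                          # because rho(1) = phi
s2.hat  <- c0 * (1 - r1 ^ 2)                                           # estimate of sigma_Z^2
c(mu.hat, phi.hat, s2.hat)
```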
Approach 2: Least squares. Consider the shifted stationary AR(1) model
For least squares estimation, we first establish the sum of squares S as a function of the parameters μ and [latex]\phi[/latex] and use calculus to find the least squares estimators of μ and [latex]\phi[/latex]. This will result in a slight departure from the usual pattern of using the sample mean [latex]\bar x[/latex] to estimate the population mean μ. Once these least squares estimators have been determined, the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex] will be estimated.
The sum of squared errors is
The partial derivatives of S with respect to μ and [latex]\phi[/latex] are
and
Equating the first of the partial derivatives to zero yields
or
or
or
where
Equating the second of the partial derivatives to zero yields
or
So the ordinary least squares estimators for μ and [latex]\phi[/latex] can be determined by numerically solving the simultaneous equations
for [latex]\hat \mu[/latex] and [latex]\hat \phi[/latex].
The last parameter to estimate is [latex]\sigma _ Z ^ {\, 2}[/latex]. Since
for an AR(1) time series model, the population variance of the white noise can be expressed as
Replacing [latex]\phi[/latex] by the estimator r1 because [latex]\rho(1) = \phi[/latex], and replacing [latex]\gamma(0) = V \left[ X_t \right][/latex] by the estimator [latex]c_0 = \frac{1}{n} \sum_{t\,=\,1}^n \left( X_t - \bar X \right) ^ 2[/latex] gives the point estimator
which matches the method of moments estimator from Theorem 9.4. This derivation constitutes a proof of the following result.
We now apply these techniques to the beaver temperature data set from Example 9.3.
Since
contain the [latex]n - 2[/latex] common values [latex]X_2, \, X_3, \, \ldots, \, X_{n - 1}[/latex], one approximation that can be applied to the least squares estimates is to assume that [latex]\bar X _ 1 \cong \bar X _ 2 \cong \bar X[/latex] for large values of n, which allows for closed-form approximate least squares estimators:
As a second approximation, the denominator of [latex]\hat \phi[/latex] with the first approximation in place,
is approximately equal to
for large values of n. With this additional assumption, the least squares estimate for [latex]\phi[/latex] reduces to the approximate least squares estimate
which is the method of moments estimator of [latex]\phi[/latex] because [latex]\rho(1) = \phi[/latex] for an AR(1) model. With both approximations in place, the least squares estimators exactly match the method of moments estimators. This is why the estimates from the two techniques are so close.
Approach 3: Maximum likelihood estimation. The likelihood function, which is the joint probability density function of the observed values in the time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] in a shifted AR(1) model, is
where the [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] arguments on L and the μ, [latex]\phi[/latex], and [latex]\sigma _Z ^ {\, 2}[/latex] arguments on f have been dropped for brevity. It is not possible to simply multiply the marginal probability density functions because the values in the AR(1) time series model are correlated. In order to use maximum likelihood estimation, we make the additional assumption that the white noise terms [latex]Z_1, \, Z_2, \, \ldots, \, Z_n[/latex] are in fact Gaussian white noise terms:
for [latex]t = 1, \, 2, \, \ldots, \, n[/latex], which is the probability density function of a [latex]N\left( 0, \, \sigma _Z ^ {\, 2} \right)[/latex] random variable. Ignoring Z1 temporarily, the joint probability density function of the mutually independent white noise random variables [latex]Z_2, \, Z_3, \, \ldots, \, Z_n[/latex] is
for [latex]\left( z_2, \, z_3, \, \ldots , \, z_n \right) \in {\cal R} ^ {n - 1}[/latex]. The shifted AR(1) model
applies for all values of t, so
Solving these equations for [latex]X_2, \, X_3, \, \ldots, \, X_n[/latex], consider the transformation of the [latex]Z_2, \, Z_3, \, \ldots , \, Z_n[/latex] values
conditioned on [latex]X_1 = x_1[/latex], which is a one-to-one transformation from [latex]{\cal R}^{n - 1}[/latex] to [latex]{\cal R}^{n - 1}[/latex] with inverse transformation
and Jacobian
By the transformation technique, the joint probability density function of [latex]X_2, \, X_3, \, \ldots, \, X_n[/latex] conditioned on [latex]X_1 = x_1[/latex] is
for [latex]\left( x_2, \, x_3, \, \ldots , \, x_n \right) \in {\cal R} ^ {n - 1}[/latex] and [latex]x_1 \in {\cal R}[/latex]. The final step in the derivation of the likelihood function involves determining the marginal distribution of X1. Since
the probability density function of X1 is
The joint probability density function of [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex] is the product of the conditional probability density function and the marginal probability density function:
for [latex]\left( x_1, \, x_2, \, \ldots, \, x_n \right) \in {\cal R} ^ n[/latex]. So the likelihood function is
where the unconditional sum of squares is
The associated log likelihood function is
The maximum likelihood estimators [latex]\hat \mu[/latex], [latex]\hat \phi[/latex], and [latex]\hat \sigma _ Z ^ {\, 2}[/latex] satisfy
Although the third equation satisfies
numerical methods are required to solve the equations.
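One way to carry out this numerical work is to minimize the negative log likelihood directly with the R function optim; the sketch below assumes the data are stored in x and uses the unconditional Gaussian log likelihood described above. It is a sketch under those assumptions, not the authors' implementation.

```r
# Numerical maximum likelihood estimation for the shifted AR(1) model via optim
negloglik <- function(theta) {
  mu <- theta[1]; phi <- theta[2]; s2 <- theta[3]
  if (abs(phi) >= 1 || s2 <= 0) return(Inf)        # stay inside the parameter space
  n <- length(x)
  S <- (1 - phi ^ 2) * (x[1] - mu) ^ 2 +
       sum(((x[2:n] - mu) - phi * (x[1:(n - 1)] - mu)) ^ 2)
  -(-(n / 2) * log(2 * pi * s2) + 0.5 * log(1 - phi ^ 2) - S / (2 * s2))
}
# the method of moments estimates make reasonable starting values
optim(c(mean(x), 0.5, var(x)), negloglik)$par
```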
Maximum likelihood estimation will be illustrated in the next example.
Table 9.2 summarizes the point estimates that were calculated in the previous three examples for the [latex]n = 62[/latex] beaver temperatures. The point estimates associated with the three methods are quite close for this particular time series.
| Method | [latex]\hat \mu[/latex] | [latex]\hat \phi[/latex] | [latex]\hat \sigma _Z ^ {\, 2}[/latex] |
|---|---|---|---|
| Method of moments | 37.90 | 0.7894 | 0.01734 |
| Ordinary least squares | 37.91 | 0.7972 | 0.01762 |
| Maximum likelihood estimation | 37.91 | 0.7850 | 0.01697 |
The R function ar fits autoregressive models. The parameter estimates from the three previous examples could have been calculated with the following four R statements.
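One plausible form of those statements is sketched below; the exact code in the text may differ, and it is assumed that the n = 62 active-period temperatures can be extracted from the built-in beaver2 data frame into the vector x.

```r
# Plausible form of the fitting statements; x holds the active beaver temperatures
x <- beaver2$temp[beaver2$activ == 1]                       # assumed extraction (n = 62)
ar(x, order.max = 1, aic = FALSE, method = "yule-walker")   # method of moments
ar(x, order.max = 1, aic = FALSE, method = "ols")           # ordinary least squares
ar(x, order.max = 1, aic = FALSE, method = "mle")           # maximum likelihood estimation
```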
Table 9.3 contains the point estimates returned by the ar function. The tiny differences between some of the entries in Tables 9.2 and 9.3 might be due to slightly different approximations and/or roundoff in the optimization routines.
| Method | [latex]\hat \mu[/latex] | [latex]\hat \phi[/latex] | [latex]\hat \sigma _Z ^ {\, 2}[/latex] |
|---|---|---|---|
| Method of moments (Yule–Walker) | 37.90 | 0.7894 | 0.01792 |
| Ordinary least squares | 37.90 | 0.7972 | 0.01724 |
| Maximum likelihood estimation | 37.90 | 0.7865 | 0.01699 |
We have now derived and illustrated the three point estimation techniques, the method of moments, least squares, and maximum likelihood estimation, for the parameters in an AR(1) model from a realization of a time series consisting of n observations. Which of these techniques provides the best point estimators? This is not an easy question to answer because there are a large number of factors involved, such as the sample size n, the values of the parameters in the model, and the fact that there are three parameters to estimate. There will not necessarily be one universal answer to the question. We focus our evaluation on the point estimator for [latex]\phi[/latex] because it typically differs for the three methods of point estimation. The mean square error associated with the point estimator for [latex]\phi[/latex] is
The following R code conducts a Monte Carlo simulation experiment which estimates the mean square error of the three point estimators for [latex]\phi[/latex] for 40,000 replications. We selected the time series model with [latex]\mu = 38[/latex], [latex]\phi = 0.8[/latex], [latex]\sigma _ Z = 0.13[/latex], and [latex]n = 62[/latex], which are parameters that are near the estimated parameters in the last three examples involving the time series of beaver temperatures.
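A minimal sketch of such an experiment is given below; the authors' code may differ in its details, but the parameter settings follow the text.

```r
# Monte Carlo estimate of the mean square error of the three estimators of phi
set.seed(4)
mu <- 38; phi <- 0.8; sigma.z <- 0.13; n <- 62; nrep <- 40000
est <- matrix(0, nrep, 3)
for (i in 1:nrep) {
  x <- mu + arima.sim(model = list(ar = phi), n = n, sd = sigma.z)
  est[i, 1] <- ar(x, order.max = 1, aic = FALSE, method = "yule-walker")$ar
  est[i, 2] <- ar(x, order.max = 1, aic = FALSE, method = "ols")$ar
  est[i, 3] <- ar(x, order.max = 1, aic = FALSE, method = "mle")$ar
}
colMeans((est - phi) ^ 2)    # estimated mean square errors
```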
After a call to set.seed(4) to establish the random number stream, three runs of this simulation yielded the following estimated mean squared error values:
Furthermore, confidence intervals for the three methods do not overlap. Since small values of the mean square error are preferred, we conclude that the maximum likelihood estimator is the preferred estimator for these parameter settings, followed by the least squares estimator, followed by the method of moments estimator in a distant third place.
The focus on estimation thus far has been on point estimation techniques. We also want to report some indication of the precision associated with these point estimators. In the previous example, the sampling distributions of [latex]\hat \mu[/latex], [latex]\hat \phi[/latex], and [latex]\hat \sigma _ Z ^ {\, 2}[/latex] in the AR(1) model are too complicated to derive analytically. As an illustration of a confidence interval for one of the parameters, we use the asymptotic normality of the maximum likelihood estimator of [latex]\phi[/latex] in the result:
This result leads to an asymptotically exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence interval for [latex]\phi[/latex].
This asymptotically exact confidence interval will now be illustrated with the time series of active beaver temperatures from the three previous examples.
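As a sketch, the interval can be computed from the fitted model as follows; this assumes the standard large-sample variance [latex](1 - \hat\phi^{\,2}) / n[/latex] for [latex]\hat \phi[/latex] and that the time series is stored in x.

```r
# Asymptotically exact 95% confidence interval for phi based on the maximum
# likelihood estimator; x holds the time series realization
fit     <- ar(x, order.max = 1, aic = FALSE, method = "mle")
phi.hat <- fit$ar
n       <- length(x)
phi.hat + c(-1, 1) * qnorm(0.975) * sqrt((1 - phi.hat ^ 2) / n)
```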
Model Assessment
Now that techniques for point and interval estimates for the parameters in the AR(1) model have been established, we are interested in assessing the adequacy of the AR(1) time series model. This will involve an analysis of the residuals. Recall from Section 8.2.3 that the residuals are defined by
or
Since [latex]\hat{X} _ {t}[/latex] is the one-step-ahead forecast from the time origin [latex]t - 1[/latex], this is more clearly written as
Therefore, for the time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] and the fitted AR(1) model with parameter estimates [latex]\hat \mu[/latex] and [latex]\hat \phi[/latex], the residual at time t is
for [latex]t = 2, \, 3, \, \ldots, \, n[/latex] via Example 8.12. The next example shows the steps associated with assessing the adequacy of the AR(1) model for the active beaver temperature time series.
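A minimal sketch of the residual calculation and the usual graphical diagnostics is given below; it assumes the realization x and the parameter estimates mu.hat, phi.hat, and s2.hat computed earlier are available.

```r
# AR(1) residuals for t = 2, 3, ..., n and standard graphical diagnostics
n   <- length(x)
res <- (x[2:n] - mu.hat) - phi.hat * (x[1:(n - 1)] - mu.hat)
std <- res / sqrt(s2.hat)        # standardized residuals
plot.ts(std)                     # should resemble white noise
acf(std)                         # no significant autocorrelation expected
hist(std)                        # roughly bell shaped for Gaussian white noise
qqnorm(std); qqline(std)         # roughly linear Q-Q plot
```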
Long Description for Figure 9.10
In a histogram, the horizontal axis ranges from negative 3 to 3 in increments of 1 unit. The vertical axis ranges from 0 to 25 in increments of 5 units. From negative 3 to 3, there are six bars that are distributed normally. The frequency of the bars from negative 3 to 3 is 3, 6, 21, 22, 8, and 1, respectively. In the Q Q plot, the horizontal and the vertical axes range from negative 3 to 3 in increments of 1 unit. Sixty one data points are plotted in a roughly linear trend. A cluster of 55 data points is formed between negative 2 and 2 on the horizontal axis and between negative 1.5 to 1.5 on the vertical axis. All data are estimated.
We have seen a number of indicators that the AR(1) time series model is an adequate model for the active beaver temperatures. But how do we know that there is not a better model with more terms lurking below the surface that might provide a better fit? The next subsection considers the process of model selection.
Model Selection
One way of eliminating the possibility of a better time series model is to overfit the tentative AR(1) time series model with ARMA(p, q) models of higher order. We have not yet surveyed the techniques for estimating the parameters in these models with additional terms, so for now we will let the arima function in R estimate their parameters and compare them via their AIC (Akaike’s Information Criterion) statistics. The AIC statistic was introduced in Section 8.2.4.
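A brief sketch of this overfitting check with the arima function is shown below; the particular higher-order models chosen here are illustrative.

```r
# Overfitting the tentative AR(1) model and comparing AIC values (smaller is better)
arima(x, order = c(1, 0, 0))$aic    # AR(1)
arima(x, order = c(2, 0, 0))$aic    # AR(2)
arima(x, order = c(1, 0, 1))$aic    # ARMA(1, 1)
```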
In some applications, just describing the time series model for the beaver temperatures in the active state with the fitted AR(1) model is adequate. In other applications, simulating the values in the fitted AR(1) model is the goal. But in many application areas, particularly economics, there is often an interest in forecasting future values of a time series from a realization. In our setting, we might be interested in this particular beaver’s future temperature based on the [latex]n = 62[/latex] temperature values collected. The next subsection considers forecasting for the AR(1) model.
Forecasting
We now pivot to the development of a procedure to forecast future values of a time series that is governed by an AR(1) model. To review the notation for forecasting, the observed time series values are [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex]. The forecast is being made at time [latex]t = n[/latex]. The random future value of the time series that is h time units in the future is denoted by [latex]X_{n + h}[/latex]. The associated forecasted value is denoted by [latex]\hat{X}_{n + h}[/latex], and is the conditional expected value
We would like to calculate this forecasted value and an associated prediction interval for the AR(1) model. As in Section 8.2.2, we assume that all parameters are known in the derivations that follow.
Recall from Example 8.12 that the forecasted value for one time unit into the future for a shifted AR(1) model is
We would like to generalize this so as to find the forecasted value h time units into the future. In other words, we want to find [latex]\hat{X} _ {n + h}[/latex]. The shifted AR(1) model is
Replacing t by [latex]n + h[/latex], which is the time value of interest, gives
Taking the conditional expected value of each side of this equation results in
Iterating on this equation for time values that are sequentially one time unit closer to the present time [latex]t = n[/latex] yields
Notice that the forecasted value is a function of xn, but not a function of [latex]x_1, \, x_2, \, \ldots, \, x_{n - 1}[/latex]. This is a sensible forecast in the sense that for a long time horizon h into the future and a stationary shifted AR(1) model with [latex]-1 < \phi < 1[/latex],
If you were asked to forecast your temperature one year from now, you would probably say 98.6° Fahrenheit (or whatever your average temperature might be), regardless of whether you are healthy or have a fever right now. Long-term forecasts for stationary time series models always tend to the population mean.
As is typically the case in statistics, we would like to pair our point estimator [latex]\hat{X}_{n + h}[/latex] with an interval estimator, which is a prediction interval in this setting. The prediction interval gives us an indication of the precision of the forecast. In order to derive an exact two-sided [latex]{100(1 - \alpha)}\%[/latex] prediction interval for [latex]X _ {n + h}[/latex], it is helpful to write the shifted AR(1) model as a shifted MA(∞) model. Using successive substitutions, each one time unit prior to the previous substitution,
For [latex]-1 < \phi < 1[/latex] corresponding to a stationary shifted AR(1) model, the limiting expression for Xt is
which is a shifted MA(∞) model. Replacing t with [latex]n + h[/latex] results in
Taking the conditional variance of both sides of this equation yields
because the error terms at time n and prior are observed and can therefore be treated as constants. Assuming Gaussian white noise terms, an exact two-sided [latex]{100(1 - \alpha)}\%[/latex] prediction interval for [latex]X _ {n + h}[/latex] is
In most practical problems, the parameters in this prediction interval will be estimated from data, which results in the following approximate two-sided [latex]{100(1 - \alpha)}\%[/latex] prediction interval.
Long Description for Figure 9.11
In the time series plot, the horizontal axis t ranges from 1 to 62. The vertical axis x subscript t ranges from 37.4 to 38.4 in increments of 0.2 degrees Celsius. A horizontal line is drawn at 37.9 degrees Celsius. The graph behaves in a spike manner with solid dots for the first 62 t values. The first 19 values lie above the horizontal line, ranging between 37.9 and 38.2 degrees Celsius. The next eight values decrease from 37.8 to 37.6 degrees Celsius and increase to a peak of 38.4 degrees Celsius at t equals 30. It then decreases below the horizontal line, with the next 10 values ranging between 37.6 and 37.8 degrees Celsius, and then increases to 38.1. It then decreases to a minimum temperature of 37.4 degrees at t equals 52, and increases to 38.05 at t equals 62. From t equals 63 to 74, there are open circles following a decreasing trend, approaching 37.9 degrees Celsius. The prediction interval for t-values 63 to 74 ranges between 37.6 and 38.3, and the region is shaded. All data are estimated.
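As a sketch, point forecasts and approximate 95% prediction intervals like those displayed in Figure 9.11 can be computed directly from the fitted parameters; the code assumes x, mu.hat, phi.hat, and s2.hat are available and forecasts 12 time units ahead.

```r
# AR(1) point forecasts and approximate 95% prediction intervals h steps ahead
h        <- 1:12
x.n      <- x[length(x)]
forecast <- mu.hat + phi.hat ^ h * (x.n - mu.hat)
halfwid  <- qnorm(0.975) * sqrt(s2.hat * (1 - phi.hat ^ (2 * h)) / (1 - phi.hat ^ 2))
cbind(h, forecast, lower = forecast - halfwid, upper = forecast + halfwid)
# predict(arima(x, order = c(1, 0, 0)), n.ahead = 12) gives comparable results
```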
This subsection has introduced the AR(1) time series model. The key results for an AR(1) model are listed below.
- The standard AR(1) model can be written algebraically and with the backshift operator B as
where [latex]Z_t \sim WN \left( 0, \, \sigma _Z ^ {\, 2} \right)[/latex] and [latex]\sigma _ Z ^ {\, 2} > 0[/latex].
- The shifted AR(1) model can be written algebraically and with the backshift operator B as
- The AR(1) model is always invertible; the AR(1) model is stationary for [latex]-1 < \phi < 1[/latex].
- The stationary shifted AR(1) model can be written as an MA(∞) model for [latex]-1 < \phi < 1[/latex] as
- The AR(1) population autocorrelation function is [latex]\rho(k) = \phi ^ k[/latex] for [latex]-1 < \phi < 1[/latex] and [latex]k = 1, \, 2, \, \ldots[/latex].
- The AR(1) population partial autocorrelation function at lag one is [latex]\rho ^ * (1) = \phi[/latex] for [latex]-1 < \phi < 1[/latex] and [latex]\rho ^ * (k) = 0[/latex] for [latex]k = 2, \, 3, \, \ldots[/latex].
- The three parameters in the shifted AR(1) model, μ, [latex]\phi[/latex], and [latex]\sigma _ Z ^ {\, 2}[/latex], can be estimated from a realization of a time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] by the method of moments, least squares, and maximum likelihood. The point estimators for μ, [latex]\phi[/latex], and [latex]\sigma _ Z ^ {\, 2}[/latex] are denoted by [latex]\hat \mu[/latex], [latex]\hat \phi[/latex], and [latex]\hat{\sigma} _ Z ^ {\, 2}[/latex], and are typically paired with asymptotically exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence intervals.
- The forecast value [latex]\hat{X} _ {n + h} = \hat \mu + \hat \phi ^ h \left( x_n - \hat{\mu} \right)[/latex] for an AR(1) model approaches [latex]\hat \mu = \bar x[/latex] as [latex]h \rightarrow \infty[/latex]. The associated prediction intervals have widths that increase as h increases and approach a limit as the time horizon [latex]h \rightarrow \infty[/latex].
If the time series of interest is the daily high temperatures in July in Tuscaloosa, then an AR(1) model would be appropriate if tomorrow’s daily high temperature (Xt) can be modeled as a linear function of
- today’s high temperature ([latex]X_{t-1}[/latex]), and
- a random shock (Zt).
But what if weather had more of a memory than just one day? What if tomorrow’s daily high temperature (Xt) is better modeled as a linear function of
- today’s high temperature ([latex]X_{t-1}[/latex]),
- yesterday’s high temperature ([latex]X_{t-2}[/latex]), and
- a random shock (Zt).
This is an example of the thinking that lies behind the AR(2) model, which is introduced in the next section.
9.1.2 The AR(2) Model
The second-order autoregressive model, denoted by AR(2), can be used for modeling a stationary time series in instances in which the current value of the time series is a linear combination of the two previous values plus a random shock. The mathematics associated with the AR(2) model is somewhat more difficult than that associated with the AR(1) model.
There are three parameters that define an AR(2) model: the real-valued coefficients [latex]\phi_1[/latex] and [latex]\phi_2[/latex], and the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex]. The AR(2) model can be written more compactly in terms of the backshift operator B as
where [latex]\phi(B)[/latex] is the second-order polynomial
The AR(2) model has the form of a multiple linear regression model with two independent variables and no intercept term. The current value Xt is modeled as a linear combination of the two previous values of the time series, [latex]X_{t-1}[/latex] and [latex]X_{t - 2}[/latex], plus a white noise term. The parameters [latex]\phi_1[/latex] and [latex]\phi_2[/latex] control the inclination of the regression plane in three-dimensional space. The parameter [latex]\sigma _ Z ^ {\, 2}[/latex] reflects the magnitude of the dispersion of the time series values from the regression plane.
To illustrate the thinking behind the AR(2) model in a specific context, let Xt represent the annual return of a particular stock market index in year t. The AR(2) model indicates that the annual return in year t equals [latex]\phi_1[/latex] multiplied by the previous year’s annual return ([latex]\phi_1 X_{t - 1}[/latex]), plus [latex]\phi_2[/latex] multiplied by the annual return two years prior ([latex]\phi_2 X_{t - 2}[/latex]), plus the year t random white noise term Zt.
Stationarity
Theorem 8.3 indicates that all AR(2) models are invertible, but they are stationary only when the roots of
lie outside of the unit circle in the complex plane. Let B1 and B2 denote these two roots. Using the quadratic formula, the two roots are
Since [latex]\phi(B_1) = \phi(B_2) = 0[/latex], the quadratic function [latex]\phi(B)[/latex] can also be written in factored form as
Equating the two versions of [latex]\phi(B)[/latex] above and matching coefficients results in
These two equations define the mapping from the complex plane, which contains the roots B1 and B2, to the plane that contains the AR(2) parameters [latex]\phi_1[/latex] and [latex]\phi_2[/latex]. To find the stationary region, we must find the mapping of the part of the complex plane outside of the unit circle to the [latex](\phi_1, \, \phi_2)[/latex] plane. The mapping yields a triangular-shaped stationary region, as specified in the following result.
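As a quick numerical check (a sketch using illustrative parameter values), stationarity of a particular AR(2) model can be verified directly from the roots of [latex]\phi(B)[/latex]:

```r
# Both roots of phi(B) = 1 - phi1 * B - phi2 * B^2 must lie outside the unit circle
phi1 <- 1.5; phi2 <- -0.7
roots <- polyroot(c(1, -phi1, -phi2))
Mod(roots)                 # moduli of the two roots
all(Mod(roots) > 1)        # TRUE indicates a stationary AR(2) model
```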
Population Autocorrelation Function
Now that the stationary region for an AR(2) time series model has been established, we turn to the derivation of the population autocorrelation function. Assuming that the parameters [latex]\phi_1[/latex] and [latex]\phi_2[/latex] fall in the stationary region, the AR(2) model
can be multiplied by [latex]X_{t - k}[/latex] to give
Taking the expected value of both sides of this equation results in the recursive equation
for [latex]k = 1, \, 2, \, \ldots[/latex] because Zt has expected value zero and is independent of [latex]X_{t - k}[/latex]. Dividing both sides of this equation by [latex]\gamma(0) = V \left[ X_t \right][/latex] gives the recursive equation
for [latex]k = 1, \, 2, \, \ldots[/latex]. These linear equations, whether written in terms of [latex]\gamma(k)[/latex] or [latex]\rho(k)[/latex], are known in time series analysis as the Yule–Walker equations after British statisticians George Udny Yule and Sir Gilbert Walker. Once the first two values of [latex]\gamma(k)[/latex] or [latex]\rho(k)[/latex] are known, these recursive equations can be used to calculate subsequent values. The next two paragraphs focus on determining the first two values of [latex]\gamma(k)[/latex] and [latex]\rho(k)[/latex], respectively.
For a stationary AR(2) time series model, we derive expressions for [latex]\gamma(0)[/latex] and [latex]\gamma(1)[/latex]. The AR(2) model is
Squaring both sides of this equation and taking the expected value of both sides gives
Using the symmetry of the population autocovariance function, the Yule–Walker equation with [latex]k = 1[/latex] is
Replacing this expression for [latex]\gamma(1)[/latex] in the previous equation gives
Moving all terms involving [latex]\gamma(0)[/latex] to the left-hand side of this equation gives
Solving this equation for [latex]\gamma(0)[/latex],
An expression for [latex]\gamma(1)[/latex] is
These two values can be used as arguments in the Yule–Walker equations to obtain subsequent values for [latex]\gamma(k)[/latex].
We now turn to the problem of finding [latex]\rho(1)[/latex] and [latex]\rho(2)[/latex]. The first two Yule–Walker equations in terms of [latex]\rho(k)[/latex] are
Since [latex]\rho(0) = 1[/latex] and [latex]\rho(-k) = \rho(k)[/latex] via Theorem 7.1, these equations reduce to
which are easily solved for [latex]\rho(1)[/latex] and [latex]\rho(2)[/latex]:
A general formula for [latex]\rho(k)[/latex] exists, but it can involve complex numbers and is unwieldy. An exercise concerning its calculation is given at the end of the chapter. These results are summarized in the following theorem.
We now focus in on the values of [latex]\rho(1)[/latex] and [latex]\rho(2)[/latex]. We can solve for [latex]\phi_1[/latex] and [latex]\phi_2[/latex] in terms of [latex]\rho(1)[/latex] and [latex]\rho(2)[/latex] as
These equations can be helpful in the three settings described below.
- These equations are of practical use in that the first two sample autocorrelation function values, r1 and r2, can be calculated from a time series and used as approximations for [latex]\rho(1)[/latex] and [latex]\rho(2)[/latex], yielding estimates for [latex]\phi_1[/latex] and [latex]\phi_2[/latex]. These can in turn be used as initial estimates for finding point estimates for [latex]\phi_1[/latex] and [latex]\phi_2[/latex] by, for example, least squares or maximum likelihood estimation, should numerical methods be required.
- Level surfaces (that is, contours) in the triangular-shaped stationary region from Theorem 9.9 can be determined by fixing values of [latex]\rho(1)[/latex] and [latex]\rho(2)[/latex]. As an illustration, consider [latex]\rho(1) = 0[/latex]. In this case [latex]\phi_1 = 0[/latex] and [latex]\phi_2 = \rho(2)[/latex], which is a line segment in the stationary region. Continuing in this fashion for several fixed values of [latex]\rho(1)[/latex] [with varying values of [latex]\rho(2)[/latex]] and then for several fixed values of [latex]\rho(2)[/latex] [with varying values of [latex]\rho(1)[/latex]] results in the graph of the stationary region with the level surfaces included shown in Figure 9.13. The level surfaces associated with fixed values of [latex]\rho(1)[/latex] are lines; the level surfaces associated with fixed values of [latex]\rho(2)[/latex] are curves.
Long Description for Figure 9.13
The horizontal axis measures phi 1 and ranges from negative 2 to 2 in increments of 1 unit. The vertical axis measures phi 2, and ranges from negative 1 to 1 in increments of 1 unit. A triangle is drawn with the apex (0, 1), and the bottom vertices (negative 2, negative 1) and (2, negative 1). The value rho of 1 equals 0 corresponds to the 0 value of the phi 2 function. The rho of 1 values equals 0.2 and 0.4 lies between phi values negative 1 and 0. The rho of 1 values 0.6 and 0.8 lies between phi 1 values negative 1 and negative 2. Similarly, the values of rho lie on the right side of 0. From the apex (0, 1), the triangular surface levels are drawn and meet the horizontal line at each rho of 1 value. Nine concave parabolas are drawn inside the triangular surface levels, each having a maximum at phi 2 value 0. Their maxima are represented by rho of 2 values that range from negative 0.8 to 0.8 in increments of 0.2.
- Since [latex]\rho(1)[/latex] and [latex]\rho(2)[/latex] are population correlations, the obvious constraints on their values for an AR(2) time series model are [latex]-1 < \rho(1) < 1[/latex] and [latex]-1 < \rho(2) < 1[/latex]. Additionally, since [latex]\phi_2 > -1[/latex] in order to fall into the triangular-shaped stationary region defined in Theorem 9.9 for the AR(2) time series model,
The boundary of this third constraint is a parabola in the [latex]\big( \rho(1), \, \rho(2) \big)[/latex] plane. The shaded region in Figure 9.14 shows the [latex]\rho(1)[/latex] and [latex]\rho(2)[/latex] values that are associated with stationary AR(2) time series models. Unlike the AR(1) population autocorrelation function, it is possible to achieve a stationary model with [latex]|\rho(2)| > |\rho(1)|[/latex]. The AR(2) population autocorrelation function values are not necessarily monotonically decreasing in magnitude as they were in the AR(1) model.
Population Partial Autocorrelation Function
We now determine the population partial autocorrelation function for an AR(2) model. Using Definition 7.4, the population lag 0 partial autocorrelation is [latex]\rho ^ * (0) = 1[/latex]. The population lag 1 partial autocorrelation is [latex]\rho ^ * (1) = \rho (1) = \phi_1 / (1 - \phi_2)[/latex]. After evaluating the determinants and simplifying, the population lag 2 partial autocorrelation is
Appealing to the Yule–Walker equations from Theorem 9.10 to define the third column of the determinant of the numerator, the population lag 3 partial autocorrelation is
The determinant in the numerator is zero because the third column is a linear combination of the first two columns. This pattern continues for the higher lags. When computing [latex]\rho^*(k)[/latex] for [latex]k = 3, \, 4, \, \ldots ,[/latex] the first, second, and last columns of the matrix in the numerator are
The last column of the matrix in the numerator is a linear combination of the first two columns. The matrix in the numerator of the calculation of [latex]\rho ^ * (k)[/latex] is singular, which means that its determinant is zero. This constitutes a proof of the following result.
The population partial autocorrelation function for the AR(2) model cuts off after lag 2. A graph of the sample partial autocorrelation function (that is, a graph of [latex]r^*_k[/latex] for the first few values of k), should also cut off after lag 2 if the AR(2) model is appropriate. This sample partial autocorrelation function shape is easier to recognize than the associated sample autocorrelation function shape because cutting off is easier to recognize than tailing off in the presence of random sampling variability.
A careful inspection of Theorem 9.11 reveals that the signs of [latex]\phi_1[/latex] and [latex]\phi_2[/latex] match the signs of [latex]\rho^*(1)[/latex] and [latex]\rho^*(2)[/latex], respectively:
for [latex]\phi_1[/latex] and [latex]\phi_2[/latex] falling in the triangular-shaped stationary region. Figure 9.15 shows the stationary region from Theorem 9.9, along with plots of the representative population autocorrelation function and population partial autocorrelation function. Four points are plotted, one in each quadrant. The population autocorrelation function and the population partial autocorrelation function associated with those four points are plotted in each of the quadrants. As expected, the signs of the values of [latex]\phi_1[/latex] and [latex]\rho^*(1)[/latex] match and the signs of the values of [latex]\phi_2[/latex] and [latex]\rho^*(2)[/latex] match. The quadrant in the stationary region determines the signs of [latex]\rho^*(1)[/latex] and [latex]\rho^*(2)[/latex], as illustrated by the four representative population partial autocorrelation functions graphed in Figure 9.15. As you can see by inspecting the shapes of [latex]\rho(k)[/latex] and [latex]\rho^*(k)[/latex] from Figure 9.15, the addition of the parameter [latex]\phi_2[/latex] in the transition from the AR(1) model to the AR(2) model results in significant additional modeling capability. The following observations can be gleaned from Figure 9.15.
- As expected, all population partial autocorrelation functions cut off after lag two.
- When [latex]\phi(B)[/latex] has real roots, the population autocorrelation function consists of mixtures of damped exponentials.
- When [latex]\phi(B)[/latex] has complex roots, the population autocorrelation function has a damped sinusoidal shape.
Long Description for Figure 9.15
The horizontal axis phi 1 ranges from negative 2 to 2 in increments of 1 unit. The vertical axis phi 2 ranges from negative 1 to 1 in increments of 1 unit. A triangle is drawn on the plane with the vertices (negative 2, negative 1), (0, 1), and (2, negative 1). A concave down parabola begins from (negative 2, negative 1), reaches a maximum at (0, 0), and decreases to (2, negative 1). A point is plotted in each of the four quadrants. The real roots lie above the parabola region inside the triangular region in quadrants 1 and 2. In quadrant 1, the rho of k values decreases progressively. The rho star of k has two decreasing positive values. In quadrant 2, the rho of k values alternates between positive and negative with decreasing magnitude. The rho star of k values alternates between positive and negative. The complex roots lie in the parabolic region in quadrants 3 and 4. In quadrant 3, the rho of k follows a damped sinusoidal fashion and the rho star of k has negative values for the first three k values. In quadrant 4, the rho of k values follows a damped sinusoidal fashion. The rho star of k has a positive and a negative value.
The population autocorrelations and partial autocorrelations on the tiny inset plots of [latex]\rho(k)[/latex] and [latex]\rho^*(k)[/latex] in Figure 9.15 can be calculated using the recursive relationships from Theorem 9.10 [for [latex]\rho(k)[/latex]] and Theorem 9.11 [for [latex]\rho^*(k)[/latex]]. They can also be calculated using the R ARMAacf function. Consider, for example, the two inset plots in the fourth quadrant of the graph in Figure 9.15, which correspond to [latex]\phi_1 = 1.5[/latex] and [latex]\phi_2 = -0.7[/latex]. The graph of the first 20 lags of [latex]\rho(k)[/latex] can be plotted with the R command
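```
rho <- ARMAacf(ar = c(1.5, -0.7), ma = 0, lag.max = 20)    # population rho(k) for k = 0, 1, ..., 20
plot(0:20, rho, type = "h", xlab = "k", ylab = "rho(k)")   # a sketch; the object and axis names are illustrative
```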
Likewise, the graph of the first 20 lags of [latex]\rho^*(k)[/latex] can be plotted with
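```
rho.star <- ARMAacf(ar = c(1.5, -0.7), ma = 0, lag.max = 20, pacf = TRUE)   # pacf = TRUE requests rho*(k) for k = 1, ..., 20
plot(1:20, rho.star, type = "h", xlab = "k", ylab = "rho*(k)")              # again a sketch with illustrative names
```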
The ar argument defines the [latex]\phi_1[/latex] and [latex]\phi_2[/latex] parameters of the AR(2) model, the ma argument is set to zero to indicate that there are no moving average terms in the AR(2) model, the lag.max argument is set to 20 to return the first 20 autocorrelations, and the type argument in the call to plot is set to "h" in order to graph the autocorrelations as spikes rather than points.
As was the case with the AR(1) time series model, the AR(2) time series can be written as an MA(∞) time series model. This alternative representation can be useful for deriving certain quantities associated with the AR(2) model, in particular, standard errors of forecasted values. The first form of a general linear model, which is equivalent to an MA(∞) model, is
Our goal is to determine the values of θ1, θ2, … that correspond to fixed parameters [latex]\phi_1[/latex] and [latex]\phi_2[/latex]. Since the MA(∞) model is valid at time t, it is also valid at times [latex]t-1[/latex] and [latex]t-2[/latex]:
and
So the AR(2) time series model
as established in Definition 9.2, can be rewritten as
Equating the coefficients of [latex]Z_{t - 1}[/latex] gives
Equating the coefficients of [latex]Z_{t - 2}[/latex] gives
Equating the coefficients of [latex]Z_{t - k}[/latex] gives the recursive equation
for [latex]k = 3, \, 4, \, \ldots[/latex].
An exercise at the end of the chapter highlights other methods for calculating the coefficients θ1, θ2, … in the MA(∞) model which is equivalent to the stationary AR(2) model.
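The recursive equation is easy to evaluate numerically. The short sketch below computes the first ten coefficients for the representative parameter values [latex]\phi_1 = 1.5[/latex] and [latex]\phi_2 = -0.7[/latex] used earlier; the built-in ARMAtoMA function, which performs the same conversion, should return identical values.

```
phi1 <- 1.5
phi2 <- -0.7
theta <- numeric(10)
theta[1] <- phi1                                      # from equating coefficients of Z_{t-1}
theta[2] <- phi1 * theta[1] + phi2                    # from equating coefficients of Z_{t-2}
for (k in 3:10) theta[k] <- phi1 * theta[k - 1] + phi2 * theta[k - 2]
theta
ARMAtoMA(ar = c(phi1, phi2), ma = 0, lag.max = 10)    # should agree with theta
```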
The Shifted AR(2) Model
For a stationary AR(2) model expressed as an MA(∞) model, it is clear that [latex]E \left[ X_t \right] = 0[/latex]. This model is not of much use in practice because most real-world time series are not centered around zero. Adding a shift parameter μ overcomes this shortcoming. Since population variance and covariance are unaffected by a shift, the associated population autocorrelation and partial autocorrelation functions remain the same as those given in Theorems 9.10 and 9.11.
The shifted AR(2) model can be written in terms of the backshift operator B as
where [latex]\phi(B) = 1 - \phi_1 B - \phi_2 B^2[/latex]. The practical problem of fitting a shifted AR(2) model to an observed time series of n values [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] will be illustrated later in this subsection.
Simulation
An AR(2) time series can be simulated by appealing to the defining formula for the AR(2) model. Iteratively applying the defining formula for a standard AR(2) model
from Definition 9.2 results in the simulated values [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex]. The primary difficulty in devising a simulation algorithm is generating the first two values, X1 and X2. For simplicity, assume that the white noise terms are Gaussian white noise terms. There are two approaches to overcome this initialization problem. The first approach generates X1 and X2 from a bivariate normal distribution with population mean vector [latex]\boldsymbol{0} = (0, \, 0) ^ \prime[/latex] and variance–covariance matrix
via Theorem 9.10. Notice that in the special case of [latex]\phi_1 = \phi_2 = 0[/latex] this matrix reduces to the variance–covariance matrix for Gaussian white noise, which is [latex]I \sigma _ Z ^ {\, 2}[/latex]. The algorithm given as pseudocode below generates initial time series observations X1 and X2 as indicated above, and then uses an additional [latex]n - 2[/latex] Gaussian white noise terms [latex]Z_3, \, Z_4, \, \ldots, \, Z_n[/latex] to generate the remaining time series values [latex]X_3, \, X_4, \, \ldots, \, X_n[/latex] using the AR(2) defining formula from Definition 9.2. Indentation denotes nesting in the algorithm.
The four-parameter shifted AR(2) time series model which includes a population mean parameter μ can be simulated by simply adding μ to each time series observation generated by this algorithm. The next example implements this algorithm in R.
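A minimal R sketch of the algorithm is given below, using the illustrative parameter values [latex]\phi_1 = 1.5[/latex], [latex]\phi_2 = -0.7[/latex], and [latex]\sigma _ Z ^ {\, 2} = 1[/latex], the standard stationary AR(2) expressions for [latex]\gamma(0)[/latex] and [latex]\gamma(1)[/latex], and the mvrnorm function from the MASS package to generate the initial bivariate normal pair.

```
library(MASS)                                     # for mvrnorm
set.seed(17)
n <- 100
phi1 <- 1.5; phi2 <- -0.7; sigma2 <- 1            # illustrative parameter values
gamma0 <- (1 - phi2) * sigma2 / ((1 + phi2) * ((1 - phi2) ^ 2 - phi1 ^ 2))   # V[X_t]
gamma1 <- gamma0 * phi1 / (1 - phi2)                                          # lag 1 autocovariance
x <- numeric(n)
x[1:2] <- mvrnorm(1, mu = c(0, 0),
                  Sigma = matrix(c(gamma0, gamma1, gamma1, gamma0), 2, 2))    # initialize X_1 and X_2
z <- rnorm(n, 0, sqrt(sigma2))                    # Gaussian white noise Z_3, ..., Z_n (first two unused)
for (t in 3:n) x[t] <- phi1 * x[t - 1] + phi2 * x[t - 2] + z[t]
```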
Long Description for Figure 9.16
In the time series plot, the horizontal axis t ranges from 1 to 100. The vertical axis x subscript t ranges from negative 20 to 20 in increments of 10 units. A horizontal line is drawn at 0. The data forms a spike pattern, beginning below the horizontal line around x subscript t equals negative 5. It increases progressively to an x subscript t value of 13 at t equals 12, then decreases to negative 10 at t equals 16. It increases to a peak of 22 at t equals 22, decreases to the lowest point of negative 20 at t equals 29, and then oscillates between x subscript t values of negative 15 and 15 until t equals 78. It finally reaches a peak of 21 at t equals 96, and decreases to negative 20 at t equals 100. In the first correlogram, the horizontal axis k ranges from 0 to 15 in increments of 1 unit. The vertical axis r subscript k ranges from negative 1.0 to 1.0 in increments of 0.5 units. A solid horizontal line is drawn at 0.0. Two dashed horizontal lines are drawn at r subscript k values of negative 0.2 and 0.2. The values of r subscript k follow a damped sinusoidal pattern. The first four values are positive, decreasing from 1.0 to 0.02, the next five values are negative, ranging between negative 0.4 and 0.0. The next five values are positive and follow a bell shape, ranging between 0.0 and 0.2, and the last two values are negative 0.02 and negative 0.2. In the second correlogram, the horizontal axis k ranges from 0 to 15 in increments of 5 units. The vertical axis r star subscript k ranges from negative 1.0 to 1.0 in increments of 0.5 units. A solid horizontal line is drawn at 0.0. The first four values of r star subscript k for k values 0 to 3 are 1.0, 0.8, negative 0.6, and negative 0.15, respectively. The remaining r star subscript k values range between negative 0.15 and 0.1. All data are estimated.
We recommend running the simulation code from the previous example several dozen times in a loop and viewing the associated plots of xt, rk, and [latex]r_k^*[/latex] in search of patterns. This will allow you to see how various realizations of a simulated AR(2) time series model vary from one realization to the next. So when you then view a single realization of a real-life time series, you will have a sense of how far these plots might deviate from their expected patterns.
There is a second way to overcome the initialization problem in simulating observations from an AR(2) time series. This second technique starts the time series with two initial arbitrary values, and then allows the time series to “warm up” or “burn in” for several time periods before producing the first observation X1. Reasonable initial arbitrary values for the standard AR(2) model are 0; reasonable initial arbitrary values for the shifted AR(2) model are μ. This is the approach taken by the built-in R function named arima.sim, which simulates a realization of a time series. Using the arima.sim function saves a few keystrokes over the approach taken in the previous example, as illustrated next.
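A sketch of such a call, again using the illustrative values [latex]\phi_1 = 1.5[/latex] and [latex]\phi_2 = -0.7[/latex], is given below; the n.start argument controls the length of the warm-up period.

```
set.seed(17)
x <- arima.sim(model = list(ar = c(1.5, -0.7)), n = 100, n.start = 50)
```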
Long Description for Figure 9.17
In the time series plot, the horizontal axis t ranges from 1 to 100. The vertical axis x subscript t ranges from negative 50 to 70 in increments of 30 units. A horizontal line is drawn at 0. The data forms a spike pattern, alternating above and below the horizontal line. The first 54 x subscript t values vacillate between negative 20 and 40, reach a peak of 70 at t equals 59, and then decrease. It again fluctuates between the values 5 and 15 for the next few t values and the last 10 values range between negative 50 and 50. In the first correlogram, the horizontal axis ranges from 0 to 15 in increments of 5 units. The vertical axis r subscript k ranges from negative 1.0 to 1.0 in increments of 0.5 units. A solid horizontal line is drawn at 0.0. Two dashed horizontal lines are drawn at negative 0.2 and 0.2. The r subscript k values alternate between positive and negative with decreasing magnitude. In the second correlogram, the horizontal axis k ranges from 0 to 15 in increments of 5 units. The vertical axis r star subscript k ranges from negative 1.0 to 1.0, in increments of 0.5 units. A solid horizontal line is drawn at 0.0. The first r star subscript k values are 1.0, negative 1.0, and negative 0.7 for k values 0, 1, and 2, respectively. The remaining r star subscript k values range between negative 0.2 and 0.1. All data are estimated.
The remaining topics associated with the AR(2) time series model are statistical in nature: parameter estimation, model assessment, model selection, and forecasting. A sample time series that will be revisited throughout these topics is introduced next.
Parameter Estimation
There are four parameters, μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex], and [latex]\sigma _ Z ^ {\, 2}[/latex], to estimate in the shifted AR(2) model
The three parameter estimation techniques outlined in Section 8.2.1, method of moments, least squares, and maximum likelihood estimation, are applied to the shifted AR(2) time series model next.
Approach 1: Method of moments. In the case of estimating the four parameters in the shifted AR(2) model by the method of moments, we match the population and sample (a) first-order moments, (b) second-order moments, (c) lag 1 autocorrelation, and (d) lag 2 autocorrelation. Placing the population moments on the left-hand side of the equation and the associated sample moments on the right-hand side of the equation results in four equations in four unknowns:
The expected value of Xt is μ, the expected value of [latex]X_t^2[/latex] can be found by using the shortcut formula for the population variance and by using the value of [latex]\gamma(0) = V[X_t][/latex] from Theorem 9.10, and the values of [latex]\rho(1)[/latex] and [latex]\rho(2)[/latex] are also obtained from Theorem 9.10. So the four equations become
Solving these equations for the four unknown parameters μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex] and [latex]\sigma _ Z ^ {\, 2}[/latex] yields closed-form solutions for the method of moments estimators
This constitutes a proof of the following result.
These estimators are random variables and have been written as a function of the random time series values [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex]. For observed time series values [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex], the lowercase versions of the formulas will be used. These estimators are often known as the Yule–Walker estimators because their derivation involved the Yule–Walker equations from Theorem 9.10.
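A sketch of the calculation for the Lake Huron series (assuming it matches the built-in R data set LakeHuron) is given below; the results should agree, up to rounding, with the Yule–Walker row of Table 9.6.

```
x <- as.numeric(LakeHuron)
n <- length(x)
r <- acf(x, lag.max = 2, plot = FALSE)$acf[2:3]          # r_1 and r_2
phi1.hat <- r[1] * (1 - r[2]) / (1 - r[1] ^ 2)           # Yule-Walker estimate of phi_1
phi2.hat <- (r[2] - r[1] ^ 2) / (1 - r[1] ^ 2)           # Yule-Walker estimate of phi_2
mu.hat <- mean(x)                                        # method of moments estimate of mu
c0 <- mean((x - mu.hat) ^ 2)                             # lag 0 sample autocovariance
sigma2.hat <- (1 - phi1.hat * r[1] - phi2.hat * r[2]) * c0
```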
Long Description for Figure 9.20
The horizontal axis phi 1 ranges from negative 2 to 2 in increments of 1 unit. The vertical axis phi 2 ranges from negative 1 to 1 in increments of 1 unit. A triangle is drawn on the plane, with the vertices (negative 2, negative 1), (0, 1), and (2, negative 1). A concave parabolic curve is drawn inside the triangular area, with the highest point at (0, 0.5). The point (phi 1 cap, phi 2 cap) is plotted on the parabola at (1.05, negative 0.6). Another line is drawn from the peak (0, 1) of the triangle and intersects the parabola at (phi 1 cap, phi 2 cap). The line is associated with rho of 1 equal to 0.83, and the parabola is associated with rho of 2 equal to 0.61. All data are estimated.
Approach 2: Least squares. Consider the shifted stationary AR(2) model
For least squares estimation, we first establish the sum of squares S as a function of the parameters μ, [latex]\phi_1[/latex], and [latex]\phi_2[/latex]. This time, however, we forgo the calculus and leave the optimization to the R optim function in order to find the least squares estimators of μ, [latex]\phi_1[/latex], and [latex]\phi_2[/latex]. Once these least squares estimators have been determined, the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex] will be estimated.
The sum of squared errors is
If this derivation were being done by hand, we would now calculate the partial derivatives of S with respect to the unknown parameters μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex], equate them to zero and solve. As was the case with the AR(1) model, there is no closed-form solution, so numerical methods are required to calculate the parameter estimates. In the example that follows, we will use the optim function in R to determine the least squares parameter estimates that minimize S.
The last parameter to estimate is [latex]\sigma _ Z ^ {\, 2}[/latex]. Since
from Theorem 9.10 for an AR(2) time series model, the population variance of the white noise can be expressed as
Replacing [latex]\phi_1[/latex] and [latex]\phi_2[/latex] by their least squares estimators [latex]\hat \phi_1[/latex] and [latex]\hat \phi_2[/latex], respectively, and replacing the lag 0 autocovariance [latex]{\gamma(0) = V \left[ X_t \right]}[/latex] by its estimator [latex]c_0 = \frac{1}{n} \sum_{t\,=\,1}^n \left( X_t - \bar X \right) ^ 2[/latex] gives the estimator
This derivation constitutes a proof of the following result.
We now use numerical methods to find the least squares estimates for the unknown parameters in the AR(2) time series model for the Lake Huron time series from Example 9.14.
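A sketch of that calculation is below, assuming the Lake Huron series in the built-in LakeHuron data set and using rough starting values for the optimizer; the minimized sum of squares should be close to the values shown in Figure 9.21.

```
x <- as.numeric(LakeHuron)
n <- length(x)
S <- function(p) {                                 # p = (mu, phi1, phi2)
  t <- 3:n
  sum((x[t] - p[1] - p[2] * (x[t - 1] - p[1]) - p[3] * (x[t - 2] - p[1])) ^ 2)
}
fit <- optim(c(mean(x), 1, 0), S)                  # Nelder-Mead minimization of S
fit$par                                            # least squares estimates of mu, phi1, phi2
fit$value                                          # minimized sum of squares S
```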
Long Description for Figure 9.21
The first graph shows S as a function of phi 1. The horizontal axis phi 1 ranges from 1.00 to 1.04 in increments of 0.01 units. The vertical axis S ranges from 43.58 to 43.64 in increments of 0.02 units. The convex parabola decreases from (1.00, 43.64), reaches a low point at (1.02, 43.58), and increases to (1.04, 43.64). The second graph shows S as a function of phi 2. The horizontal axis phi 2 ranges from negative 0.26 to negative 0.22 in increments of 0.01 units. The vertical axis S ranges from 43.58 to 43.64 in increments of 0.02 units. The convex parabola decreases from (negative 0.26, 43.64), reaches a low point at (negative 0.24, 43.58), and increases to (negative 0.22, 43.63). All data are estimated.
Approach 3: Maximum likelihood estimation. The procedure for determining the maximum likelihood estimators for the unknown parameters in an AR(2) time series model follows along the same lines as in the AR(1) time series model from the previous section. Once again, to use maximum likelihood estimation, we must assume that the random shocks from the white noise are Gaussian white noise, with associated probability density function
for [latex]t = 1, \, 2, \, \ldots, \, n[/latex]. Determining the likelihood function, which is the joint probability density function of the observed values in the time series [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex], involves finding
where the [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] arguments on L and the μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex], and [latex]\sigma _Z ^ {\, 2}[/latex] arguments on f have been dropped for brevity. As before, it is not possible to simply multiply the marginal probability density functions because the values in the AR(2) time series model are correlated. As in the case of an AR(1) model, we use the transformation technique to find the conditional joint probability density function of [latex]X_3, \, X_4, \, \ldots, \, X_n[/latex] conditioned on [latex]X_1 = x_1[/latex] and [latex]X_2 = x_2[/latex], which is denoted by
for [latex]\left( x_3, \, x_4, \, \ldots, \, x_n \right) \in {\cal R} ^ {n - 2}[/latex]. This conditional joint probability density function is multiplied by the marginal joint probability density function of X1 and X2 (which has the bivariate normal distribution) resulting in a joint probability density function of [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex]:
for [latex]\left( x_1, \, x_2, \, \ldots, \, x_n \right) \in {\cal R} ^ n[/latex]. This function serves as the likelihood function, which should be maximized with respect to the unknown parameters [latex]\mu[/latex], [latex]\phi_1[/latex], [latex]\phi_2[/latex], and [latex]\sigma _ Z ^ {\, 2}[/latex]. One can easily imagine how complicated this expression is, based on the values of [latex]\gamma(0)[/latex] and [latex]\gamma(1)[/latex] from Theorem 9.10. So we forgo the tedious mathematics and leave the calculations to the ar function in R when determining the maximum likelihood estimates for the parameters in fitting the Lake Huron time series to the shifted AR(2) time series model.
Table 9.6 summarizes the point estimators for the AR(2) model for the Lake Huron time series calculated by the R commands
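The commands are of the following form (a sketch assuming the built-in LakeHuron data set; the method argument of the ar function selects the estimation technique):

```
ar(LakeHuron, order.max = 2, aic = FALSE, method = "yw")    # method of moments (Yule-Walker)
ar(LakeHuron, order.max = 2, aic = FALSE, method = "ols")   # ordinary least squares
ar(LakeHuron, order.max = 2, aic = FALSE, method = "mle")   # maximum likelihood estimation
```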
| Method | [latex]\hat \mu[/latex] | [latex]\hat \phi_1[/latex] | [latex]\hat \phi_2[/latex] | [latex]\hat \sigma _Z ^ {\, 2}[/latex] |
|---|---|---|---|---|
| Method of moments (Yule–Walker) | 579.0 | 1.0538 | [latex]-0.2668[/latex] | 0.5075 |
| Ordinary least squares | 579.0 | 1.0217 | [latex]-0.2376[/latex] | 0.4540 |
| Maximum likelihood estimation | 579.0 | 1.0437 | [latex]-0.2496[/latex] | 0.4788 |
The point estimators associated with the three methods are quite close for this particular time series. The R function ar fits autoregressive models. There are tiny differences between some of the entries in Table 9.6 and those from Examples 9.15 and 9.16, which might be due to slightly different approximations and/or roundoff in the optimization routines.
The focus on estimation thus far has been on point estimation techniques. We also want to report some indication of the precision associated with these point estimators. The sampling distributions of [latex]\hat \mu[/latex], [latex]\hat \phi_1[/latex], [latex]\hat \phi_2[/latex], and [latex]\hat \sigma _ Z ^ {\, 2}[/latex] in the AR(2) model are too complicated to derive analytically. As an illustration of how to construct approximate confidence intervals for [latex]\phi_1[/latex] and [latex]\phi_2[/latex], we use the asymptotic normality of the maximum likelihood estimators of [latex]\phi_1[/latex] and [latex]\phi_2[/latex] in the following result. The asymptotic variance–covariance matrix associated with the parameters [latex]\phi_1[/latex] and [latex]\phi_2[/latex] is
Using just the diagonal elements of this matrix results in the following asymptotically exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence intervals for [latex]\phi_1[/latex] and [latex]\phi_2[/latex].
These asymptotically exact confidence intervals for [latex]\phi_1[/latex] and [latex]\phi_2[/latex] will now be illustrated for the lake levels from the Lake Huron time series from the previous four examples.
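A sketch of the computation is below, using the maximum likelihood fit of the Lake Huron series and taking both diagonal entries of the asymptotic variance–covariance matrix to be [latex](1 - \phi_2^{\,2})/n[/latex], the usual large-sample result for a stationary AR(2) model.

```
fit <- ar(LakeHuron, order.max = 2, aic = FALSE, method = "mle")
n <- fit$n.used
phi <- as.numeric(fit$ar)
se <- sqrt((1 - phi[2] ^ 2) / n)                   # common estimated standard error for phi1.hat and phi2.hat
cbind(lower = phi - qnorm(0.975) * se,
      upper = phi + qnorm(0.975) * se)             # approximate 95% confidence intervals
```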
Long Description for Figure 9.22
The horizontal axis n ranges from 100 to 500 in increments of 100 units. The vertical axis measuring actual coverage ranges from 0.938 to 0.950 in increments of 0.002. A dashed horizontal line is drawn at the actual coverage value of 0.950. The actual coverage values for n equals 100, 200, 300, 400, and 500 are 0.938, 0.944, 0.9465, 0.947, and 0.948, respectively, approaching the dashed horizontal line. All data are estimated.
Model Assessment
Now that techniques for point and interval estimates for the parameters in the AR(2) model have been established, we are interested in assessing the adequacy of the AR(2) time series model. This will involve an analysis of the residuals. Recall from Section 8.2.3 that the residuals are defined by
or
Since [latex]\hat {X} _ {t}[/latex] is the one-step-ahead forecast from the time origin [latex]t - 1[/latex], this is more clearly written as
From Theorem 9.13, the shifted AR(2) model is
or
Taking the conditional expected value of both sides of this equation gives
Replacing the parameters by their point estimators, the one-step-ahead forecast from the time origin [latex]t - 1[/latex] is
Therefore, for the time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] and the fitted AR(2) model with parameter estimates [latex]\hat \mu[/latex], [latex]\hat \phi_1[/latex], and [latex]\hat \phi_2[/latex], the residual at time t is
for [latex]t = 3, \, 4, \, \ldots, \, n[/latex]. The next example shows the steps associated with assessing the adequacy of the AR(2) model for the Lake Huron lake level time series.
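A sketch of the residual calculation for the Lake Huron fit, using the maximum likelihood estimates and ending with a histogram and a normal Q-Q plot of the residuals like those in Figure 9.24, is

```
fit <- ar(LakeHuron, order.max = 2, aic = FALSE, method = "mle")
x <- as.numeric(LakeHuron)
n <- length(x)
mu <- fit$x.mean
phi <- as.numeric(fit$ar)
t <- 3:n
e <- x[t] - mu - phi[1] * (x[t - 1] - mu) - phi[2] * (x[t - 2] - mu)   # residuals e_3, ..., e_n
hist(e)
qqnorm(e)
qqline(e)
```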
Long Description for Figure 9.24
In the histogram, the horizontal axis ranges from negative 3 to 3 in increments of 1 unit. The vertical axis ranges from 0 to 20 in increments of 5 units. The histogram has a bell-shaped distribution, with bars between negative 3 and 3 whose frequencies are 2, 0, 8, 6, 16, 20, 18, 11, 11, 2, and 3. In the Q Q plot, the horizontal and the vertical axes range from negative 3 to 3 in increments of 1 unit. Ninety-six residuals are plotted in an increasing linear trend throughout the graph. The cluster is formed between negative 1 and 2 on the horizontal axis and between negative 1 and 1 on the vertical axis. The first few and the last few data points are separated from the cluster. All data are estimated.
Model Selection
We have seen a number of indicators that the AR(2) time series model is an adequate model for the Lake Huron lake level time series, with the exception of a linear trend apparent from viewing the time series in Figure 9.18. The model has not been rejected by any of the model adequacy tests. We now overfit the tentative AR(2) time series model with ARMA(p, q) models of higher order. We have not yet surveyed the techniques for estimating the parameters in these models with additional terms, so for now we will let the arima function in R estimate their parameters and compare the fitted models via their AIC (Akaike's Information Criterion) statistics. The AIC statistic was introduced in Section 8.2.4.
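A sketch of this comparison, letting arima fit the tentative AR(2) model and two typical overfit alternatives to the Lake Huron series and extracting the AIC of each, is

```
for (ord in list(c(2, 0, 0), c(3, 0, 0), c(2, 0, 1))) {       # AR(2), AR(3), ARMA(2, 1)
  fit <- arima(LakeHuron, order = ord)
  cat("p =", ord[1], " q =", ord[3], " AIC =", AIC(fit), "\n")
}
```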
Forecasting
We now consider forecasting future values of a time series that is governed by a shifted AR(2) time series model. In the case of the Lake Huron time series, this corresponds to the one-step-ahead forecast for 1973, the two-steps-ahead forecast for 1974, the three-steps-ahead forecast for 1975, etc. To review forecasting notation, the observed time series values are [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex]. The forecast is being made at time [latex]t = n[/latex]. The random future value of the time series that is h time units in the future is denoted by [latex]X_{n + h}[/latex]. The associated forecasted value is denoted by [latex]\hat{X}_{n + h}[/latex], and is the conditional expected value
We would like to find this forecasted value and an associated prediction interval for a shifted AR(2) model. As in Section 8.2.2, we assume that all parameters are known in the derivations that follow. We also assume that the parameters [latex]\phi_1[/latex] and [latex]\phi_2[/latex] correspond to a stationary shifted AR(2) time series model.
The shifted AR(2) model is
Replacing t by [latex]n + 1[/latex] and solving for [latex]X_{n + 1}[/latex], this becomes
Taking the conditional expected value of each side of this equation results in the one-step-ahead forecast
because [latex]x_{n - 1}[/latex] and xn have already been observed in the time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex]. The forecasted value at time [latex]n + 1[/latex] is a function of the last two values in the time series. Applying this same process to the predicted value at time [latex]n + 2[/latex] results in the time series model
This time, the value of [latex]X_{n + 1}[/latex] has not been observed, so we replace it by its forecasted value when taking the conditional expected value of both sides of the equation
because xn has already been observed. Continuing in this fashion, a recursive formula for the forecasted value of [latex]X_{n + h}[/latex] is
Although we would prefer an explicit formula, the recursive formula is easy to implement for an observed time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex]. As in the case of the AR(1) model, long-term forecasts for a stationary AR(2) time series model tend to μ as the time horizon [latex]h \rightarrow \infty[/latex].
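A sketch of these forecasts for the Lake Huron series, leaving the recursion to the predict method for fitted ar objects, is

```
fit <- ar(LakeHuron, order.max = 2, aic = FALSE, method = "mle")
fc <- predict(fit, n.ahead = 10)
fc$pred    # point forecasts for lead times h = 1, 2, ..., 10 (years 1973 through 1982)
fc$se      # the associated standard errors used for prediction intervals
```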
We would like to pair our point estimator [latex]\hat{X}_{n + h}[/latex] with an interval estimator, which is a prediction interval in this setting. The prediction interval gives us an indication of the precision of the forecast. In order to derive an exact two-sided [latex]{100(1 - \alpha)}\%[/latex] prediction interval for [latex]X _ {n + h}[/latex], it is helpful to write the shifted AR(2) model as a shifted MA(∞) model. The coefficients θ1, θ2, … of a stationary shifted AR(2) model written as an MA(∞) model
are given in terms of [latex]\phi_1[/latex] and [latex]\phi_2[/latex] in Theorem 9.12. Consider this model at time [latex]t = n + 1[/latex]. Since the error terms [latex]Z_n, \, Z_{n - 1}, \, Z_{n - 2}, \, \ldots[/latex] are unknown but fixed because they are associated with the observed time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex], the conditional population variance of [latex]X_{n+1}[/latex] is
because the population variance of μ is zero and [latex]Z_{n + 1}[/latex] is the only random term in the model. The error terms at time n and earlier are fixed (although unknown) because they are associated with the observed time series, and can therefore be treated as constants. Likewise, considering the MA(∞) model at time [latex]t = n + 2[/latex], the conditional population variance of [latex]X_{n+2}[/latex] is
Similarly, the conditional population variance of [latex]X_{n+3}[/latex] is
Continuing in this fashion, the conditional population variance of [latex]X_{n+h}[/latex] is
If we assume that the white noise terms in the MA(∞) representation of the AR(2) time series model are Gaussian white noise terms, then [latex]X_{n+h}[/latex] is also normally distributed because a linear combination of mutually independent normal random variables is also normally distributed. So an exact two-sided [latex]{100(1 - \alpha)}\%[/latex] prediction interval for [latex]X_{n+h}[/latex] is
In most practical problems, the parameters in this prediction interval will be estimated from data, which results in the following approximate two-sided [latex]{100(1 - \alpha)}\%[/latex] prediction interval.
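A sketch of the prediction interval calculation is given below, using the maximum likelihood estimates from Table 9.6 and the ARMAtoMA function to supply the MA(∞) coefficients θ1, θ2, … .

```
phi <- c(1.0437, -0.2496)                           # MLE estimates of phi_1 and phi_2 from Table 9.6
sigma2 <- 0.4788                                    # MLE estimate of the white noise variance
theta <- ARMAtoMA(ar = phi, ma = 0, lag.max = 9)    # theta_1, ..., theta_9
se <- sqrt(sigma2 * cumsum(c(1, theta ^ 2)))        # forecast standard errors for h = 1, 2, ..., 10
# approximate 95% prediction interval at lead time h: Xhat_{n+h} -/+ qnorm(0.975) * se[h]
```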
This section has introduced the AR(2) time series model. The important results for an AR(2) model are listed below.
- The standard AR(2) model can be written algebraically and with the backshift operator B as
where [latex]\phi(B) = 1 - \phi_1 B - \phi_2 B ^ 2[/latex] is the characteristic polynomial and [latex]Z_t \sim WN \left( 0, \, \sigma _Z ^ {\, 2} \right)[/latex] (Definition 9.2).
- The shifted AR(2) model can be written algebraically and with the backshift operator B as (Theorem 9.13)
- The AR(2) model is always invertible; the AR(2) model is stationary when [latex]\phi_1[/latex] and [latex]\phi_2[/latex] fall in a triangular-shaped region in the [latex](\phi_1, \, \phi_2)[/latex] plane defined by the constraints [latex]\phi_1 + \phi_2 < 1[/latex], [latex]\phi_2 - \phi_1 < 1[/latex], and [latex]\phi_2 > -1[/latex] (Theorem 9.9).
- The AR(2) population autocorrelation function is a mixture of damped exponential functions, when [latex]\phi(B)[/latex] has real roots, or a damped sinusoidal function, when [latex]\phi(B)[/latex] has complex roots (Theorem 9.10).
- The AR(2) population partial autocorrelation function cuts off after lag 2 (Theorem 9.11), which makes its sample counterpart easier to recognize from a realization of a time series than the sample autocorrelation function.
- The stationary shifted AR(2) model can be written as a shifted MA(∞) model (Theorem 9.12).
- The four parameters in the shifted AR(2) model, μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex] and [latex]\sigma _ Z ^ {\, 2}[/latex], can be estimated from a realization of a time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] by the method of moments (Theorem 9.14), least squares (Theorem 9.15), and maximum likelihood using at least [latex]n = 60[/latex] or [latex]n = 70[/latex] observations. The point estimators for μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex], and [latex]\sigma _ Z ^ {\, 2}[/latex] are denoted by [latex]\hat \mu[/latex], [latex]\hat \phi_1[/latex], [latex]\hat \phi_2[/latex], and [latex]\hat \sigma _ Z ^ {\, 2}[/latex], and are typically paired with asymptotically exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence intervals (Theorem 9.16).
- The forecasted value [latex]\hat{X} _ {n + h}[/latex] in an AR(2) model is a function of [latex]x_{n - 1}[/latex] and xn and can be calculated by a recursive formula. It approaches [latex]\hat \mu = \bar x[/latex] as the time horizon [latex]h \rightarrow \infty[/latex]. The associated prediction intervals have widths that increase as h increases and approach a limit as the time horizon [latex]h \rightarrow \infty[/latex] (Theorem 9.17).
The AR(1) time series model expresses the current value in the time series Xt as a constant times the previous value in the time series plus a random shock. The AR(2) time series model expresses the current value in the time series Xt as a linear combination of the previous two values in the time series plus a random shock. There is conceptually no difficulty extending this thinking to the AR(p) time series model in which the current value in the time series Xt is expressed as a linear combination of the previous p values in the time series plus a random shock. The AR(p) time series model is the subject of the next section.
9.1.3 The AR(p) Model
The order p autoregressive model, denoted by AR(p), is a straightforward generalization of the AR(2) model. New aspects include the use of matrices in the derivations and the fact that the stationary region can no longer be easily visualized as a function of the parameters. The AR(p) model is appropriate in instances in which the current value of the time series is a linear combination of the p previous values in the time series plus a random shock.
The [latex]p + 1[/latex] parameters that define an AR(p) model are the real-valued coefficients [latex]\phi_1, \, \phi_2, \, \ldots , \phi_p[/latex], and the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex]. The final coefficient, [latex]\phi_p[/latex], must be nonzero. The AR(p) model can be written more compactly in terms of the backshift operator B as
where [latex]\phi(B)[/latex] is the order p characteristic polynomial
The AR(p) model has the form of a multiple linear regression model with p independent variables and no intercept term. The current value Xt is being modeled as a linear combination of the p previous values of the time series, [latex]X_{t-1}, \, X_{t - 2}, \, \ldots , \, X_{t - p}[/latex], plus a white noise term Zt that provides a random shock to the model. The parameters [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex] control the inclination of the regression line ([latex]p =1[/latex]), plane ([latex]p = 2[/latex]), or hyperplane ([latex]p > 2[/latex]). The [latex]\sigma _ Z ^ {\, 2}[/latex] parameter reflects the magnitude of the dispersion of the time series values about the regression plane.
Stationarity
Theorem 8.3 indicates that all AR(p) models are invertible, but stationary only when all of the roots of [latex]\phi(B)[/latex] lie outside of the unit circle in the complex plane. Let [latex]B_1, \, B_2, \, \ldots , \, B_p[/latex] denote the p solutions of [latex]\phi(B) = 0[/latex]. For a stationary model, all of these roots will be real-valued or complex conjugate pairs that lie outside of the unit circle in the complex plane. Since [latex]\phi(B_1) = \phi(B_2) = \cdots = \phi(B_p) = 0[/latex], the order p characteristic polynomial [latex]\phi(B)[/latex] can also be written in factored form as
Unfortunately, except for the cases of [latex]p = 1[/latex] and [latex]p = 2[/latex], the region in the space of [latex]( \phi_1 , \, \phi_2 , \, \ldots , \, \phi_p )[/latex] corresponding to a stationary model cannot be expressed in a simple mathematical form. The following example illustrates how to determine whether an AR(4) model is stationary. This AR(4) model will be used in the next five examples.
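As a sketch of this check (using hypothetical coefficients rather than those of the AR(4) model in the example), the moduli of the roots of [latex]\phi(B)[/latex] can be computed with the polyroot function; a stationary model requires every modulus to exceed 1.

```
phi <- c(0.3, 0.2, 0.1, 0.2)        # hypothetical AR(4) coefficients
roots <- polyroot(c(1, -phi))       # roots of 1 - phi_1 B - phi_2 B^2 - phi_3 B^3 - phi_4 B^4
Mod(roots)                          # all four moduli exceed 1, so this particular model is stationary
```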
Duality
As was the case with the AR(1) and AR(2) time series models, a stationary AR(p) time series model can be written as an MA(∞) time series model. This alternative representation can be useful for estimating standard errors of forecasted values. One way to frame the problem of writing an AR(p) time series model as an MA(∞) time series model is to write the compact form of the AR(p) model as
and divide both sides by [latex]\phi(B)[/latex], which results in
Therefore, the conversion from the AR(p) form of the model to the MA(∞) form involves finding the coefficients [latex]\theta_1, \, \theta_2, \, \ldots[/latex] such that
The coefficients [latex]\theta_1, \, \theta_2, \, \ldots[/latex] essentially correspond to finding the inverse of the [latex]\phi(B)[/latex] characteristic polynomial. Taking the expected value of both sides of this equation leads to the important result: [latex]E \left[ X_t \right] = 0[/latex] for all values of t. As was the case of the AR(2) time series model, the coefficients for the MA(∞) time series model are found by equating coefficients. This process will be illustrated in the next example for the AR(4) model. Generalization to the AR(p) model is straightforward.
Population Autocorrelation Function
We now pivot to the derivation of the population autocovariance and autocorrelation functions. Assuming that the parameters [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex] are associated with a stationary model, the AR(p) model
can be multiplied by [latex]X_{t - k}[/latex] to give
Taking the expected value of both sides of this equation for [latex]k = 0[/latex] results in
and the recursive equation
for [latex]k = 1, \, 2, \, \ldots[/latex] because Zt has expected value zero and is independent of [latex]X_{t - k}[/latex]. For [latex]k = 1, \, 2, \, \ldots , \, p[/latex], the recursive equation can be written as the system of linear equations
which relies on the symmetry of the population autocovariance function: [latex]\gamma(-k) = \gamma(k)[/latex]. This linear system of p equations in [latex]p + 1[/latex] unknowns can be written in matrix form as
where
Given the values of the parameters [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex] and [latex]\sigma _ Z ^ {\, 2}[/latex], this set of linear equations, together with
can be solved for the first [latex]p + 1[/latex] population autocovariances [latex]\gamma(0), \, \gamma(1), \, \ldots , \, \gamma(p)[/latex]. The recursion relationship can then be used to compute subsequent autocovariances.
Dividing both sides of the recursive equation for calculating population autocovariance by [latex]\gamma(0) = V \left[ X_t \right][/latex] gives the recursive equation
for [latex]k = 1, \, 2, \, \ldots[/latex]. Exploiting the symmetry of the [latex]\rho(k)[/latex] function, the first p of these equations are
Since [latex]\rho(0) = 1[/latex], this linear system of p equations in the p unknowns [latex]\rho(1), \, \rho(2), \, \ldots , \, \rho(p)[/latex] can be written in matrix form as
where
Given the values of the parameters [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex], these linear equations can be solved for the initial p population autocorrelation function values [latex]\rho(1), \, \rho(2), \, \ldots , \, \rho(p)[/latex], and the recursive equation can then be used to calculate subsequent values of the population autocorrelation function.
As was the case with the AR(2) time series model, (a) the real roots of [latex]\phi(B)[/latex] correspond to contributions to the population autocorrelation function which are mixtures of damped exponential terms, and (b) the complex conjugate roots of [latex]\phi(B)[/latex] correspond to contributions to the population autocorrelation function which are damped sinusoidal terms.
These equations are of practical use in that the first p sample autocorrelation function values, [latex]r_1, \, r_2, \, \ldots , \, r_p[/latex], can be calculated from an observed time series and used as approximations for [latex]\rho(1), \, \rho(2), \, \ldots , \, \rho(p)[/latex], yielding estimators for [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex]. These estimators are known as the Yule–Walker estimators. They can in turn be used as initial estimates when finding point estimates for [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex] by, for example, least squares or maximum likelihood estimation, should numerical methods be required.
The results concerning the calculation of the population autocovariance function [latex]\gamma(k)[/latex] and the population autocorrelation function [latex]\rho(k)[/latex] are summarized below.
The system of linear equations in Theorem 9.18, whether written in terms of [latex]\gamma(k)[/latex] or [latex]\rho(k)[/latex] as [latex]\gamma = \Gamma \phi[/latex] or [latex]\rho = P \phi[/latex], is known in time series analysis as the Yule–Walker equations.
Population Partial Autocorrelation Function
We now determine the population partial autocorrelation function for an AR(p) model. Using Definition 7.4, the initial population partial autocorrelation values are
etc. One distinctive characteristic of the AR(p) population partial autocorrelation function is that it cuts off after lag p. To see why this is the case, consider the first p columns of the matrix in the numerator of [latex]\rho ^ * (k)[/latex] for [latex]k > p[/latex]:
Using Theorem 9.18, the last column of the matrix in the numerator of [latex]\rho ^ * (k)[/latex] is
The last column of the matrix in the numerator of [latex]\rho ^ * (k)[/latex] is a linear combination of the first p columns with coefficients [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex]. Thus, the matrix in the numerator of the calculation of [latex]\rho ^ * (k)[/latex] is singular, which means that its determinant is zero. So [latex]\rho ^ * (k) = 0[/latex] for [latex]k = p + 1, \, p + 2, \, \ldots[/latex] for an AR(p) time series model. This constitutes a proof of the following result.
A graph of the sample partial autocorrelation function [latex]r^*_k[/latex] for the first few values of k should also cut off after lag p if the AR(p) model is appropriate. This sample partial autocorrelation function shape is easier to recognize than the associated sample autocorrelation function shape because cutting off is typically easier to recognize than tailing off in the presence of random sampling variability.
There is a second interpretation of the partial autocorrelation function that ties it more closely to determining the order of the autoregressive portion of the model. The partial autocorrelation at lag k is the value of the final coefficient [latex]\phi_k[/latex] in an autoregressive model of order k. This coefficient measures the excess correlation at lag k which is not accounted for by an autoregressive model of order [latex]k - 1[/latex]. It is for this reason that many authors use the notation [latex]\phi_{kk}[/latex] for the population lag k partial autocorrelation.
The population autocorrelation function and the population partial autocorrelation functions can be calculated using the formulas given here, but can also be calculated using the R ARMAacf function, as illustrated in the next example.
| k | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| [latex]\rho(k)[/latex] | 0.8409 | 0.6420 | 0.3935 | 0.2617 | 0.1776 | 0.1659 | 0.1506 |
| [latex]\rho^*(k)[/latex] | 0.8409 | −0.2222 | −0.2857 | 0.3000 | 0 | 0 | 0 |
Long Description for Figure 9.27
In the first correlogram, the horizontal axis k ranges from 0 to 15 in increments of 5 units. The vertical axis rho of k ranges from negative 1.0 to 1.0 in increments of 0.5 units. A horizontal line is drawn at rho subscript k value 0.0. The values of rho of k decrease exponentially from 1.0 to 0.02. In the second correlogram, the horizontal axis k ranges from 0 to 15 and the vertical axis rho star k ranges from negative 1.0 to 1.0 in increments of 0.5 units. A horizontal line is drawn at 0.0. The function rho star of k behaves like a damped sinusoidal pattern and the values are 1.0, 0.8, negative 0.2, negative 0.25, and 0.25 for k values 0, 1, 2, 3, and 4. All data are estimated.
The Shifted AR(p) Model
The standard AR(p) model from Definition 9.3 is not of much practical use because most real-world time series are not centered around zero. Adding a shift parameter μ overcomes this shortcoming. Since population variance and covariance are unaffected by a shift, the associated population autocorrelation and partial autocorrelation functions remain the same as those given in Theorems 9.18 and 9.19.
The shifted AR(p) model can be written in terms of the backshift operator B as
where [latex]\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B ^ p[/latex]. The practical problem of fitting a shifted AR(p) model to an observed time series of n values [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] will be illustrated later in this subsection.
Simulation
An AR(p) time series can be simulated by appealing to the defining formula for the AR(p) model. Iteratively applying the defining formula for a standard AR(p) model
from Definition 9.3 results in the simulated values [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex]. The difficult aspect of devising a simulation algorithm is generating the first p simulated values, [latex]X_1, \, X_2, \, \ldots, \, X_p[/latex]. For simplicity, assume that the white noise terms are Gaussian white noise terms. There are two approaches to overcome this initialization problem. The first approach generates [latex]X_1, \, X_2, \, \ldots , \, X_p[/latex] from a multivariate normal distribution with population mean p-vector [latex]\boldsymbol{0} = (0, \, 0, \, \ldots , \, 0) ^ \prime[/latex] and [latex]p \times p[/latex] variance–covariance matrix
which was defined in Theorem 9.18. The algorithm given below generates initial time series observations [latex]X_1, \, X_2, \, \ldots , \, X_p[/latex] as indicated above, and then uses an additional [latex]n - p[/latex] Gaussian white noise terms [latex]Z_{p + 1}, \, Z_{p + 2}, \, \ldots, \, Z_n[/latex] to generate the remaining time series values [latex]X_{p + 1}, \, X_{p + 2}, \, \ldots, \, X_n[/latex] using the AR(p) defining formula from Definition 9.3. Indentation denotes nesting in the algorithm.
The [latex](p + 2)[/latex]-parameter shifted AR(p) time series model which includes a population mean parameter μ can be simulated by simply adding μ to each time series observation generated by this algorithm. The next example implements this algorithm in R.
Using the plot.ts function to make a plot of the time series contained in x, the acf function to plot the associated correlogram, the pacf function to plot the associated sample partial autocorrelation function, and the layout function to arrange the graphs as in Example 7.24, the resulting trio of graphs is displayed in Figure 9.28. The sample partial autocorrelation function has four statistically significant spikes at lags 1, 2, 3, and 4, which is consistent with an AR(4) model. The spikes cut off after lag 4 as expected from the population counterparts in Figure 9.27. The approximate 95% confidence intervals indicated by the dashed lines show that the values of the sample partial autocorrelation function do not significantly differ from zero at lags beyond lag 4. The sample autocorrelation function displays a mixture of damped exponential and damped sinusoidal terms as expected, with statistically significant autocorrelations at the first two lags: [latex]r_1 = 0.7351[/latex] and [latex]r_2 = 0.4417[/latex]. The time series plot shows that observations tend to linger on one side of the population mean (indicated by a horizontal line at [latex]\mu = 0[/latex]), which is consistent with the two initial statistically significant positive spikes in the sample autocorrelation function.
Long Description for Figure 9.28
In the time series plot, the horizontal axis t ranges from 1 to 100. The vertical axis x subscript t ranges from negative 4 to 4 in increments of 2 units. A horizontal line is drawn at 0. The first 17 values are positive and range between 0 and 2, the next 3 values are negative and range between negative 1 and negative 3. The values then increase to 3 at t equals 27 and decrease to the lowest point negative 4 at t equals 53 with many fluctuations. The values then increase to a peak of 4 at t equals 64, and then have a pattern of successive, decreasing peaks for the remaining t values. In the first correlogram, the horizontal axis k ranges from 0 to 15 and the vertical axis r subscript k ranges from negative 1.0 to 1.0 in increments of 0.5 units. A horizontal line is drawn at 0.0. The first 6 r subscript k values decrease progressively from 1.0 to 0. For k values 6 to 10, the r subscript k values follow a bell shape, reaching a peak of 0.25, and the last 5 values decrease from negative 0.01 to negative 0.2. In the second correlogram, the horizontal axis k ranges from 0 to 15 in increments of 5 units. The vertical axis r star subscript k ranges from negative 1.0 to 1.0 in increments of 0.5 units. A horizontal line is drawn at 0.0. The r star subscript k values alternate signs, approximately following a damped sinusoidal pattern with decreasing magnitude. All data are estimated.
We recommend running the simulation code from the previous example several dozen times in a loop and viewing the associated plots of xt, rk, and [latex]r_k^*[/latex] in search of patterns. This will allow you to see how various realizations of this simulated AR(4) time series model vary from one realization to the next. So when you then view a single realization of a real-life time series, you will have a sense of how far these plots might deviate from their expected patterns.
There is a second way to overcome the initialization problem in simulating observations from an AR(p) time series. This second technique starts the time series with p initial arbitrary values, and then allows the time series to “warm up” or “burn in” for several time periods before producing the first observation X1. Reasonable p initial arbitrary values for the standard AR(p) model are 0; reasonable p initial arbitrary values for the shifted AR(p) model are μ. This approach can be implemented in R with the filter function with "recursive" as the method argument. The code below generates [latex]n = 100[/latex] values in the AR(4) time series model from the previous example using a warm-up period of 50 observations.
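A sketch of that approach, with hypothetical AR(4) coefficients standing in for those of the previous example, is

```
phi <- c(0.3, 0.2, 0.1, 0.2)                 # hypothetical AR(4) coefficients
nwarm <- 50                                  # length of the warm-up period
n <- 100
z <- rnorm(nwarm + n)                        # Gaussian white noise
x <- filter(z, filter = phi, method = "recursive", init = rep(0, 4))
x <- x[(nwarm + 1):(nwarm + n)]              # discard the warm-up observations
```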
This is also the approach taken by the built-in R function named arima.sim, which simulates a realization of a time series. Using the arima.sim function means that [latex]n = 100[/latex] observations from the AR(4) time series model from the previous example can be simulated using a single command, using a warm-up period of 50 observations.
The remaining topics associated with the AR(p) time series model are statistical in nature: parameter estimation, model assessment, model selection, and forecasting. We begin with parameter estimation.
Parameter Estimation
The [latex]p + 2[/latex] parameters to estimate in a shifted AR(p) time series model are [latex]\phi_1, \, \phi_2 , \, \ldots , \, \phi_p, \, \mu , \, \sigma _ Z ^ {\, 2}[/latex]. There are three techniques for estimating these parameters considered here: method of moments, least squares, and maximum likelihood estimation. These techniques were introduced in Section 8.2.1. These three techniques are outlined in the following paragraphs.
Approach 1: Method of moments. In the case of estimating the [latex]p + 2[/latex] parameters in the shifted AR(p) time series model by the method of moments, we match the population and sample first-order moments, second-order moments, lag 1 autocorrelation, lag 2 autocorrelation, [latex]\ldots[/latex], lag p autocorrelation. Placing the population moments on the left-hand side of the equation and the associated sample moments on the right-hand side of the equation results in [latex](p + 2)[/latex] equations in [latex](p + 2)[/latex] unknowns:
Since [latex]E \left[ X_t \right] = \mu[/latex] for a stationary shifted AR(p) time series model, the first equation gives the method of moments estimator [latex]\hat \mu = \bar X[/latex]. Recall from Theorem 9.18 that the relationship between [latex]\phi_1, \, \phi_2 , \, \ldots , \, \phi_p[/latex] and [latex]\rho(1), \, \rho(2) , \, \ldots , \, \rho(p)[/latex] is given by the matrix equation
where
Satisfying the method of moments criteria, the lag k population autocorrelation [latex]\rho(k)[/latex] can be replaced with its statistical analog rk, for [latex]k = 1, \, 2, \, \ldots, \, p[/latex]. The resulting matrix equation is
where
This matrix equation can be solved for the method of moments estimators as
These are known as the Yule–Walker estimators because of their relationship to the Yule–Walker equations. Finally, the remaining parameter to estimate is the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex]. From Theorem 9.18,
Multiplying and dividing the right-hand side of this equation by [latex]\gamma(0)[/latex] gives
Replacing these elements by their method of moments estimators gives
which can be expressed in matrix form as
Since computing these estimators does not require any iterative methods, the method of moments estimators are often used as initial parameter estimates for the least squares estimators and the maximum likelihood estimators, which do require iterative methods. These point estimators for the parameters in a shifted AR(p) model are summarized below.
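A sketch of these computations, assuming the Lake Huron series in the built-in LakeHuron data set and [latex]p = 4[/latex], is given below; the ar function with method "yw" should produce essentially the same estimates.

```
x <- as.numeric(LakeHuron)
n <- length(x)
p <- 4
r <- acf(x, lag.max = p, plot = FALSE)$acf[2:(p + 1)]    # r_1, ..., r_p
P.hat <- toeplitz(c(1, r[1:(p - 1)]))                    # estimated correlation matrix
phi.hat <- solve(P.hat, r)                               # Yule-Walker estimates of phi_1, ..., phi_p
mu.hat <- mean(x)
c0 <- mean((x - mu.hat) ^ 2)                             # lag 0 sample autocovariance
sigma2.hat <- (1 - sum(phi.hat * r)) * c0                # estimate of the white noise variance
```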
Approach 2: Least squares. Consider the shifted stationary AR(p) model
For least squares estimation, we first establish the sum of squares S as a function of the parameters μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex], [latex]\ldots[/latex], [latex]\phi_p[/latex]. We leave the optimization to the R ar function in order to calculate the least squares estimators of μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex], [latex]\ldots[/latex], [latex]\phi_p[/latex]. Once these least squares estimators have been determined, the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex] will be estimated.
The sum of squared errors is
If this derivation were being done by hand, we would now calculate the partial derivatives of S with respect to the unknown parameters μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex], [latex]\ldots[/latex], [latex]\phi_p[/latex], equate them to zero and solve. As was the case with the AR(1) and AR(2) models, there is no closed-form solution, so numerical methods are required to calculate the parameter estimates. In the example that follows, we will use the ar function in R to determine the least squares parameter estimates that minimize S.
The last parameter to estimate is the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex]. The same estimator as the method of moments will be used:
Least squares estimation for a shifted AR(p) time series model is summarized below.
We now use numerical methods to find the least squares estimates for the unknown parameters in the AR(p) time series model for the Lake Huron time series from Example 9.14.
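A sketch of the calculation, letting the ar function carry out the least squares fits of orders [latex]p = 1, \, 2, \, 3, \, 4[/latex] to the built-in LakeHuron data set, is below; small differences from the tabled values can arise from how the intercept and the divisor of the variance estimate are handled.

```
for (p in 1:4) {
  fit <- ar(LakeHuron, order.max = p, aic = FALSE, method = "ols")
  print(round(c(p = p, mu = fit$x.mean, phi = c(fit$ar), var = fit$var.pred), 4))
}
```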
| p | [latex]\hat \mu[/latex] | [latex]\hat \phi_1[/latex] | [latex]\hat \phi_2[/latex] | [latex]\hat \phi_3[/latex] | [latex]\hat \phi_4[/latex] | [latex]\hat \sigma _ Z ^ {\, 2}[/latex] | S |
|---|---|---|---|---|---|---|---|
| 1 | 579.00 | 0.8364 | | | | 0.5090 | 49.38 |
| 2 | 579.00 | 1.0217 | −0.2376 | | | 0.4540 | 43.64 |
| 3 | 579.00 | 1.0719 | −0.3653 | 0.1088 | | 0.4488 | 42.66 |
| 4 | 579.00 | 1.0738 | −0.3739 | 0.0569 | 0.0625 | 0.4475 | 42.12 |
Approach 3: Maximum likelihood estimation. The procedure for determining the maximum likelihood estimators for the unknown parameters in a shifted AR(p) time series model follows along the same lines as in the AR(1) and AR(2) time series models from the previous subsections. Once again, to use maximum likelihood estimation, we must assume that the random shocks from the white noise are Gaussian white noise, with associated probability density function
for [latex]t = 1, \, 2, \, \ldots, \, n[/latex]. Determining the likelihood function, which is the joint probability density function of the observed values in the time series [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex], involves finding
where the [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] arguments on L and the μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex], [latex]\ldots[/latex], [latex]\phi_p[/latex], and [latex]\sigma _Z ^ {\, 2}[/latex] arguments on f have been dropped for brevity and [latex]n > p[/latex]. As before, it is not possible to simply multiply the marginal probability density functions because the values in the AR(p) time series model are correlated. As in the case of the AR(1) and AR(2) models, we use the transformation technique to find the conditional joint probability density function of [latex]X_{p + 1}, \, X_{p + 2}, \, \ldots, \, X_n[/latex] conditioned on [latex]X_1 = x_1[/latex], [latex]X_2 = x_2[/latex], [latex]\ldots[/latex], [latex]X_p = x_p[/latex], which is denoted by
for [latex]\left( x_{p + 1}, \, x_{p + 2}, \, \ldots, \, x_n \right) \in {\cal R} ^ {n - p}[/latex]. This conditional joint probability density function is multiplied by the marginal joint probability density function of X1, X2, [latex]\ldots[/latex], Xp (which has the p-dimensional multivariate normal distribution) resulting in a joint probability density function of [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex]:
for [latex]\left( x_1, \, x_2, \, \ldots, \, x_n \right) \in {\cal R} ^ n[/latex]. This function serves as the likelihood function, which should be maximized with respect to the unknown parameters μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex], [latex]\ldots[/latex], [latex]\phi_p[/latex], and [latex]\sigma _ Z ^ {\, 2}[/latex]. We leave the maximization to the ar and arima functions in R when determining the maximum likelihood estimates for the parameters for a particular time series to be fitted to the shifted AR(p) time series model.
In addition to point estimators for the parameters, we are also interested in confidence intervals that capture the precision of the point estimators. The population variance of the vector of parameter estimators [latex]\hat \phi = ( \hat \phi_1 , \, \hat \phi_2 , \, \ldots , \, \hat \phi_p ) ^ \prime[/latex] is given by the variance–covariance matrix
Since the maximum likelihood estimators for [latex]\phi_1, \, \phi_2 , \, \ldots , \, \phi_p[/latex] are asymptotically unbiased and normally distributed under certain regularity conditions,
For [latex]p = 1[/latex], this reduces to
For [latex]p = 2[/latex], this reduces to
These asymptotic results for [latex]p = 1[/latex] and [latex]p = 2[/latex] were used in the confidence intervals given in Theorems 9.7 and 9.16. When the quantities in this expression are replaced by their statistical counterparts, the estimated variance–covariance matrix of the vector [latex]\hat \phi[/latex] is
Using the diagonal elements of this matrix and the asymptotic normality of maximum likelihood estimators, an asymptotically exact [latex]{100(1 - \alpha)}\%[/latex] confidence interval for [latex]\phi_i[/latex] is easily constructed.
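A sketch of such a calculation, using the arima function's maximum likelihood fit and its estimated variance–covariance matrix of the coefficients (illustrated here for an AR(2) fit of the Lake Huron series), is

```
fit <- arima(LakeHuron, order = c(2, 0, 0))       # maximum likelihood AR(2) fit with a mean term
est <- fit$coef[1:2]                              # phi1.hat and phi2.hat
se <- sqrt(diag(fit$var.coef))[1:2]               # estimated standard errors of phi1.hat and phi2.hat
cbind(lower = est - qnorm(0.975) * se,
      upper = est + qnorm(0.975) * se)            # approximate 95% confidence intervals
```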
The maximum likelihood estimates and associated confidence intervals will be illustrated for an economic time series in the next example.
Model Assessment
Now that techniques for point and interval estimates for the parameters in the AR(p) model have been established, we are interested in assessing the adequacy of the fitted AR(p) time series model. This will involve an analysis of the residuals. Recall from Section 8.2.3 that the residuals are defined by
or
Since [latex]\hat{X} _ {t}[/latex] is the one-step-ahead forecast from the time origin [latex]t - 1[/latex], this is more clearly written as
From Theorem 9.20, the shifted AR(p) model is
or
Taking the conditional expected value of both sides of this equation gives
Replacing the parameters by their point estimators, the one-step-ahead forecast from the time origin [latex]t - 1[/latex] is
Therefore, for the time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] and the fitted AR(p) model with parameter estimates [latex]\hat \mu[/latex], [latex]\hat \phi_1[/latex], [latex]\hat \phi_2[/latex], [latex]\ldots[/latex], [latex]\hat \phi_p[/latex], the residual at time t is
for [latex]t = p + 1, \, p + 2, \, \ldots, \, n[/latex]. The next example shows the steps associated with assessing the adequacy of the AR(4) model for the time series of annual lynx pelt sales.
Forecasting
We now consider the question of forecasting future values of a time series that is governed by a shifted AR(p) time series model. In the case of the annual lynx pelt sales time series, this corresponds to the one-step-ahead forecast for 1912, the two-steps-ahead forecast for 1913, the three-steps-ahead forecast for 1914, etc. To review forecasting notation, the observed time series values are [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex]. The forecast is being made at time [latex]t = n[/latex]. The random future value of the time series that is h time units in the future is denoted by [latex]X_{n + h}[/latex]. The associated forecasted value is denoted by [latex]\hat{X}_{n + h}[/latex], and is the conditional expected value
We would like to find this forecasted value and an associated prediction interval for a shifted AR(p) model. As in Section 8.2.2, we assume that all parameters are known in the derivations that follow. We also assume that the parameters [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex] correspond to a stationary shifted AR(p) time series model and [latex]p < n[/latex].
The shifted AR(p) model is
Replacing t by [latex]n + 1[/latex] and solving for [latex]X_{n + 1}[/latex], this becomes
Taking the conditional expected value of each side of this equation results in the one-step-ahead forecast
because the final p observations [latex]x_{n - p + 1}, \, x_{n - p + 2}, \, \ldots , \, x_n[/latex] in the time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] have already been observed. The forecasted value at time [latex]n + 1[/latex] is a function of the final p values in the time series. Applying this same process to the predicted value at time [latex]n + 2[/latex] results in the time series model
This time, the value of [latex]X_{n + 1}[/latex] has not been observed, so we replace it by its forecasted value when taking the conditional expected value of both sides of the equation
because [latex]x_{n - p + 2}, \, x_{n - p + 3}, \, \ldots , \, x_n[/latex] have already been observed. Continuing in this fashion, a recursive formula for the forecasted value of [latex]X_{n + h}[/latex] is
Although we would prefer an explicit formula, the recursive formula is easy to implement for an observed time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex]. As in the case of the AR(1) and AR(2) models, long-term forecasts for a stationary AR(p) time series model tend to μ as the time horizon [latex]h \rightarrow \infty[/latex].
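A minimal R sketch of this recursion is given below; x, muhat, and phihat are assumed objects (an observed time series and fitted parameter values), and h is the forecast horizon.

```
# recursive h-step-ahead forecasts for a fitted shifted AR(p) model (sketch)
ar.forecast <- function(x, muhat, phihat, h) {
  n  <- length(x)
  p  <- length(phihat)
  xx <- c(x, numeric(h))          # observed values followed by forecasts
  for (j in 1:h) {
    t <- n + j
    xx[t] <- muhat + sum(phihat * (xx[(t - 1):(t - p)] - muhat))
  }
  xx[(n + 1):(n + h)]             # the h forecasted values
}
```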
We would like to pair the point estimator [latex]\hat{X}_{n + h}[/latex] with an interval estimator, which is a prediction interval in this setting. The prediction interval gives us an indication of the precision of the forecast. In order to derive an exact two-sided [latex]{100(1 - \alpha)}\%[/latex] prediction interval for [latex]X _ {n + h}[/latex], it is helpful to write the shifted AR(p) model as a shifted MA(∞) model. The coefficients θ1, θ2, … of a stationary shifted AR(p) model written as an MA(∞) model
are given in terms of [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex] as was illustrated for [latex]p = 4[/latex] in Example 9.23. Consider this model at time [latex]t = n + 1[/latex]. Since the error terms [latex]Z_n, \, Z_{n - 1}, \, Z_{n - 2}, \, \ldots[/latex] are unknown but fixed because they are associated with the observed time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex], the conditional population variance of [latex]X_{n + 1}[/latex] is
because the population variance of μ is zero and [latex]Z_{n + 1}[/latex] is the only random term in the model. The error terms at time n and prior are fixed by the observed time series and can therefore be treated as constants. Likewise, considering the MA(∞) model at time [latex]t = n + 2[/latex], the conditional population variance of [latex]X_{n+2}[/latex] is
Similarly, the conditional population variance of [latex]X_{n+3}[/latex] is
Continuing in this fashion, the conditional population variance of [latex]X_{n+h}[/latex] is
If we assume that the white noise terms in the MA(∞) representation of the AR(p) time series model are Gaussian white noise terms, then [latex]X_{n + h}[/latex] is also normally distributed because a linear combination of mutually independent normal random variables is also normally distributed. So an exact two-sided [latex]{100(1 - \alpha)}\%[/latex] prediction interval for [latex]X _ {n + h}[/latex] is
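Using the MA(∞) weights θ1, θ2, [latex]\ldots[/latex] defined above and writing [latex]z_{1 - \alpha / 2}[/latex] for the standard normal quantile, this interval can be sketched as

[latex]\hat{X}_{n + h} \, \pm \, z_{1 - \alpha / 2} \, \sigma_Z \sqrt{1 + \theta_1 ^ 2 + \theta_2 ^ 2 + \cdots + \theta_{h - 1} ^ 2}.[/latex]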
In most practical problems, the parameters in this prediction interval must be estimated from data, which results in the approximate two-sided [latex]{100(1 - \alpha)}\%[/latex] prediction interval given in the result that follows.
This subsection has introduced the AR(p) time series model. The important results for an AR(p) model are listed below.
- The standard AR(p) model can be written algebraically and with the backshift operator B as
where [latex]\phi(B) = 1 - \phi_1 B - \phi_2 B ^ 2 - \cdots - \phi_p B ^ p[/latex] is the characteristic polynomial and [latex]Z_t \sim WN \left( 0, \, \sigma _Z ^ {\, 2} \right)[/latex] (Definition 9.3).
- The shifted AR(p) model can be written algebraically and with the backshift operator B as (Theorem 9.20)
- The AR(p) model is always invertible; the AR(p) model is stationary when the solutions of [latex]\phi(B) = 0[/latex] all lie outside of the unit circle in the complex plane (Theorem 8.3).
- The AR(p) population autocorrelation function is a mixture of damped exponential functions, associated with real roots of [latex]\phi(B) = 0[/latex], and damped sinusoidal functions, associated with complex roots of [latex]\phi(B) = 0[/latex] (Theorem 9.18).
- The AR(p) population partial autocorrelation function cuts off after lag p (Theorem 9.19), which makes its shape easier to recognize than that of the population autocorrelation function when the sample counterparts computed from a realization of a time series are examined.
- The stationary shifted AR(p) model can be written as a shifted MA(∞) model (as illustrated in Example 9.23).
- The [latex]p + 2[/latex] parameters in the shifted AR(p) model, μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex], [latex]\dots[/latex], [latex]\phi_p[/latex], and [latex]\sigma _ Z ^ {\, 2}[/latex], can be estimated from a realization of a time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] by the method of moments (Theorem 9.21), least squares (Theorem 9.22), and maximum likelihood. The point estimators for μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex], [latex]\dots[/latex], [latex]\phi_p[/latex], and [latex]\sigma _ Z ^ {\, 2}[/latex] are denoted by [latex]\hat \mu[/latex], [latex]\hat \phi_1[/latex], [latex]\hat \phi_2[/latex], [latex]\ldots[/latex], [latex]\hat \phi_p[/latex], and [latex]\hat \sigma _ Z ^ {\, 2}[/latex], and are typically paired with asymptotically exact two-sided [latex]{100(1 - \alpha)}\%[/latex] confidence intervals (Theorem 9.23).
- The forecasted value [latex]\hat{X} _ {n + h}[/latex] in a shifted AR(p) model is a function of the last p values in an observed time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] and can be calculated by a recursive formula. The forecast approaches [latex]\hat \mu = \bar x[/latex] as the time horizon [latex]h \rightarrow \infty[/latex]. The associated prediction interval has width that increases as h increases and approaches a limit as the time horizon [latex]h \rightarrow \infty[/latex] (Theorem 9.24).
9.1.4 Computing
The R time series functions used in this section are summarized here. The ARMAacf function computes the population autocorrelation function or the population partial autocorrelation function for an ARMA(p, q) time series model. The generic version of the function is
where ar is a vector containing the autoregressive coefficients [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex], ma is a vector containing the moving average coefficients [latex]\theta_1, \, \theta_2, \, \ldots , \, \theta_q[/latex], lag.max contains the number of lags required, and pacf is a logical object. The function returns [latex]\rho(0), \, \rho(1), \, \ldots , \, \rho( {\tt lag.max} )[/latex] when pacf is FALSE, or [latex]\rho ^ * (1), \, \rho ^ * (2), \, \ldots , \, \rho ^ * ( {\tt lag.max} )[/latex] when pacf is TRUE. The ARMAacf function is illustrated in Example 9.25.
The arima.sim function generates a simulation of a time series. The generic version of the function is
where model is a list with components ar containing the autoregressive coefficients [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex], and ma containing the moving average coefficients [latex]\theta_1, \, \theta_2, \, \ldots , \, \theta_q[/latex], n is the length of the simulated time series to be generated, rand.gen is a function to generate the white noise terms, n.start is the length of the warm-up period, and start.innov is a time series of white noise terms used in the warm-up period. The returned value is a vector containing the n simulated time series values [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex]. The arima.sim function is illustrated in Examples 9.2, 9.13, and 9.18. The warm-up period associated with the arima.sim function can be avoided by generating initial values from the appropriate multivariate distribution. For an AR(1) model with Gaussian white noise error terms, the rnorm function, whose generic syntax is
where n is the number of random variates to generate, mean is the population mean, and sd is the population standard deviation, can be used to seed the simulated time series. The rnorm function is illustrated in Example 9.1. For an AR(p) model, with [latex]p > 1[/latex], with Gaussian white noise error terms, the mvrnorm function from the MASS package, whose generic syntax is
where n is the number of random vectors to generate, mu is the population mean vector, and Sigma is the population variance–covariance matrix, can be used to seed the simulated time series. The mvrnorm function is illustrated in Examples 9.12 and 9.26.
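A brief illustration of arima.sim with arbitrary parameter values (the coefficients below are chosen only to satisfy the stationarity conditions) is

```
# simulate 200 observations from an AR(2) model with Gaussian white noise
set.seed(1)
x <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 200, sd = 2)

# simulate 200 observations from an MA(1) model with theta = 0.6
y <- arima.sim(model = list(ma = 0.6), n = 200)
```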
When determining parameter estimates that cannot be expressed in closed form, the optim function provides general-purpose optimization capability that can be applied to minimizing the sum of squares to find the least squares estimates or maximizing the log likelihood function to find the maximum likelihood estimators. The generic syntax for optim is
where par is a vector containing initial parameter estimates and fn is the function to be minimized (by default). The optim function is illustrated in Examples 9.5, 9.6, and 9.16. A parameter estimation function that is exclusively for autoregressive time series models is ar. The generic format for ar is
where x is a vector containing the observed time series values, aic is a logical variable (TRUE means that the Akaike Information Criterion is used to choose the order of the model and FALSE means that an autoregressive model of order order.max is fitted), order.max is the maximum order of the autoregressive model to fit, method is the estimation method ("yule-walker" or "yw" for Yule–Walker, "burg" for Burg's algorithm, "ols" for least squares, "mle" for maximum likelihood), and na.action indicates how to handle missing values in the time series. The ar function is illustrated in Examples 9.7, 9.8, 9.10, 9.17, 9.19, 9.21, 9.27, 9.28, and 9.31. The arima function also estimates parameters from an observed time series. The generic format for arima is
where x is a vector containing the observed time series values, order is a vector containing the values of p, d, and q, include.mean is a logical variable (TRUE includes estimation of a population mean term μ and FALSE estimates just the parameters in the standard model), and method is CSS (conditional sum of squares) or ML (maximum likelihood). The arima function is illustrated in Examples 9.9, 9.18, 9.20, and 9.29.
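As a sketch of how these two estimation functions might be invoked (x is assumed to hold an observed time series; the settings shown are illustrative, not prescriptive):

```
# fit an autoregressive model, letting AIC choose the order
fit.ar <- ar(x, aic = TRUE, order.max = 10, method = "mle")

# fit a shifted AR(2) model by maximum likelihood
fit.arima <- arima(x, order = c(2, 0, 0), include.mean = TRUE, method = "ML")
fit.arima$coef                   # parameter estimates
sqrt(diag(fit.arima$var.coef))   # estimated standard errors
```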
Three functions were introduced in this section for assessing model adequacy. The Box.test function computes the Box–Pierce or Ljung–Box test statistic and associated p-value. The generic syntax is
where x is a vector containing the observed time series values, lag is the number of sample autocorrelation function values to be used in the test, type is either "Box-Pierce" or "Ljung-Box", and fitdf is the number of degrees of freedom to be subtracted in the case of x being a time series of residuals. The Box.test function is illustrated in Examples 9.8, 9.19, and 9.30, along with the hist and qqnorm functions, which are helpful in visually assessing the normality of the residuals.
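A sketch of how these diagnostics might be applied to the residuals of a fitted AR(2) model (fit.arima is assumed to be an object returned by arima, and the lag value is illustrative):

```
r <- residuals(fit.arima)                               # residuals of the fitted model
Box.test(r, lag = 20, type = "Ljung-Box", fitdf = 2)    # portmanteau test
hist(r)                                                 # histogram of the residuals
qqnorm(r)                                               # normal quantile-quantile plot
```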
Forecasting can be performed automatically using the generic predict function, which calculates predicted values of a time series from a fitted function. The predict function is illustrated in Examples 9.10, 9.21, and 9.31.
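A sketch of forecasting with predict (again assuming fit.arima is an object returned by arima):

```
fc <- predict(fit.arima, n.ahead = 10)   # forecasts for the next 10 time periods
fc$pred                                  # point forecasts
fc$pred - 1.96 * fc$se                   # approximate 95% lower prediction bounds
fc$pred + 1.96 * fc$se                   # approximate 95% upper prediction bounds
```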
More details on the R functions used in this section can be found using the help function. Sample invocations of the functions are displayed using the example function.
This concludes the introduction to the autoregressive time series model, with subsections devoted to the AR(1), AR(2), and AR(p) models. An analogous treatment of moving average models is contained in the next section.
9.2 Moving Average Models
Moving average models for a time series will be introduced in this section. A moving average model of order q is a special case of an ARMA(p, q) model with no autoregressive terms (that is, [latex]p = 0[/latex]) and q moving average terms, specified as
where [latex]\theta_1, \, \theta_2 , \ldots , \, \theta_q[/latex] are real-valued parameters and [latex]\left\{ Z_t \right\}[/latex] is a time series of white noise. Rather than diving right into an MA(q) model, we first have separate subsections for the MA(1) and MA(2) models because the mathematics are somewhat easier than the general case and some important geometry and intuition can be developed with these restricted models. In the subsection on the MA(1) model that follows, we will
- define the time series model for [latex]\left\{ X_t \right\}[/latex],
- determine the values of the parameters associated with an invertible model,
- derive the population autocorrelation and partial autocorrelation functions,
- develop algorithms for simulating observations from the time series,
- inspect simulated realizations to establish patterns, and
- estimate parameters from a time series realization [latex]\left\{ x_t \right\}[/latex].
The important steps of model assessment, model selection, and forecasting future values of the time series are left as exercises because they follow along the same lines as those steps for the autoregressive models covered in the previous section.
The purpose of deriving the population autocorrelation and partial autocorrelation functions is to build an inventory of shapes and patterns for these functions that can be used to identify tentative time series models from their sample counterparts by making a visual comparison between population and sample versions. This inventory of shapes and patterns plays an analogous role to knowing the shapes of various probability density functions (for example, the bell-shaped normal probability density function or the rectangular-shaped uniform distribution) in the analysis of univariate data in which the shape of the histogram is visually compared to the inventory of probability density function shapes.
In the MA(1) subsection that follows, a single example of a time series will be carried through the various statistical procedures given in the list above. Stationarity plays a critical role in time series analysis because we are not able to forecast future values of the time series without knowing that the probability model is stable over time. This is why the visual assessment of a plot of the time series is always a critical first step in the analysis of a time series. Fortunately, all MA(q) time series models are stationary.
9.2.1 The MA(1) Model
The moving average model with one term is the simplest of the ARMA family of time series models in terms of the ability to derive probabilistic properties.
An observed value in the time series, Xt, is given by the current white noise term, plus the parameter θ multiplied by the white noise term from one time period ago. No subscript is necessary on the θ parameter because there is only one θ parameter in the MA(1) model. So there are two parameters that define an MA(1) model: the coefficient θ and the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex].
Some authors prefer to parameterize the MA(1) model as
where θ0 and θ1 are real-valued parameters. We avoid this parameterization because the θ0 parameter is redundant in the sense that the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex] is absorbed into the θ0 parameter. Also, some authors use a - rather than a + between two terms on the right-hand side of the model.
To illustrate the thinking behind the MA(1) model in a specific context, let Xt represent the monthly unemployment, as a percentage, in month t. The MA(1) model indicates that this month's unemployment, denoted by Xt, equals θ multiplied by last month's random white noise term, [latex]\theta Z_{t - 1}[/latex], plus this month's random white noise term Zt.
MA(1) models are used less often than autoregressive models, partly because the population autocorrelation function can assume a more limited range of shapes, as will be seen next.
Stationarity and the Population Autocorrelation Function
One initial important question concerning the MA(1) model is whether or not the model is stationary. Rather than appealing to Theorem 8.4, we show this below using first principles. Recall from Definition 7.6 that a time series model is stationary if (a) the expected value of Xt is constant for all t, and (b) the population covariance between Xs and Xt depends only on the lag [latex]|t - s|[/latex]. The expected value of Xt is
for all values of the parameters θ and [latex]\sigma _ Z ^ {\, 2}[/latex], and all values of t. Using the defining formula for population covariance, the population autocovariance function is
Since [latex]E \left[ X_t \right] = 0[/latex] for all values of t and the population autocovariance function depends only on the lag [latex]|t - s|[/latex], we conclude that the MA(1) time series model is stationary. Furthermore, the population autocovariance function can be expressed in terms of the lag k as
Dividing the population autocovariance function by [latex]\gamma(0) = V \left[ X_t \right] = \left( 1 + \theta ^ 2 \right) \sigma _Z ^ {\, 2}[/latex] gives the population autocorrelation function
This derivation constitutes a proof of the following result.
So the population autocorrelation function consists of a single nonzero value at lag 1 for a nonzero parameter θ and zero values thereafter. Six important observations concerning this population autocorrelation function are given below.
-
- The sign of [latex]\rho(1)[/latex] is the same as the sign of θ.
- The population autocorrelation function cuts off after lag 1 for an MA(1) time series model. The time series model has a “memory” of just one time period. Figure 9.35 illustrates the relationship between the white noise values [latex]\left\{ Z_t \right\}[/latex] and the MA(1) time series observations [latex]\left\{ X_t \right\}[/latex]. Observations of the time series that are two or more time periods apart, such as X2 and X4, have no white noise terms in common, so the lag 2 population autocorrelation, [latex]\rho(2)[/latex], is zero. The third observation in the time series X3, for example, shares the white noise term Z2 with X2 and the white noise term Z3 with X4, but is not affected by any white noise terms before Z2 or after Z3.
- The lag 1 population autocorrelation [latex]\rho(1) = \theta / \left(1 + \theta ^ 2 \right)[/latex] can be written as a quadratic equation in θ as
For nonzero values of θ, the two roots of this quadratic equation are both positive or both negative. Furthermore, a little algebra reveals that the product of the two roots of this quadratic equation equals 1. Figure 9.36 shows the parabolas associated with this quadratic equation for [latex]\rho(1) = 2 / 5[/latex] (with associated roots [latex]\theta = 1 / 2[/latex] and [latex]\theta = 2[/latex]) and [latex]\rho(1) = -2 / 5[/latex] (with associated roots [latex]\theta = -1 / 2[/latex] and [latex]\theta = -2[/latex]).
Long Description for Figure 9.36
The horizontal axis is θ and the vertical axis is [latex]g(\theta)[/latex]. The convex parabola associated with [latex]\rho(1) = 2 / 5[/latex] intersects the horizontal axis at [latex]\theta = 1 / 2[/latex] and [latex]\theta = 2[/latex]; the concave parabola associated with [latex]\rho(1) = -2 / 5[/latex] intersects the horizontal axis at [latex]\theta = -1 / 2[/latex] and [latex]\theta = -2[/latex].
Long Description for Figure 9.37
The curve [latex]\rho(1) = \theta / \left( 1 + \theta ^ 2 \right)[/latex] is plotted against θ for [latex]-5 \le \theta \le 5[/latex]. It attains its minimum value of [latex]-1 / 2[/latex] at [latex]\theta = -1[/latex] and its maximum value of [latex]1 / 2[/latex] at [latex]\theta = 1[/latex]. A dashed horizontal line at [latex]\rho(1) = 2 / 5[/latex] intersects the curve at [latex]\theta = 1 / 2[/latex] and [latex]\theta = 2[/latex].
- The value [latex]\rho(1)[/latex] must lie in the interval [latex]-1 / 2 \le \rho(1) \le 1 / 2[/latex]. This can be seen in the plot of [latex]\rho(1) = \theta / \left( 1 + \theta ^ 2 \right)[/latex] versus θ given by the solid curve in Figure 9.37, which indicates that [latex]\rho(1)[/latex] is minimized at [latex]\rho(1) = - 1 / 2[/latex] when [latex]\theta = -1[/latex] and maximized at [latex]\rho(1) = 1 / 2[/latex] when [latex]\theta = 1[/latex]. This constraint means that the MA(1) model is more limited in application than the autoregressive models from the previous section. In order to fit an MA(1) model to observed time series values [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex], it must be the case that (a) the length of the time series n is large enough (about [latex]n = 50[/latex] or [latex]n = 60[/latex]) to use an ARMA model, (b) the sample autocorrelation function has a single statistically significant spike at lag 1, and (c) the statistically significant spike at lag 1 satisfies [latex]-1 / 2 \le r_1 \le 1 / 2[/latex] to be compatible with the constraint [latex]-1 / 2 \le \rho(1) \le 1 / 2[/latex].
- Figure 9.37 also reveals a more subtle aspect of the population lag 1 autocorrelation. Notice that for [latex]\theta = 1 / 2[/latex], the population lag 1 autocorrelation is [latex]\rho(1) = 2 / 5[/latex]. But for [latex]\theta = 2[/latex], the population lag 1 autocorrelation is also [latex]\rho(1) = 2 / 5[/latex]. The geometry associated with these two values of θ resulting in the same value for [latex]\rho(1)[/latex] is indicated by the dashed lines in Figure 9.37. This problem is not limited to [latex]\theta = 1 / 2[/latex] and [latex]\theta = 2[/latex]; there are infinitely many pairs of θ values that result in the same lag 1 population autocorrelation. More generally, the MA(1) model with parameter θ and the MA(1) model with parameter [latex]1 / \theta[/latex] have identical population autocorrelation functions. This means that there is not a one-to-one correspondence between a particular value of θ and the associated value of [latex]\rho(1)[/latex]. This brings up the notion of invertibility, which was defined in Definition 8.3. An invertible time series model has a unique value of θ in the MA(1) model corresponding to a particular population autocorrelation function.
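A one-line check of this reciprocal relationship:

[latex]\frac{1 / \theta}{1 + \left( 1 / \theta \right) ^ 2} = \frac{1 / \theta}{\left( \theta ^ 2 + 1 \right) / \theta ^ 2} = \frac{\theta}{1 + \theta ^ 2} = \rho(1).[/latex]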
Invertibility
All MA(1) models are stationary per Theorem 8.3 because there are a finite number of moving average terms in Definition 9.4. Recall from Definition 8.3 that an ARMA(p, q) time series model for [latex]\left\{ X_t \right\}[/latex] is invertible if the white noise term at time t can be expressed as
where the coefficients πj satisfy
There are no restrictions on θ necessary to ensure stationarity for an MA(1) model. However, it can be advantageous to restrict the values of θ in order to achieve invertibility. Returning to Figure 9.37, we can use the definition of invertibility to determine whether we use [latex]|\theta| < 1[/latex] or [latex]| \theta | > 1[/latex] for the invertibility region for an MA(1) model.
Just as we were able to write an AR(1) time series model as an MA(∞) time series model in Section 9.1.1, we now perform the algebraic steps necessary to write an MA(1) time series model as an AR(∞) time series model. We want to write Zt in terms of current and previous values of Xt as shown in Definition 8.3. To begin, recall that the MA(1) model given by
can be shifted in time and is equally valid for other t values, for example,
These formulas can be solved for [latex]Z_{t - 1}[/latex], [latex]Z_{t - 2}[/latex], [latex]\ldots[/latex] as
Making successive substitutions into the MA(1) model results in
This can be recognized as an AR(∞) time series model.
Representing an MA(1) model as an AR(∞) model is known as duality. Solving this equation for Zt gives
which is the form required for Definition 8.3. So the coefficients [latex]\pi_0, \,\pi_1, \, \pi_2, \, \ldots[/latex] for the MA(1) model from Definition 8.3 are
or in general, [latex]\pi_j = \left( - \theta \right) ^ j[/latex], for [latex]j = 0, \, 1, \, 2, \, \ldots[/latex]. Definition 8.3 requires that
to achieve invertibility. This summation is a geometric series that converges when [latex]|\theta| < 1[/latex], so this is the invertibility region for an MA(1) model.
The invertibility criterion [latex]-1 < \theta < 1[/latex] ensures that each value of θ in the interval corresponds to a unique MA(1) time series model. Stated in another fashion, invertibility implies that there is a one-to-one correspondence between the value of the θ parameter and the population autocorrelation function.
The MA(1) time series model can be written in terms of the backshift operator B as
Doubling up the use of θ as a function name, the expression
is the characteristic polynomial for the MA(1) model. Notice that the MA(1) model is invertible when [latex]| \theta | < 1[/latex], which corresponds to the root of [latex]\theta(B) = 0[/latex] falling outside of the interval [latex][-1, \, 1][/latex]. Solving [latex]\theta(B) = 1 + \theta B = 0[/latex] results in [latex]B = - 1 / \theta[/latex]. This notion will be generalized in the next two subsections for higher-order MA models as the roots of [latex]\theta(B) = 0[/latex] falling outside of the unit circle in the complex plane to establish invertibility.
Now that stationarity for all MA(1) time series models has been established, the condition for invertibility has been established, and the population autocorrelation function has been derived, we turn to determining the partial autocorrelation function.
Population Partial Autocorrelation Function
The population partial autocorrelation function can be determined by using the defining formula in Definition 7.4. The lag zero population partial autocorrelation is [latex]\rho ^ * (0) = 1[/latex]. The lag one population partial autocorrelation is [latex]\rho ^ * (1) = \rho(1) = \theta / \left( 1 + \theta ^ 2 \right)[/latex]. After a little algebra, the lag two population partial autocorrelation is
The lag three population partial autocorrelation is
This pattern generalizes to the lag k population partial autocorrelation
for [latex]k = 1, \, 2, \, \ldots[/latex], which can also be written as
for [latex]k = 1, \, 2, \, \ldots[/latex]. This constitutes a proof of the following result.
When [latex]\theta = 0[/latex], both the population autocorrelation function and the partial autocorrelation function have just a single spike at [latex]\rho(0) = \rho ^ * (0) = 1[/latex]; the MA(1) model reduces to just white noise in this case. When [latex]0 < \theta < 1[/latex], [latex]\rho(1) > 0[/latex] and [latex]\rho^*(k)[/latex] tails out and alternates in sign. When [latex]-1 < \theta < 0[/latex], [latex]\rho(1) < 0[/latex] and [latex]\rho^*(k)[/latex] tails out and is negative for [latex]k = 1, \, 2, \, \ldots[/latex].
Long Description for Figure 9.38
The first correlogram shows the population autocorrelation function of an MA(1) model: [latex]\rho(0) = 1[/latex], [latex]\rho(1) = 0.5[/latex], and [latex]\rho(k) = 0[/latex] for [latex]k = 2, \, 3, \, \ldots, \, 8[/latex]. The second correlogram shows the population partial autocorrelation function, which tails out with alternating signs and decreasing magnitude.
The Shifted MA(1) Model
The population mean function for the MA(1) model is [latex]E \left[ X_t \right] = 0[/latex], which is not of much use in practice because most real-world time series are not centered around zero. Adding a third parameter μ to overcome this shortcoming results in the enhanced MA(1) model
which has population mean function [latex]E \left[ X_t \right] = \mu[/latex] and population autocorrelation function and population partial autocorrelation function given in Theorems 9.25 and 9.28 because population variance and covariance are unaffected by a shift in the time series model. There are now three parameters for the time series model: μ, θ, and [latex]\sigma _ Z ^ {\, 2}[/latex].
Simulation
An MA(1) time series can be simulated by appealing to the defining formula for the MA(1) model from Definition 9.4:
The algorithm given below generates an initial white noise value Z0, and then uses an additional n white noise terms [latex]Z_1, \, Z_2, \, \ldots, \, Z_n[/latex] to generate the time series values [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex] using the MA(1) defining formula. Indentation denotes nesting in the algorithm.
The three-parameter shifted MA(1) time series model that includes a population mean parameter μ can be simulated by simply adding μ to each time series observation generated by this algorithm. So to generate a realization of an MA(1) time series model in R, we must define (a) the value of θ, (b) the distribution of the white noise, (c) the value of [latex]\sigma _Z ^ {\, 2}[/latex], and, if this is a shifted MA(1) model, (d) the value of the shift parameter μ.
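A minimal R sketch of this simulation algorithm, with illustrative parameter values and Gaussian white noise, is

```
# simulate n observations from a shifted MA(1) model using the defining formula
n      <- 100
mu     <- 50        # illustrative shift parameter
theta  <- 0.6       # illustrative MA(1) coefficient
sigmaz <- 2         # illustrative white noise standard deviation
z <- rnorm(n + 1, mean = 0, sd = sigmaz)    # Z_0, Z_1, ..., Z_n
x <- mu + z[2:(n + 1)] + theta * z[1:n]     # X_t = mu + Z_t + theta * Z_{t-1}
```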
Having established the probabilistic properties of the MA(1) model, we now turn to statistical topics, beginning with the estimation of the model parameters.
Parameter Estimation
There are several techniques for estimating the parameters in an MA(1) model; as was the case for the autoregressive models, we look at the method of moments, least squares, and maximum likelihood estimation techniques separately. Parameter estimation is more difficult for moving average models, as numerical methods are typically required to calculate the parameter estimates.
Approach 1: Method of moments. We begin with the shifted MA(1) model from Theorem 9.29:
We want to estimate the three unknown parameters μ, θ, and [latex]\sigma _ Z ^ {\, 2}[/latex] from an observed time series [latex]\left\{ x_t \right\}[/latex]. In the case of the shifted MA(1) model, we match the population and sample (a) first-order moments, (b) second-order moments, and (c) lag 1 autocorrelation. These will be written with the uppercase values Xt, although they will be replaced with numeric values xt for a particular observed time series. Placing the population moments on the left-hand side of the equation and the associated sample moments on the right-hand side of the equation results in three equations in three unknowns:
or
The third equation is a quadratic equation in θ:
which corresponds to the parabolas in Figure 9.36, except that [latex]\rho(1)[/latex] is replaced by r1. Using the quadratic formula, the product of the two roots
equals 1, so the root that falls within the invertibility region [latex]-1 < \hat \theta < 1[/latex] should be chosen. Some algebra shows that this can be done by always selecting the minus in the ± portion of the formula. Once the point estimator [latex]\hat \theta[/latex] has been chosen, the first two equations can be solved as
It appears that we have closed-form solutions to the method of moments estimators, but there is a subtle wrinkle in this derivation. Because of random sampling variability there is a chance that the lag 1 sample autocorrelation r1 might be greater than [latex]1 / 2[/latex] or less than [latex]-1 / 2[/latex], even if the population time series model truly is a shifted MA(1) model satisfying the invertibility criterion [latex]-1 < \theta < 1[/latex]. In this case the quadratic formula yields complex roots. So the method of moments parameter estimation approach is recommended for initial parameter estimates only if the constraint [latex]|r_1| < 1 / 2[/latex] stated in the result that follows is met.
Thus, the method of moments point estimators in Theorem 9.30 should only be used for determining initial estimators of μ, θ, and [latex]\sigma _ Z ^ {\, 2}[/latex] from [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] when the following criteria are met:
- the number of observations in the time series is greater than about [latex]n = 60[/latex] or [latex]n = 70[/latex],
- the time series appears to be stationary,
- the sample autocorrelation function has a single statistically significant spike at lag 1,
- the sample partial autocorrelation function tails out, and
- [latex]-1 / 2 < r_1 < 1 / 2[/latex].
The method of moments estimators are generally used to find initial point estimates for the parameters in an MA(1) model, which are subsequently used in an iterative scheme to find the least squares or maximum likelihood estimators. This will be illustrated next on a time series consisting of chemical yields.
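A sketch of the method of moments calculations in R is given below; x is assumed to hold an observed time series with [latex]|r_1| < 1 / 2[/latex], and the formulas shown are one common way to write the estimators (the precise statement is in Theorem 9.30).

```
muhat     <- mean(x)
r1        <- acf(x, plot = FALSE)$acf[2]               # lag 1 sample autocorrelation
thetahat  <- (1 - sqrt(1 - 4 * r1 ^ 2)) / (2 * r1)     # root inside (-1, 1)
c0        <- sum((x - muhat) ^ 2) / length(x)          # lag 0 sample autocovariance
sigmahat2 <- c0 / (1 + thetahat ^ 2)                   # white noise variance estimate
```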
Approach 2: Least squares. Consider the shifted stationary MA(1) model
from Theorem 9.29. For least squares estimation, we first establish the sum of squares S as a function of the parameters μ and θ. Numerical methods are required to determine the least squares estimators of μ and θ. Once these least squares estimators have been determined, the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex] will be estimated.
Solving the shifted MA(1) model defining formula for Zt results in
Seeding this recursive formula with [latex]Z_0 = 0[/latex] gives the residuals
Thus, the sum of squared errors is
Numerical methods are required to find the parameter estimates. This will be illustrated next for the time series of chemical yields.
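A sketch of this least squares calculation using the optim function (x is assumed to hold an observed time series; the initial estimates are illustrative):

```
# sum of squared residuals for a shifted MA(1) model, seeded with Z_0 = 0
ss <- function(par, x) {
  mu    <- par[1]
  theta <- par[2]
  n     <- length(x)
  z     <- numeric(n + 1)                    # z[1] plays the role of Z_0 = 0
  for (t in 1:n) z[t + 1] <- x[t] - mu - theta * z[t]
  sum(z[2:(n + 1)] ^ 2)
}
fit <- optim(par = c(mean(x), 0), fn = ss, x = x)
fit$par                                      # least squares estimates of mu and theta
```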
Approach 3: Maximum likelihood estimation. We use the arima function to do the heavy lifting with respect to the estimation of the parameters in the MA(1) time series model via maximum likelihood. In addition to the point estimates, confidence intervals are based on the asymptotic distribution of the maximum likelihood estimator [latex]\hat \theta[/latex], which for large values of n is
So when the parameter θ is estimated by maximum likelihood from a time series, an asymptotically exact [latex]{100(1 - \alpha)}\%[/latex] confidence interval for θ is given in the result below. It is based on the consistency and the asymptotic normality of maximum likelihood estimators, which in this case implies that
Replacing θ by its maximum likelihood estimator in the variance yields the following result.
The formula for the confidence interval from Theorem 9.31 will be illustrated for the chemical yield data from the previous two examples.
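For reference, a sketch of the standard asymptotic form of such an interval (the precise statement is in Theorem 9.31; [latex]z_{1 - \alpha / 2}[/latex] denotes the standard normal quantile) is

[latex]\hat \theta - z_{1 - \alpha / 2} \sqrt{\frac{1 - \hat \theta ^ {\, 2}}{n}} < \theta < \hat \theta + z_{1 - \alpha / 2} \sqrt{\frac{1 - \hat \theta ^ {\, 2}}{n}}.[/latex]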
The parameter estimates using the method of moments, least squares, and maximum likelihood estimation from the previous three examples are summarized in Table 9.16. Notice that the least squares and maximum likelihood estimates of θ differ substantially from the associated method of moments estimate of θ.
| Method | [latex]\hat \mu[/latex] | [latex]\hat \theta[/latex] | [latex]\hat \sigma _ Z ^ {\, 2}[/latex] |
|---|---|---|---|
| Method of moments | 84.1 | −0.318 | 7.50 |
| Ordinary least squares | 84.1 | −0.483 | 5.61 |
| Maximum likelihood estimation | 84.1 | −0.480 | 7.07 |
In the interest of brevity, we leave the model assessment, model selection, and forecasting steps of the process for the chemical yields time series as an exercise. The derivations for these procedures follow along the same lines as those for the autoregressive models from the previous section.
This subsection has introduced the MA(1) time series model. The key results for an MA(1) model are listed below.
- The standard MA(1) model can be written algebraically and with the backshift operator B as
where [latex]Z_t \sim WN \left( 0, \, \sigma _Z ^ {\, 2} \right)[/latex], [latex]\sigma _ Z ^ {\, 2} > 0[/latex], and [latex]\theta(B) = 1 + \theta B[/latex] (Definition 9.4).
- The shifted MA(1) model can be written algebraically and with the backshift operator B as (Theorem 9.29)
- The MA(1) model is stationary for all finite real-valued parameters θ and [latex]\sigma _ Z ^ {\, 2}[/latex] (Theorem 9.25).
- The MA(1) model is invertible when [latex]-1 < \theta < 1[/latex] (Theorem 9.27).
- The MA(1) model can be written as an AR(∞) model when [latex]-1 < \theta < 1[/latex] as (Theorem 9.26)
- The MA(1) model lag 1 population autocorrelation is [latex]\rho(1) = \theta / \left( 1 + \theta ^ 2 \right)[/latex], and [latex]\rho(k) = 0[/latex] for [latex]k = 2, \, 3, \, \ldots[/latex] (Theorem 9.25). The lag 1 population autocorrelation satisfies the inequality [latex]-1 / 2 \le \rho(1) \le 1 / 2[/latex] (Figure 9.37).
- The MA(1) lag k population partial autocorrelation for [latex]-1 < \theta < 1[/latex] is
for [latex]k = 1, \, 2, \, \ldots[/latex] (Theorem 9.28).
- A time series of [latex]n + 1[/latex] white noise values [latex]Z_0, \, Z_1, \, Z_2 , \, \ldots, \, Z_n[/latex] can be converted to n simulated observations [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex] by using the MA(1) defining formula [latex]X_t = Z_t + \theta Z_{t - 1}[/latex].
- The parameters in the MA(1) model can be estimated via the method of moments, least squares estimation, and maximum likelihood estimation.
9.2.2 The MA(2) Model
The additional term in the MA(2) model gives it increased flexibility over the associated MA(1) model.
An observed value in the time series, Xt, is given by the current white noise term, plus the parameter θ1 multiplied by the white noise term from one time period ago, plus the parameter θ2 multiplied by the white noise term from two time periods ago. So there are three parameters that define an MA(2) model: the coefficients θ1 and θ2, and the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex]. As was the case for the MA(1) model, some authors use a - rather than a + between the terms on the right-hand side of the model.
The probabilistic properties and statistical methods associated with an MA(2) model are straightforward generalizations of those properties and methods for the MA(1) model. Rather than deriving these results from first principles, we simply state several of these results without proof and then conduct a Monte Carlo simulation experiment which highlights issues that arise in model selection.
- The standard MA(2) model can be written algebraically and with the backshift operator B as
where [latex]Z_t \sim WN \left( 0, \, \sigma _Z ^ {\, 2} \right)[/latex], [latex]\sigma _ Z ^ {\, 2} > 0[/latex], and [latex]\theta(B) = 1 + \theta_1 B + \theta_2 B ^ 2[/latex].
- The shifted MA(2) model can be written algebraically and with the backshift operator B as
- MA(2) models are stationary for all finite, real-valued parameters μ, θ1, θ2, and [latex]\sigma _ Z ^ {\, 2}[/latex].
- Just as the stationarity region for the AR(2) model has a triangular shape, the invertibility region for the MA(2) model also has a triangular shape defined by the three constraints
This region is an upside-down version of the stationarity region for an AR(2) time series model depicted in Figure 9.12; in other words, the MA(2) invertibility triangle is the AR(2) stationarity triangle reflected about the horizontal axis. One standard way of writing the three constraints is displayed after this list.
- The population autocovariance function is
- The population autocorrelation function is
- The population partial autocorrelation function of an MA(2) model can be determined by using the defining formula from Definition 7.8.
- A simulated realization [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex] of a time series from an MA(2) model is generated by the following algorithm.
- The parameters of an MA(2) time series model can be estimated by the method of moments, least squares, and maximum likelihood estimation. As shown in the next example, the arima function can be used in R to calculate these parameter estimates.
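For reference, one standard way to write the three invertibility constraints mentioned above (equivalent to the roots of [latex]\theta(B) = 0[/latex] lying outside the unit circle) is

[latex]\theta_2 + \theta_1 > -1, \qquad \theta_2 - \theta_1 > -1, \qquad -1 < \theta_2 < 1.[/latex]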
The previous subsections have analyzed n observed values of a time series in order to determine an AR(p) or MA(q) model which adequately describes the probabilistic mechanism governing the observed time series. Instead of following this same pattern, we instead conduct a Monte Carlo simulation experiment that highlights weaknesses in the model selection process.
To summarize the models considered so far in this chapter, the AR(1), AR(2), MA(1), and MA(2) models are parsimonious in the sense that they have significant explanatory power with few parameters. By deriving the population autocorrelation function and partial autocorrelation function for these models, we now possess an inventory of possible shapes that guide us toward one particular time series model or another. Figure 9.43 gives examples of these shapes for various values of the parameters.
Long Description for Figure 9.43
The figure shows example correlograms of the population autocorrelation function [latex]\rho(k)[/latex] and the population partial autocorrelation function [latex]\rho^*(k)[/latex] for the AR(1), AR(2), MA(1), and MA(2) models under the various sign combinations of their parameters. For the autoregressive models, [latex]\rho(k)[/latex] tails out as a damped exponential or damped sinusoidal function while [latex]\rho^*(k)[/latex] cuts off after lag p; for the moving average models, [latex]\rho(k)[/latex] cuts off after lag q while [latex]\rho^*(k)[/latex] tails out.
9.2.3 The MA(q) Model
The MA(1) and MA(2) models introduced in the previous two subsections generalize to the MA(q) model defined in this section.
An observed value in the time series, Xt, is given by the current white noise term plus a linear combination of the q previous white noise terms. So there are [latex]q + 1[/latex] parameters that define an MA(q) model: the coefficients [latex]\theta_1, \, \theta_2, \, \ldots , \, \theta_q[/latex], and the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex]. As was the case of the MA(1) and MA(2) models, some authors use a - rather than a + between terms on the right-hand side of the model.
The probabilistic properties and statistical methods associated with an MA(q) model are determined in the usual fashion. Here are several of these results stated without proof.
- The population mean and variance of Xt are easily calculated by taking the expected value and the population variance of both sides of the equation given in Definition 9.6:
and
- The standard MA(q) model can be written algebraically and with the backshift operator B as
where [latex]Z_t \sim WN \left( 0, \, \sigma _Z ^ {\, 2} \right)[/latex], [latex]\sigma _ Z ^ {\, 2} > 0[/latex], and [latex]\theta(B) = 1 + \theta_1 B + \theta_2 B ^ 2 + \cdots + \theta_q B ^ q[/latex].
- The shifted MA(q) model can be written algebraically and with the backshift operator B as
- MA(q) models are stationary for all finite, real-valued parameters μ, [latex]\theta_1, \, \theta_2, \, \ldots , \, \theta_q[/latex], and [latex]\sigma _ Z ^ {\, 2}[/latex].
- MA(q) models are invertible when the q roots of the characteristic equation
all lie outside of the unit circle in the complex plane.
- The population autocovariance function is
This can be written more compactly as
where [latex]\theta_0 = 1[/latex].
- The population autocorrelation function is
As expected, the population autocorrelation function cuts off after lag q.
- The population partial autocorrelation function of an MA(q) model can be determined by using the defining formula from Definition 7.8.
- A simulated realization [latex]X_1, \, X_2, \, \ldots, \, X_n[/latex] of a time series from an MA(q) model is generated by the following algorithm.
- The parameters of an MA(q) time series model can be estimated by the method of moments, least squares, and maximum likelihood estimation. The arima function can be used in R to calculate these parameter estimates for particular values of a time series [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex].
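As a brief sketch of the last point (the parameter values below are illustrative and chosen to satisfy the invertibility condition):

```
# simulate an MA(2) time series and estimate its parameters by maximum likelihood
set.seed(2)
x <- arima.sim(model = list(ma = c(0.6, 0.2)), n = 200)
arima(x, order = c(0, 0, 2), include.mean = TRUE, method = "ML")
```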
Table 9.18 shows some of the symmetry between autoregressive and moving average models. When one aspect of the time series model is easier to derive for one of the models, it is often more difficult to derive for the analogous time series model. The population autocorrelation function for an MA(q) model is available in closed form, for example, but the population autocorrelation function for an AR(p) model requires solving the Yule–Walker equations. As a second example, on the statistical side, the least squares estimators for the AR(1) model are available in closed form, but the least squares estimators for the MA(1) model require numerical methods.
| | Autoregressive: AR(p) | Moving Average: MA(q) |
|---|---|---|
| Model definition | [latex]\phi(B) X_t = Z_t[/latex] | [latex]X_t = \theta(B) Z_t[/latex] |
| Characteristic polynomial | [latex]\phi(B) = 1 - \phi_1 B - \phi_2 B ^ 2 - \cdots - \phi_p B ^ p[/latex] | [latex]\theta(B) = 1 + \theta_1 B + \theta_2 B ^ 2 + \cdots + \theta_q B ^ q[/latex] |
| Stationarity condition | [latex]\phi(B) = 0[/latex] roots outside of unit circle | always stationary |
| Invertibility condition | always invertible | [latex]\theta(B) = 0[/latex] roots outside of unit circle |
| Equivalent model | MA(∞) when stationary | AR(∞) when invertible |
| General linear model π weights | finite series | infinite series |
| General linear model [latex]\psi[/latex] weights | infinite series | finite series |
| Shape of [latex]\rho(k)[/latex] | tails out | cuts off after lag q |
| Shape of [latex]\rho^*(k)[/latex] | cuts off after lag p | tails out |
| Simulating a realization | warm-up period needed | no warm-up period needed |
9.3 ARMA([latex]p, \, q[/latex]) Models
The autoregressive and moving average models outlined in the previous two sections often prove to be inadequate time series models in a particular application. Occasions arise in which the best model for a time series involves both autoregressive and moving average terms. Recall from Definition 8.4 that an ARMA(p, q) time series model with p autoregressive terms and q moving average terms is
$$
X_t = \overbrace{\phi_1 X_{t - 1} + \phi_2 X_{t - 2} + \cdots + \phi_p X_{t - p}}^{\hbox{autoregressive portion}} + \underbrace{Z_t + \theta_1 Z_{t - 1} + \theta_2 Z_{t - 2} + \cdots + \theta_q Z_{t - q}}_{\hbox{moving average portion}},
$$
where [latex]\left\{ X_t \right\}[/latex] is the time series of interest, [latex]\left\{ Z_t \right\}[/latex] is a time series of white noise, [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex] are real-valued parameters associated with the AR portion of the model, and [latex]\theta_1, \, \theta_2, \, \ldots , \, \theta_q[/latex] are real-valued parameters associated with the MA portion of the model. The ARMA(p, q) model can be written more compactly as
where [latex]\phi(B)[/latex] and [latex]\theta(B)[/latex] are the characteristic polynomials defined by
and
This model on its own is of little practical use because most real-world time series are not centered around [latex]E[X_t] = 0[/latex]. Using the compact notation for the ARMA(p, q) time series model, a shift parameter μ is easily added:
So there are [latex]p + q + 2[/latex] parameters that define a shifted ARMA(p, q) time series model: the p autoregressive coefficients [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex], the q moving average coefficients [latex]\theta_1, \, \theta_2, \, \ldots , \, \theta_q[/latex], the shift parameter μ, and the population variance of the white noise [latex]\sigma _ Z ^ {\, 2}[/latex].
Recall from Table 9.7 in Example 9.20 that the ARMA(1, 1) model fitted by maximum likelihood estimation gave a slightly lower AIC than the associated AR(2) model when applied to the Lake Huron level time series. This section consists of one long example concerned with fitting and assessing this ARMA(1, 1) model to determine whether it is an adequate model for the Lake Huron levels. Rather than deriving all of the probabilistic properties and statistical methods for the ARMA(1, 1) model, the arima function in R will be used to perform the fitting, leaving the details to the reader. By default, the arima function (a) ignores external regressor variables, (b) ignores seasonal variation, (c) includes a shift parameter μ, (d) uses the same parameterization for the ARMA(p, q) process as that used in this text, (e) transforms the AR parameters [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex] if necessary so that they stay in the stationarity region, and (f) uses a conditional sum of squares method to obtain initial parameter estimates and then returns the maximum likelihood estimates.
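A sketch of such a fit, assuming the built-in LakeHuron data set in R holds the Lake Huron level time series analyzed here, is

```
# fit a shifted ARMA(1, 1) model to the Lake Huron levels by maximum likelihood
fit <- arima(LakeHuron, order = c(1, 0, 1), include.mean = TRUE, method = "ML")
fit$coef    # estimates of phi_1, theta_1, and the shift parameter mu
fit$aic     # Akaike Information Criterion for comparison with other models
```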
ARMA modeling can achieve population autocorrelation function and population partial autocorrelation function shapes that are not possible with just AR(p) and MA(q) models alone. For an ARMA(p, q) model with [latex]p > 0[/latex] and [latex]q > 0[/latex], both the population autocorrelation function and the population partial autocorrelation function tail off; neither cuts off after a finite lag.
An inherent weakness of ARMA modeling is that it requires stationarity. Many time series which occur in practice are not stationary, and the next section gives techniques that can be used to overcome this weakness.
9.4 Nonstationary Models
There are two commonly-used strategies for converting a nonstationary time series to a stationary time series in order to use ARMA modeling (or some other model which requires stationarity) on the resultant stationary time series. The first strategy is known as detrending. In this case, the modeler estimates the trend, and then fits a stationary time series model to the difference between the raw time series data and the estimated trend. The second strategy is known as differencing. In this case the modeler differences the time series one or more times, resulting in a stationary time series. Differencing carries the added benefit that no parameters are required other than the number of differences to take. The following two subsections consider these two strategies.
9.4.1 Removing Trends Via Regression
Although regression is not the only way to detrend a time series, it provides a roadmap for the detrending process that generalizes to other mechanisms. This subsection illustrates detrending with a single example. We return for a third time to the Lake Huron levels, which were fitted to an AR(2) model in Section 9.1.2 and to an ARMA(1, 1) model in Section 9.3.
So far, there have been three different approaches to constructing a time series model for the Lake Huron levels:
- the shifted AR(2) model from Examples 9.14, 9.15, 9.16, 9.17, 9.18, 9.19, 9.20, and 9.21,
- the shifted ARMA(1, 1) model from Example 9.39, and
- the AR(2) model applied to the residuals from a simple linear regression from Example 9.40.
Which approach is preferred? Although the shifted AR(2) and shifted ARMA(1, 1) models fitted to the raw time series are roughly comparable and give nearly identical forecasts, the shifted ARMA(1, 1) model has a slight edge for the following two reasons. First, from Table 9.7, the AIC value is 215 for the shifted AR(2) model and 214 for the shifted ARMA(1, 1) model; a smaller value implies a better fit. Second, the sum of squared residuals is 46.9 for the shifted AR(2) model and 46.5 for the shifted ARMA(1, 1) model; a smaller sum of squared residuals is preferred for two models with an equal number of parameters, and both models have four parameters. These two sums of squared residuals are computed with R statements of the following form.
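A minimal sketch, assuming the built-in LakeHuron data set and the default arima settings (the author's exact statements may differ slightly):

```
sum(residuals(arima(LakeHuron, order = c(2, 0, 0)))^2)   # shifted AR(2)
sum(residuals(arima(LakeHuron, order = c(1, 0, 1)))^2)   # shifted ARMA(1, 1)
```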
Although the differences between the AIC values and the sums of squared residuals are small, the shifted ARMA(1, 1) model holds a slight edge.
The detrended model from Example 9.40, on the other hand, is preferred over the two stationary models because it explicitly models the decreasing lake levels over time. However, the fact that all of the forecasted values in the detrended model are low relative to the actual values in the years 1973 to 1982 is troubling. Could it be the case that there was no downward trend after all? At this point, some serious detective work is in order to see if the early values in the raw time series were elevated by some external influence and should not be included as a part of the time series. A rigorous search should be conducted for any external cause which might elevate the early values in the time series: excess rainfall, elevated temperatures, dredging, bridge projects, flow control projects, etc. As a particular instance, if the first 20 values of the time series can be eliminated due to the identification of an assignable cause for the years 1875–1894, for example, the p-value from simple linear regression testing for the statistical significance of the slope increases from a highly significant [latex]p = 4 \cdot 10 ^ {-8}[/latex] to a nonsignificant [latex]p = 0.11[/latex]. The downward trend would now be slight and a stationary model could be fitted to the remaining values in the time series.
Detrending has proved to be an effective method for transforming a nonstationary time series to a stationary time series. The second technique involves differencing.
9.4.2 ARIMA(p, d, q) Models
George Box and Gwilym Jenkins devised a time series modeling methodology known as ARIMA modeling. The I between AR and MA stands for integrated. These models are sometimes referred to as Box–Jenkins models. An ARIMA([latex]p, \, d, \, q[/latex]) time series model is one in which the dth-differenced times series, [latex]\nabla ^ d X_t[/latex], is an ARMA([latex]p, \, q[/latex]) time series. So ARIMA time series modeling uses repeated differencing of the raw time series in order to achieve a time series which appears to be stationary. ARMA modeling can then be applied to the resulting stationary time series.
Three key parameters in an ARIMA model are p, d, and q, which are all nonnegative integers. The parameter p is the number of coefficient parameters in the autoregressive portion of the model. The parameter d is the number of differences that are applied to the original time series in order to achieve stationarity. The parameter q is the number of coefficient parameters in the moving average portion of the model. So the general format for specifying an ARIMA model is ARIMA([latex]p, \, d, \, q)[/latex]. In addition to the parameters p, d, and q, there are [latex]p + q + 1[/latex] parameters that define an ARIMA([latex]p, \, d, \, q)[/latex] model: the p autoregressive parameters [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex], the q moving average parameters [latex]\theta_1, \, \theta_2, \, \ldots , \, \theta_q[/latex], and the variance of the white noise [latex]\sigma _Z ^ {\, 2}[/latex]. As in the case of ARMA models, a shift parameter μ can be included in the model. If p, d, or q is zero, the corresponding portion of the model is omitted from its name. An IMA([latex]2, \, 1[/latex]) model, for example, has [latex]p = 0[/latex] autoregressive terms, [latex]d = 2[/latex] differences, and [latex]q = 1[/latex] moving average term. If a model only involves, for example, the autoregressive portion of the model with two terms (that is, no differencing and no moving average terms), then this model is specified as an AR(2) model. An ARMA([latex]p, \, q[/latex]) model is a special case of an ARIMA([latex]p, \, d, \, q[/latex]) model when [latex]d = 0[/latex].
ARIMA modeling will be illustrated by a simulation example that will reveal what a realization of an ARIMA process looks like, along with the R code required to fit these simulated values to an ARIMA model.
The ARIMA modeling process is adequate for many nonstationary time series but is not well suited to handling cyclic variation. The SARIMA (seasonal autoregressive integrated moving average) model has been formulated to overcome this weakness.
An ARIMA model is a special case of a SARIMA model when [latex]P = D = Q = 0[/latex]. The ∇d term in the SARIMA model is associated with an ordinary difference; the [latex]\nabla _s ^ D[/latex] term is associated with a seasonal difference. Consider the inside portion of the SARIMA defining formula, [latex]\nabla ^ d \nabla ^ D _ s X_t[/latex], in a modeling setting in which monthly data is being collected and the modeler believes that there is cyclic annual variation, so [latex]s = 12[/latex]. In the case of [latex]d = 1[/latex] ordinary difference and [latex]D = 1[/latex] seasonal difference, this portion of the SARIMA defining formula becomes
[latex]\nabla \nabla_{12} X_t = (1 - B)\left(1 - B^{12}\right) X_t = X_t - X_{t-1} - X_{t-12} + X_{t-13}.[/latex]
The ∇ operator is being used to eliminate a linear trend and the ∇12 operator is being used to eliminate seasonality. The seasonal AR term [latex]\Phi \left( B ^ {\kern 0.04em s} \right)[/latex] and the seasonal MA term [latex]\Theta \left( B ^ {\kern 0.04em s} \right)[/latex] in Definition 9.8 provide autoregressive and moving average terms for observations that are s units distant in time.
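As a small check of this algebra, the following R sketch applies one ordinary and one seasonal difference with diff and verifies that the result equals [latex]x_t - x_{t-1} - x_{t-12} + x_{t-13}[/latex]; the built-in monthly AirPassengers series is used purely as a convenient example.

```
# One ordinary difference plus one seasonal difference at lag 12, verified
# against the expanded formula; AirPassengers is just a convenient series.
x <- as.numeric(AirPassengers)
w <- diff(diff(x, lag = 12))             # nabla nabla_12 x_t
n <- length(x)
w.check <- x[14:n] - x[13:(n - 1)] - x[2:(n - 12)] + x[1:(n - 13)]
all.equal(w, w.check)                    # TRUE
```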
To summarize this section, the modeling of a nonstationary time series involves the following steps (a minimal R sketch of the workflow follows the list).
- Plot the time series, noting any trends, seasonality, and nonconstant variance.
- Make the variance stable by applying appropriate transformations if necessary.
- Use detrending (possibly regression) or repeated differencing (to use an ARIMA model) to create a stationary time series.
- Plot the stationary time series along with its sample autocorrelation function and sample partial autocorrelation function.
- Hypothesize a tentative ARMA model for the stationary time series model. If there is a seasonal component, consider a SARIMA model on the transformed time series.
- Fit the tentative ARMA or SARIMA model. Perform the model assessment tests on the tentative time series model. If the fitted tentative ARMA or SARIMA model fails these tests, then hypothesize a new tentative model.
- Perform overfitting in the final model selection process to ensure that the best model has been selected.
- Apply the final time series model in the fashion dictated by the problem setting (this is often forecasting future values of the time series).
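The R sketch below walks through these steps using the built-in monthly co2 series purely as a stand-in for a nonstationary time series; the order chosen in the arima call is a placeholder rather than a recommended model.

```
# Illustrative pass through the modeling steps; the chosen order is a
# placeholder, not a recommended model for this series.
x <- co2                                        # step 1: plot and inspect
plot(x)
w <- diff(diff(x, lag = 12))                    # step 3: difference toward stationarity
plot(w); acf(w); pacf(w)                        # step 4: inspect the differenced series
fit <- arima(x, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))   # steps 5-6
Box.test(residuals(fit), lag = 24, type = "Ljung-Box")           # one assessment test
predict(fit, n.ahead = 12)                      # step 8: forecast the next year
```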
As illustrated in Example 9.42, time series modeling can be thought of as a step-by-step process of identifying and removing causes of variation in the time series (for example, trend, cycles, autocorrelation) until all that remains is white noise.
9.5 Spectral Analysis
In practice, many time series exhibit cyclic variation. The first two examples in Chapter 7 concerning monthly residential power consumption and monthly international airline travel both contain a cyclic component. The time series models derived from the general linear model do not explicitly consider cyclic variation; these models exist in what is known as the time domain. Spectral analysis considers modeling in the frequency domain. Spectral analysis decomposes a stationary time series into sinusoidal components (that is, sine and cosine functions) in order to identify frequencies associated with periodic components. Just as autoregressive models use regression on previous values of a time series in the time domain, spectral analysis uses regression on sine and cosine terms in the frequency domain.
Table 9.20 introduces some new terminology that arises in spectral analysis and draws analogies with familiar data analysis techniques. The column headings indicate that the three rows contain three application areas, their associated probability constructs, and the corresponding statistical counterparts.
- The first row concerns the analysis of univariate data. In probability theory, several commonly-used probability distributions (for example, the exponential, normal, and binomial distribution) are investigated in order to build an inventory of potential probability distributions that might adequately describe a univariate data set. When an analyst encounters a univariate data set, one of the early steps in the analysis is to plot a histogram and compare its shape to the inventory of probability density functions associated with known probability distributions.
- The second row concerns time series analysis in the time domain. Shapes of the population autocorrelation function are derived for several commonly-used time series models (for example, the AR(2), MA(1), and ARMA(1, 1) models) in order to build an inventory of shapes such as those given in Figure 9.43 that might adequately describe the time series. When a time series analyst encounters time series observations, one of the early steps in the analysis is to plot the correlogram (a.k.a., the sample autocorrelation function) and compare its shape to the inventory of known population autocorrelation functions.
- The third row concerns time series analysis in the frequency domain. Shapes of the spectral density function are derived for several commonly-used time series models in order to build an inventory of shapes that might adequately describe the periodic nature of a time series. When a time series analyst encounters time series observations, one of the early steps is to plot the periodogram and compare its shape to the inventory of known spectral density functions.
| Application area | Probability construct | Statistical counterpart |
|---|---|---|
| univariate data | probability density | histogram |
| time series analysis: time domain | population autocorrelation | correlogram |
| time series analysis: frequency domain | spectral density | periodogram |
The next two subsections will focus on the spectral density function and its statistical counterpart, the periodogram.
9.5.1 The Spectral Density Function
The emphasis in the spectral analysis of a time series is the identification of the frequencies associated with cycles. The frequencies will be denoted here by ω. Just as the population autocorrelation function is the natural tool for identifying and quantifying autocorrelation in the time domain, the spectral density function is the natural tool for identifying and quantifying the frequencies associated with cyclic variation in the frequency domain. As seen in the following definition, the spectral density function can be written in terms of the population autocovariance function.
The interpretation of the spectral density function is that [latex]f(\omega) \Delta \omega[/latex] reflects the contribution of frequencies in the interval [latex](\omega, \, \omega + \Delta \omega)[/latex] to the variance of Xt for small values of [latex]\Delta \omega[/latex]. When [latex]f(\omega)[/latex] is high, then frequencies near ω have a large impact on Xt. When [latex]f(\omega)[/latex] is low, then frequencies near ω have a small impact on Xt. The upper limit of the support of the spectral density function, π, is known as the Nyquist frequency. Frequencies that exceed π are not captured by the spectral density function. This is not a universal choice for the definition of the spectral density function or the upper limit of its support. There are many valid alternative choices. A common alternative choice for the upper limit of the support is [latex]1 / 2[/latex].
The first example illustrates the calculation of a spectral density function for one of the most basic time series models.
The next example calculates the spectral density function of an MA(1) model. This particular model was chosen because it has an autocovariance function that cuts off after lag 1, which means that the summation given in Definition 9.9 consists of just a single term.
One common element from the spectral density functions given in the two previous examples is that they both integrate to [latex]\sigma _ X ^ {\, 2}[/latex]. This is true in general. Some time series analysts prefer to divide the spectral density function by [latex]\sigma _ X ^ {\, 2}[/latex] so that it will integrate to 1, making it a true probability density function. The normalized spectral density function is given by
[latex]f^*(\omega) = \frac{f(\omega)}{\sigma _ X ^ {\, 2}}.[/latex]
Dividing both sides of the equation in Definition 9.9 by [latex]\gamma(0) = \sigma _ X ^ {\, 2}[/latex] gives
The associated normalized spectral cumulative distribution function is defined on the support of ω in the usual fashion as
[latex]F^*(\omega) = \int_0^{\omega} f^*(\lambda) \, d\lambda[/latex]
for [latex]0 \le \omega \le \pi[/latex].
One advantage of normalizing these two functions is that there is now a clean interpretation of [latex]F^*(\omega)[/latex]. For frequencies ω1 and ω2 satisfying [latex]0 < \omega_1 < \omega_2 < \pi[/latex], the expression [latex]F^*(\omega_2) - F^*(\omega_1)[/latex] denotes the proportion of the variance in [latex]\left\{ X_t \right\}[/latex] accounted for by frequencies on the interval [latex](\omega_1, \, \omega_2)[/latex].
9.5.2 The Periodogram
The periodogram is the statistical counterpart to the spectral density function. The periodogram estimates the spectral density function for all frequencies between 0 and π. The shape of the periodogram reflects the frequencies that correspond to significant cyclic variation in a time series. Peaks in the periodogram reveal the dominant frequencies associated with cyclical components in an observed time series.
One topic that is crucial in time series analysis in the frequency domain is how often a time series should be sampled. Consider sampling the outdoor air temperature, for example, in Washington, DC. There are two significant cyclic components that should become apparent in such a time series. First, there is a daily temperature cycle. Temperatures are warmer during the day and cooler at night. This corresponds to high frequency variation. Second, there is an annual temperature cycle. Temperatures are warmer during the summer and cooler during the winter. This corresponds to low frequency variation. There is a factor of 365 (well, actually 365.24219) between the frequencies of these two cycles, which should be accounted for in deciding how often the time series is sampled. The following illustrations provide instances of sampling this time series too often, not often enough, and at about the right interval to capture these two frequencies in a periodogram.
- Let's say you sample 1000 outdoor air temperatures at Reagan National Airport in Washington DC every second beginning at noon on July 20, 1969. This data collection will be over very soon because 1000 seconds is only about 17 minutes. But you have not covered a daily cycle or an annual cycle, so the frequencies for these two cycles cannot be detected from this sample. The sampling is too frequent.
- Let's say you sample 100 outdoor air temperatures at Reagan National Airport in Washington DC annually beginning at noon on July 20, 1969. This experiment will take you a long time to collect because the last value collected will be at noon on July 20, 2068. Even though you have collected the observations through 100 annual temperature cycles and tens of thousands of daily temperature cycles, neither the daily nor the annual cycle can be detected. All observations were made during the summer and during the day. The sampling was too infrequent.
- If you desire to detect both the daily and the annual outdoor air temperature cycles at Reagan National Airport, then a sampling interval between the two extremes (every second and every year from the previous two illustrations) must be used. So if you begin sampling hourly data at noon on July 20, 1969 and collect this data for three years, you will have collected outdoor temperature observations over three annual cycles and about a thousand daily cycles. This requires [latex]3 \cdot 365 \cdot 24 = 26,280[/latex] outdoor air temperatures to be collected. This time series allows an analyst to detect both daily and annual cycles. The periodogram, which estimates the spectral density function, will have a peak associated with the low frequency (annual) cycles and a second peak associated with the high frequency (daily) cycles.
The details associated with computing the periodogram are left for a full-semester class in time series analysis. Some of the fundamental ideas will be presented here in order to give a sense of the development of the estimator. As has been the case in regression and survival analysis, we begin with a model for a time series having cyclic behavior. One such model is
[latex]X_t = c \, \cos(\omega t + \phi),[/latex]
where c is the amplitude of the cyclic variation, ω is the frequency of the cyclic variation, [latex]\phi[/latex] is a phase shift parameter, and the angle is measured in radians. (The [latex]\phi[/latex] used here has nothing to do with [latex]\phi[/latex] from the autoregressive time series models in the time domain.) Unfortunately, this model does not contain any random terms, and such a time series occurs only rarely in practice. So adding a time series of white noise [latex]\left\{ Z_t \right\}[/latex] results in the much more practical model
[latex]X_t = c \, \cos(\omega t + \phi) + Z_t.[/latex]
Since the phase shift parameter makes parameter estimation tedious, it is common practice in spectral analysis to apply the trigonometric identity [latex]\cos(x + y) = \cos x \, \cos y - \sin x \, \sin y[/latex] to this model, which results in
[latex]X_t = a \, \cos(\omega t) + b \, \sin(\omega t) + Z_t,[/latex]
where [latex]a = c \cdot \cos (\phi)[/latex] and [latex]b = -c \cdot \sin(\phi)[/latex]. This result is symmetric in the two primary trigonometric functions sine and cosine. The derivation thus far has only involved a single frequency ω. As in the previous outdoor air temperature example, it is often the case that there are multiple frequencies of interest. The current time series model can be generalized by summing over the k frequencies [latex]\omega_1, \, \omega_2, \, \ldots , \, \omega_k[/latex]:
[latex]X_t = \sum_{j = 1}^{k} \left[ a_j \cos(\omega_j t) + b_j \sin(\omega_j t) \right] + Z_t,[/latex]
where the amplitudes aj and bj reflect the contribution of frequency ωj to the variability of Xt. For example, if [latex]a_j = b_j = 0[/latex] for one particular index j, then the associated frequency ωj makes no contribution to the variability of Xt. The three remaining loose ends are (a) the number of frequencies k to consider, (b) which frequencies [latex]\omega_1, \, \omega_2, \, \ldots , \, \omega_k[/latex] to consider, and (c) how to estimate the amplitudes [latex]a_1, \, a_2, \, \ldots, \, a_k[/latex] and [latex]b_1, \, b_2, \, \ldots, \, b_k[/latex]. These loose ends are easier to navigate if the number of elements in the time series n happens to be even, which is assumed for now. If so, then the usual practice is to let [latex]k = n / 2[/latex] and space the ωj values uniformly between 0 and π as
[latex]\omega_j = \frac{2 \pi j}{n}, \qquad j = 1, \, 2, \, \ldots, \, n / 2.[/latex]
The lowest frequency that can be detected by the periodogram is [latex]\omega_1 = 2 \pi / n[/latex] and the highest frequency that can be detected by the periodogram is [latex]\omega_{n/2} = \pi[/latex], the Nyquist frequency. The periodogram can be calculated in R with the spectrum function, which is part of the stats package included with base R. Periodograms often contain significant sampling variability and the raw periodogram is not a consistent estimator of the spectral density function, so time series analysts often use various techniques to smooth the raw periodogram values.
The previous example showed that the periodogram for the target MA(1) process appears, on average, to converge to the associated spectral density function. As illustrated in the final example, you will typically be working with just a single periodogram, which is often quite noisy.
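A minimal R sketch of these ideas: simulate a series containing a slow cycle and a fast cycle plus white noise, then compute the raw and smoothed periodograms with spectrum. Note that spectrum reports frequency in cycles per observation (on the interval (0, 1/2]); multiplying by 2π converts to the radian frequencies used above. The series, frequencies, and smoothing spans chosen here are illustrative.

```
# Periodogram of a series with cycles of period 100 and period 8 plus noise;
# peaks should appear near frequencies 0.01 and 0.125 cycles per observation.
set.seed(42)
n <- 1000
tt <- 1:n
x <- 2 * cos(2 * pi * tt / 100) + cos(2 * pi * tt / 8) + rnorm(n)
raw  <- spectrum(x, log = "no")                    # raw (noisy) periodogram
smth <- spectrum(x, spans = c(7, 7), log = "no")   # smoothed periodogram
raw$freq[which.max(raw$spec)]                      # dominant frequency (near 0.01)
2 * pi * raw$freq[which.max(raw$spec)]             # the same frequency in radians
```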
9.6 Exercises
-
9.1 For a stationary AR(1) model, find [latex]V\big[ \bar X \big][/latex]. Give an approximation for [latex]V\big[ \bar X \big][/latex] for large values of n.
-
9.2 Implement a Monte Carlo simulation that evaluates the method of moments, least squares, and maximum likelihood estimation techniques for an AR(1) model with [latex]n = 100[/latex] observed values and population parameters [latex]\phi = -3 / 4[/latex] and [latex]\sigma _ Z ^ {\, 2} = 1[/latex] and identify the technique that has the smallest mean square error for estimating [latex]\phi[/latex].
-
9.3 Consider a shifted AR(1) time series model with known parameter values μ, [latex]\phi[/latex], and [latex]\sigma _Z ^ {\, 2}[/latex]. One realization of the time series [latex]x_1, \, x_2, \, \ldots, \, x_{100}[/latex] has been observed. Perform Monte Carlo simulation experiments that provide convincing numerical evidence that the exact two-sided 95% prediction intervals for X101 and X102 are indeed exact prediction intervals for parameter settings of your choice.
-
9.4 Consider a stationary shifted AR(1) model defined by
where μ, [latex]-1 < \phi < 1[/latex], and [latex]\sigma _ Z ^ {\, 2} > 0[/latex] are fixed known parameters and Zt is Gaussian white noise. Find expressions for
- [latex]\displaystyle{ \lim_{h \, \rightarrow \, \infty} E \left[ X _ {n + h} \, | \, X_1 = x_1, \, X_2 = x_2 , \, \ldots , \, X_n = x_n \right] }[/latex]
- [latex]\displaystyle{ \lim_{h \, \rightarrow \, \infty} V \left[ X _ {n + h} \, | \, X_1 = x_1, \, X_2 = x_2 , \, \ldots , \, X_n = x_n \right]. }[/latex]
-
9.5 Find the limiting half-width of an exact two-sided [latex]{100(1 - \alpha)}\%[/latex] prediction interval for [latex]E \left[ \hat X_ {n + h} \right][/latex] as the time horizon [latex]h \rightarrow \infty[/latex] for an AR(1) time series model with all parameters known.
-
9.6 For a stationary shifted ARMA([latex]p, \, q[/latex]) time series model with population autocorrelation function [latex]\rho(k)[/latex], the population variance of the sample mean is
This result was proved in Section 8.2.1. Use this result to find an approximate 95% confidence interval for μ for the beaver data from Example 9.3 for a fitted shifted AR(1) time series model with Gaussian white noise error terms.
-
9.7 The built-in R time series lh consists of [latex]n = 48[/latex] observations of the luteinizing hormone in blood samples from a woman taken at 10 minute intervals.
- Plot the time series, the sample autocorrelation function and the sample partial autocorrelation function.
- Suggest an ARMA([latex]p, \, q[/latex]) model based on your plots.
- Make a scatter plot of the data pairs [latex](x_{t - 1}, \, x_t)[/latex].
- Compute the method of moments estimates of the parameters in the model suggested in part (b).
- Compute the maximum likelihood estimates of the parameters in the model suggested in part (b).
- Compute an approximate 95% confidence interval for [latex]\phi[/latex].
- Forecast the next three values in the time series and report 95% prediction intervals for the three forecasts.
- Perform some research on the luteinizing hormone and indicate some scientific evidence that the time series model you suggested in part (b) is plausible.
-
9.8 Report the test statistic and p-value for the turning point test applied to the time series of beaver temperatures in their active state from Example 9.3. Comment on the sign of the test statistic and the magnitude of the p-value.
-
9.9 Consider the time series of [latex]n = 70[/latex] consecutive yields from a batch chemical process (from Box, G.E.P., and Jenkins, G.M. (1976), Time Series Analysis: Forecasting and Control, Revised Edition, Holden–Day, page 32) given in Example 7.20.
- Plot the time series, the sample autocorrelation function and the sample partial autocorrelation function.
- Suggest an ARMA([latex]p, \, q[/latex]) model based on your plots.
- Make a scatter plot of the data pairs [latex](x_{t - 1}, \, x_t)[/latex].
- Compute the method of moments estimates of the parameters in the model suggested in part (b).
- Compute the maximum likelihood estimates of the parameters in the model suggested in part (b).
- Compute an approximate 95% confidence interval for [latex]\phi[/latex].
- Forecast the next three values in the time series and report 95% prediction intervals for the three forecasts.
-
9.10 Consider an AR(1) model with population parameters [latex]\phi = 0.8[/latex] and [latex]\sigma _ Z ^ {\, 2} = 1[/latex], and Gaussian white noise. Let [latex]r_1, \, r_2 , \, r_3[/latex] denote the sample autocorrelation function values of the residuals of the fitted time series associated with [latex]n = 100[/latex] observations. Use Monte Carlo simulation to estimate the population mean vector and population variance–covariance matrix, to one-digit accuracy, of [latex]r_1, \, r_2 , \, r_3[/latex] when maximum likelihood estimation is used to estimate the parameters.
-
9.11 Let B1 and B2 be the roots of the characteristic equation [latex]\phi(B) = 1 - \phi_1 B - \phi_2 B ^ 2 = 0[/latex] for an AR(2) time series model
Let [latex]G_1 = B_1^{-1}[/latex] and [latex]G_2 = B_2^{-1}[/latex]. A general solution for the lag k autocorrelation is (see Box, G.E.P., and Jenkins, G.M. (1976), Time Series Analysis: Forecasting and Control, Revised Edition, Holden–Day, page 59)
for [latex]G_1 \ne G_2[/latex]. Show that calculating the population autocorrelation in this fashion is the same as using the recursive equation for the first five lags for an AR(2) process with parameters
- [latex]\phi_1 = 1 / 2[/latex], [latex]\phi_2 = 1 / 3[/latex],
- [latex]\phi_1 = 1[/latex], [latex]\phi_2 = -1 / 2[/latex].
-
9.12 Create a plot like the one in Figure 9.13 for an AR(2) model stationary region with [latex]\rho(1) = -0.9, \, -0.8, \, \ldots , \, 0.9[/latex] and [latex]\rho(2) = -0.9, \, -0.8, \, \ldots , \, 0.9[/latex]. No labels are necessary on your plot.
-
9.13 A stationary AR(2) time series model can be written as an MA(∞) time series model. The coefficients θ1, θ2, … in the MA(∞) model can be calculated in four fashions. First, they can be calculated using the recursive formulas in Theorem 9.12. Second, they can be written explicitly as (Cryer, J.D., and Chan, K.-S. (2008), Time Series Analysis: With Applications in R, Springer, page 75):
for [latex]i = 1, \, 2, \, \ldots[/latex], where B1 and B2 are the roots of [latex]\phi(B) = 1 - \phi_1 B - \phi_2 B ^ 2[/latex], [latex]{G_1 = B_1^{-1}}[/latex], [latex]{G_2 = B_2^{-1}}[/latex], [latex]R = \sqrt{- \phi_2}[/latex], and [latex]\cos \Theta = \phi_1 / (2R)[/latex]. Third, the coefficients can be calculated by using the factored form of the characteristic polynomial, and writing the model in terms of Xt and equating coefficients. Fourth, the coefficients can be calculated by using the ARMAtoMA function in R. Calculate the first eight coefficients of the MA(∞) model, [latex]\theta_1, \, \theta_2, \, \ldots , \, \theta_8[/latex], using these four methods for the following sets of AR(2) parameters:
- [latex]\phi_1 = 1, \, \phi_2 = -1 / 4[/latex],
- [latex]\phi_1 = 1 / 2, \, \phi_2 = 1 / 9[/latex],
- [latex]\phi_1 = 1, \, \phi_2 = -1 / 2[/latex].
These three parameter combinations correspond to one real root with multiplicity two, two distinct real roots, and two complex roots of the characteristic equation [latex]\phi(B) = 0[/latex].
-
9.14 For an AR(2) time series model, the asymptotic variance–covariance matrix of the maximum likelihood estimates [latex]\hat \phi_1[/latex] and [latex]\hat \phi_2[/latex] is
What is the asymptotic population correlation between [latex]\hat \phi_1[/latex] and [latex]\hat \phi_2[/latex]?
-
9.15 Consider an AR(2) time series model with [latex]\phi_1 = 1[/latex], [latex]\phi_2 = -1 / 2[/latex], and [latex]\sigma _ Z ^ {\, 2} = 1[/latex]. For a realization of [latex]n = 100[/latex] observations [latex]X_1, \, X_2, \, \ldots, \, X_{100}[/latex] from this AR(2) model, give convincing numerical evidence that the forecasted value for X103 is unbiased and that the 95% prediction interval for X103 is exact.
-
9.16 Implement Theorem 9.17 on the R built-in LakeHuron time series to calculate the first five forecasted values and associated prediction intervals. Do not just use the predict function.
-
9.17 Consider a standard AR(2) model for an observed time series of [latex]n = 100[/latex] values. The last two values in the time series are [latex]x_{99} = 3[/latex] and [latex]x_{100} = 4[/latex]. The estimated coefficients in the AR(2) model are [latex]\hat \phi_1 = 1[/latex] and [latex]\hat \phi_2 = -0.5[/latex]. Compute the next ten forecasted values [latex]\hat X_{101}, \, \hat X_{102}, \, \ldots, \, \hat X_{110}[/latex] and comment on the shape of the forecasted values.
-
9.18 Consider a realization [latex]x_1, \, x_2, \, \ldots, \, x_n[/latex] of a stationary shifted AR(2) time series model with fixed known parameters μ, [latex]\phi_1[/latex], [latex]\phi_2[/latex], and [latex]\sigma _ Z ^ {\, 2}[/latex]. Write a formula for [latex]\hat X_{n+3}[/latex] in terms of [latex]x_{n - 1}[/latex] and xn.
-
9.19 Consider the annual Lake Huron water level heights from 1875 to 1972 given in the R built-in data set LakeHuron, appended by the next ten observations,
for the years 1973 to 1982. Give the p-value associated with a test of the statistical significance of the slope of the simple linear regression line for the augmented time series.
-
9.20 Consider the AR(3) model with coefficients
- Is this model invertible?
- Is this model stationary?
- Calculate the first six coefficients of the associated MA(∞) model.
-
9.21 Two necessary, but not sufficient, conditions for stationarity of an AR(p) time series model are (Cryer, J.D., and Chan, K.-S. (2008), Time Series Analysis: With Applications in R, Springer, page 76):
- Show that these conditions hold for the stationary AR(4) time series model with
- Graphically or algebraically, show that these conditions are necessary but not sufficient for falling in the triangular-shaped stationary region from Theorem 9.9 for an AR(2) time series model.
-
9.22 Consider the AR(4) time series model with characteristic polynomial
and Gaussian white noise with population variance [latex]\sigma _ Z ^ {\, 2} = 1[/latex]. Conduct a Monte Carlo simulation experiment that provides convincing numerical evidence that [latex]\gamma(0) = 3520 / 819[/latex].
-
9.23 The R vector phi contains the parameters [latex]\phi_1, \, \phi_2, \, \ldots , \, \phi_p[/latex] in an AR(p) model. Write an R function named is.stationary with a single parameter phi that returns TRUE if the AR(p) model is stationary and FALSE otherwise.
-
9.24 The R code below takes initial p autocovariances [latex]\gamma(0), \, \gamma(1), \, \ldots , \, \gamma(p - 1)[/latex] for an AR(p) model, which are stored in the vector gam, and places them in a variance–covariance matrix GAMMA (denoted by Γ in the text).
The code makes this conversion by using two nested for loops. Heather can do this calculation without using for loops. How does she do it?
-
9.25 Consider a time series that is governed by an AR(4) model with characteristic polynomial
and Gaussian white noise with population variance [latex]\sigma _ Z ^ {\, 2} = 1[/latex]. Conduct a Monte Carlo simulation experiment that provides convincing numerical evidence that the 95% confidence interval for [latex]\phi_3[/latex] based on the maximum likelihood estimators for an AR(4) time series model is asymptotically exact.
-
9.26 For logarithms of the [latex]n = 55[/latex] annual lynx pelt sales time series from Example 9.29, find the values of p and q associated with the ARMA(p, q) model that minimizes the AIC statistic. Assume that the models are fitted by maximum likelihood.
-
9.27 Fit the AR(4) model to the logarithms of the [latex]n = 55[/latex] annual lynx pelt sales time series from Example 9.29 by maximum likelihood. Simulate the fitted model to generate [latex]n = 55[/latex] random annual lynx pelt sales from the fitted model. View a dozen or so such realizations and comment on your faith in the fitted AR(4) time series model. Repeat the experiment for a fitted ARMA(2, 3) time series model and comment.
-
9.28 Show that the MA(1) model
and the MA(1) model
have the same population autocorrelation function.
-
9.29 Show that [latex]-1/2 \le \rho(1) \le 1/2[/latex] for an MA(1) model.
-
9.30 Derive the population autocorrelation function for the MA(1) model with arbitrary mean value μ given by
in a similar fashion to the derivation for the standard MA(1) model.
-
9.31 Conduct the following Monte Carlo simulation experiment. Generate [latex]n = 100[/latex] observations from an MA(1) time series model with [latex]\theta = 0.9[/latex] and standard normal white noise terms. Estimate the expected value and standard deviation of r1 and r2. Run enough replications so that you can report your estimates to two significant digits.
-
9.32 Consider an MA(1) model with [latex]\theta = -0.9[/latex] and Gaussian white noise with [latex]\sigma _ Z ^ {\, 2} = 1[/latex]. Generate a dozen realizations of this time series for [latex]n = 100[/latex] observations each. Plot the time series and the associated correlogram, using a call to Sys.sleep between each realization to view the graphs. Write a paragraph that describes what you observe in the dozen realizations.
-
9.33 Consider an MA(1) time series model
where [latex]\left\{ Z_t \right\}[/latex] denotes Gaussian white noise. Let [latex]\hat \theta _ {\scriptscriptstyle{MOM}}[/latex] be the method of moments estimator of θ and let [latex]\hat \theta _ {\scriptscriptstyle{MLE}}[/latex] be the maximum likelihood estimator of θ. One way to compare these two estimators is the asymptotic relative efficiency, defined as
Brockwell and Davis (2016, page 129) give the population variance of [latex]\hat \theta _ {\scriptscriptstyle{MOM}}[/latex] and [latex]\hat \theta _ {\scriptscriptstyle{MLE}}[/latex] as approximately
Write a Monte Carlo simulation that confirms these two formulas for [latex]n = 400[/latex], [latex]\theta = 1 / 2[/latex], and [latex]\sigma _ Z ^ {\, 2} = 1[/latex].
-
9.34 The [latex]n = 45[/latex] daily average number of defects per truck at the final inspection at a manufacturing facility (from Burr, 1976, Statistical Quality Control Methods, Marcel Dekker, New York), read row-wise, are given below.
1.20  1.50  1.54  2.70  1.95  2.40  3.44  2.83  1.76
2.00  2.09  1.89  1.80  1.25  1.58  2.25  2.50  2.05
1.46  1.54  1.42  1.57  1.40  1.51  1.08  1.27  1.18
1.39  1.42  2.08  1.85  1.82  2.07  2.32  1.23  2.91
1.77  1.61  1.25  1.15  1.37  1.79  1.68  1.78  1.84
Fit these data values to a shifted MA(1) time series model by the method of moments, least squares, and maximum likelihood estimation.
-
9.35 The formula for the population variance of the sample mean for a stationary time series model (which was proved in Section 8.2.1) is
Show that this is approximately
or, equivalently,
for large values of n whenever the autocorrelation function values decay rapidly enough with increasing k such that