Groupe de Recherche en Économie et Développement International
Cahier de recherche / Working Paper 07-05
Sieve Bootstrap Unit Root Tests
Patrick Richard

gredi.recherche.usherbrooke.ca/wpapers/GREDI-0705.pdf





Sieve Bootstrap Unit Root Tests

Patrick Richard
GREDI and Département de sciences économiques
Faculté d'administration, Université de Sherbrooke

Sherbrooke, Quebec, [email protected]

Abstract

We study the use of the sieve bootstrap to conduct ADF unit root tests when the time series' first difference is a general linear process that admits an infinite moving average form. The work of Park (2002) and Chang and Park (2003) suggests that the usual autoregressive (AR) sieve bootstrap provides some accuracy gains under the null hypothesis. The magnitude of this amelioration, however, depends on the nature of the true DGP. For example, the AR sieve test over-rejects almost as much as the asymptotic one when the DGP contains a strong negative moving average root. This lack of robustness is, of course, caused by the poor quality of the AR approximation.

We attempt to reduce this problem by proposing to use sieve bootstraps based on moving average (MA) and autoregressive-moving average (ARMA) approximations. Though this is a natural generalisation of the standard AR sieve bootstrap, it has never been suggested in the econometrics literature. Two important theoretical results are shown. First, we establish invariance principles for the partial sum processes built from invertible MA and stationary and invertible ARMA sieve bootstrap DGPs. Second, these are used to provide a proof of the asymptotic validity of the resulting ADF bootstrap tests.

Through Monte Carlo experiments, we find that the rejection probability of the MA sieve bootstrap is more robust to the DGP than that of the AR sieve bootstrap. We also find that the ARMA sieve bootstrap requires only a very parsimonious specification to achieve excellent results.

I thank Russell Davidson, John Galbraith, Nikolay Gospodinov, Alex Maynard, Victoria Zinde-Walsh and seminar participants at the universities of Carleton, Sherbrooke and St-Lawrence as well as at the CEA 2006 conference for their comments on earlier versions of this paper.

March 2007


1 Introduction

It is well known that the Augmented Dickey-Fuller (ADF) test sometimes severely over-rejects the unit root null hypothesis. The bootstrap is often used as a potential solution to such error in rejection probability (ERP)¹ problems. As Park (2003) shows, the standard iid bootstrap provides asymptotic refinements over the asymptotic ADF test whenever the correlation structure of the first difference process is finite and of a known AR(p) form. When the original data has an infinite correlation structure, however, the iid bootstrap is not appropriate and Park's results no longer apply. Unfortunately, it is precisely in such cases that the ERP problem is at its worst.

Bootstrap methods for infinitely correlated data may be classified within two categories: block bootstrap methods and sieve bootstrap methods. Both attempt to generate bootstrap data with properties as close as possible to those of the original data. The block bootstrap consists of randomly resampling blocks of observations rather than single observations, thus preserving the true correlation structure within each block. This effectively eliminates the need for parametric modelling, so that the block bootstrap is considered a non-parametric method. This advantage is partially offset by two problems arising from the facts that the correlation structure is not preserved between any two blocks (the join point problem) and that the number of blocks is not as large as the number of observations, which increases the variability of the moments of the bootstrap data relative to those of the original sample. These problems have been studied by several authors. See, among others, Andrews (2004) and Paparoditis and Politis (2001a,b) for the join point problem and Hall and Horowitz (1996) and Goncalves and Vogelsang (2006) for the variability of moments problem.
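The block resampling idea can be sketched as follows. This is an illustrative stand-in rather than any particular scheme from the papers cited; the function name and the block length are our own choices.

```python
import numpy as np

def block_bootstrap_sample(x, block_len, rng):
    """Draw one moving-block bootstrap sample from the series x.

    Blocks of consecutive observations are resampled with replacement,
    preserving the dependence structure inside each block; the joins
    between blocks are where that structure is lost.
    """
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    # Starting points of the (overlapping) blocks, drawn uniformly.
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    sample = np.concatenate([x[s:s + block_len] for s in starts])
    return sample[:n]  # trim to the original sample size

rng = np.random.default_rng(0)
x = rng.standard_normal(200).cumsum()  # some dependent series
xstar = block_bootstrap_sample(np.diff(x), block_len=10, rng=rng)
```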

The sieve bootstrap, formally introduced by Buhlmann (1997, 1998) and Kreiss (1992), uses finite order parametric models to approximate the infinite correlation structure of the data. Bootstrap samples are then created by iid resampling from the residuals of these models. This avoids the problems associated with the block bootstrap but has the obvious disadvantage that a finite order approximation cannot properly model an infinitely correlated process. However, if one allows the approximation order to increase with the sample size, this problem disappears asymptotically. A more severe problem is the fact that the sieve bootstrap is not valid whenever the DGP is nonlinear or has a GARCH structure. Finally, the choice of a particular sieve model may affect the quality of the resulting inference. To date, the sieve bootstrap has only been based on autoregressive approximations.
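For concreteness, a minimal AR sieve bootstrap generator can be sketched as below, assuming the AR(f) approximation is fitted by OLS (the cited papers may use other estimators); the function name and burn-in length are illustrative.

```python
import numpy as np

def ar_sieve_bootstrap(u, f, rng):
    """One AR(f) sieve bootstrap sample of the (stationary) series u.

    The AR(f) approximation is fitted by OLS; bootstrap errors are drawn
    iid from the centered residuals and fed back through the recursion.
    """
    n = len(u)
    # OLS regression of u_t on u_{t-1}, ..., u_{t-f}.
    Y = u[f:]
    X = np.column_stack([u[f - k:n - k] for k in range(1, f + 1)])
    phi, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ phi
    resid = resid - resid.mean()          # center the residuals
    eps = rng.choice(resid, size=n + 50)  # iid draws, 50 burn-in values
    ustar = np.zeros(n + 50)
    for t in range(n + 50):
        past = ustar[max(0, t - f):t][::-1]   # u*_{t-1}, u*_{t-2}, ...
        ustar[t] = phi[:len(past)] @ past + eps[t]
    return ustar[50:]                         # drop the burn-in

rng = np.random.default_rng(0)
u = rng.standard_normal(300)
ustar = ar_sieve_bootstrap(u, f=6, rng=rng)
```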

¹ ERP is defined as the difference between the probability of rejecting a true null hypothesis and the nominal level of the test.



Although both methods provide asymptotic refinements in stationary time series cases, neither has been shown to do so in the unit root case. There is nevertheless some simulation-based evidence to the effect that block and sieve bootstrap ADF tests may sometimes be more precise than asymptotic ones; see Palm, Smeekes and Urbain (2006). On the other hand, it has been shown that sieve bootstrap DF and ADF tests are asymptotically valid (Psaradakis, 2001, Park, 2002 and Chang and Park, 2003) while some versions of the block bootstrap also provide consistent ADF tests (Paparoditis and Politis, 2003 and 2005, Parker, Paparoditis and Politis, 2006 and Swensen, 2003).

In this paper, we introduce sieve bootstraps based on moving average (MA) and autoregressive moving average (ARMA) approximations and apply them to ADF tests. We provide some asymptotic results that justify their use and illustrate their finite sample performance with a Monte Carlo study. The rest of the paper is organised as follows. In section 2, we provide invariance principles for partial sum processes built with sieve bootstrap data. This is then used to show that ADF tests based on either the MA or ARMA sieves are asymptotically valid. Section 3, which is rather lengthy, studies the finite sample performances of these tests and compares them to the more usual AR sieve ADF tests. Section 4 concludes.

2 Invariance Principle and Validity of Sieve Bootstrap ADF Tests

In this section, we derive the invariance principles necessary to justify the use of MA sieve bootstrap (MASB) and ARMA sieve bootstrap (ARMASB) samples to carry out bootstrap unit root tests. These results are often referred to as functional central limit theorems because they extend standard asymptotic distribution theory, which is applicable to simple random variables, to more complex mathematical objects such as random functions. For a very accessible introduction to these topics, see Davidson (2006). We also show that ADF tests based on MASB and ARMASB samples are asymptotically valid, that is, that their asymptotic distribution is the standard Dickey-Fuller. Although these results are derived for unit root models without a constant or a deterministic trend, the proofs could easily be modified to allow for such deterministic parts.

Establishing invariance principles for sieve bootstrap procedures is a relatively new strand in the literature and, at the time of writing and to the author's knowledge, only two attempts have been made. First, Bickel and Buhlmann (1999) derive a bootstrap functional central limit theorem under a bracketing condition for the AR sieve bootstrap (ARSB). Second, Park (2002) derives an invariance principle for the



ARSB in the case of a general MA(∞) linear process. This latter approach is more interesting for our present purpose because Park (2002) establishes the convergence of the bootstrap partial sum process to the standard Brownian motion, and most of the asymptotic theory of unit root tests is based on such processes. Further, as we will see below, his assumptions are standard ones in time series econometrics.

We then use these results to show that the ADF bootstrap tests based on the MASB and ARMASB are asymptotically valid. A bootstrap test is said to be asymptotically valid, or consistent, if it can be shown that its large sample distribution under the null is the test's asymptotic distribution. Consequently, we will seek to prove that the MASB and ARMASB ADF test statistics follow the DF distribution asymptotically under the null.

2.1 General bootstrap invariance principle

Let $\{\varepsilon_t\}_{t=1}^n$ be a sequence of iid random variables with finite second moment $\sigma^2$. Consider a sample of size $n$ and define the partial sum process:
$$W_n(t) = \frac{1}{\sigma\sqrt{n}} \sum_{k=1}^{[nt]} \varepsilon_k$$

where $[y]$ denotes the largest integer smaller than or equal to $y$ and $t$ is an index such that $(j-1)/n \leq t < j/n$, where $j = 1, 2, \ldots, n$ is another index that allows us to divide the $[0,1]$ interval into $n+1$ parts. Thus, $W_n(t)$ is a step function that converges to a random walk as $n \to \infty$. Also, as $n \to \infty$, $W_n(t)$ becomes infinitely dense on the $[0,1]$ interval. By the classical Donsker's theorem, we know that

$$W_n \xrightarrow{d} W$$

where $W$ is the standard Brownian motion. The Skorohod representation theorem tells us that there exists a probability space $(\Omega, \mathcal{F}, P)$, where $\Omega$ is the space containing all possible outcomes, $\mathcal{F}$ is a $\sigma$-field and $P$ a probability measure, that supports $W$ and a process $W'_n$ such that $W'_n$ has the same distribution as $W_n$ and
$$W'_n \xrightarrow{a.s.} W. \quad (1)$$
Indeed, as demonstrated by Sakhanenko (1980), $W'_n$ can be chosen so that
$$\Pr\left\{\|W'_n - W\| \geq \delta\right\} \leq n^{1-r/2} K_r E|\varepsilon_t|^r \quad (2)$$

for any $\delta > 0$ and $r > 2$ such that $E|\varepsilon_t|^r < \infty$ and where $K_r$ is a constant that depends on $r$ only. The result (2) is often referred to as the strong approximation. Because the invariance principle we seek to establish is a distributional result, we do not need to distinguish $W_n$ from $W'_n$. Consequently, because of equations (1) and (2), we say that $W_n \xrightarrow{a.s.} W$, which is stronger than the convergence in distribution implied by Donsker's theorem.
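As an illustrative aside, not part of the paper's argument, Donsker's theorem is easy to visualize by simulation: with standard normal errors ($\sigma = 1$), the simulated $W_n(1)$ should be approximately standard normal across replications. The sample sizes below are arbitrary choices.

```python
import numpy as np

# Simulate W_n(t) = (1/(sigma*sqrt(n))) * sum_{k <= [nt]} eps_k and check
# that W_n(1) behaves like a standard normal draw, as Donsker's theorem implies.
rng = np.random.default_rng(42)
n, n_rep = 500, 2000
eps = rng.standard_normal((n_rep, n))   # iid errors with sigma = 1
W = eps.cumsum(axis=1) / np.sqrt(n)     # W_n(t) evaluated at t = k/n
W_at_1 = W[:, -1]                       # W_n(1) across replications
print(W_at_1.mean(), W_at_1.var())      # close to 0 and close to 1
```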

Now, suppose that we can obtain an estimate of $\{\varepsilon_t\}_{t=1}^n$, which we will denote as $\{\hat\varepsilon_t\}_{t=1}^n$, from which we can draw bootstrap samples of size $n$, denoted as $\{\varepsilon^*_t\}_{t=1}^n$. If we suppose that $n \to \infty$, then we can build a bootstrap probability space $(\Omega^*, \mathcal{F}^*, P^*)$ which is conditional on the realization of the set of residuals $\{\hat\varepsilon_t\}_{t=1}^\infty$ from which the bootstrap random variables are drawn. What this means is that each bootstrap drawing $\{\varepsilon^*_t\}_{t=1}^n$ can be seen as a realization of a random variable defined on $(\Omega^*, \mathcal{F}^*, P^*)$.

In all that follows, the expectation with respect to this space (that is, with respect to the probability measure $P^*$) will be denoted by $E^*$. For example, if the bootstrap samples are drawn from $\{\hat\varepsilon_t - \bar{\hat\varepsilon}_n\}_{t=1}^n$, then $E^*\varepsilon^*_t = 0$ and $E^*\varepsilon^{*2}_t = \hat\sigma^2_n = (1/n)\sum_{t=1}^n (\hat\varepsilon_t - \bar{\hat\varepsilon}_n)^2$. Of course, whenever the $\hat\varepsilon_t$s are residuals from a linear regression model with a constant, $\bar{\hat\varepsilon}_n = 0$ so that $E^*\varepsilon^{*2}_t = \hat\sigma^2_n = (1/n)\sum_{t=1}^n \hat\varepsilon^2_t$. Also, $\xrightarrow{d^*}$, $\xrightarrow{p^*}$ and $\xrightarrow{a.s.^*}$ will be used to denote convergence in distribution, in probability and almost surely of the functionals of the bootstrap samples defined on $(\Omega^*, \mathcal{F}^*, P^*)$. Further, following Park (2002), for any sequence of bootstrapped statistics $X^*_n$ we say that $X^*_n \xrightarrow{d^*} X$ a.s. if the conditional distribution of $X^*_n$ weakly converges to that of $X$ a.s. on all sets of $\{\hat\varepsilon_t\}_{t=1}^\infty$. In other words, if the bootstrap convergence in distribution ($\xrightarrow{d^*}$) of functionals of bootstrap samples on $(\Omega^*, \mathcal{F}^*, P^*)$ happens almost surely for all realizations of $\{\hat\varepsilon_t\}_{t=1}^\infty$, then we write $\xrightarrow{d^*}$ a.s.

Let $\{\varepsilon^*_t\}_{t=1}^n$ be a realization from a bootstrap probability space. Define
$$W^*_n(t) = \frac{1}{\hat\sigma_n\sqrt{n}} \sum_{k=1}^{[nt]} \varepsilon^*_k.$$

Once again, by Skorohod's theorem, there exists a probability space on which a Brownian motion $W^*$ is supported and on which there also exists a process $W^{*\prime}_n$ which has the same distribution as $W^*_n$ and such that
$$\Pr^*\left\{\|W^{*\prime}_n - W^*\| \geq \delta\right\} \leq n^{1-r/2} K_r E^*|\varepsilon^*_t|^r \quad (3)$$
for $\delta$, $r$ and $K_r$ defined as before. Because $W^{*\prime}_n$ and $W^*_n$ are distributionally equivalent, we will not distinguish them in all that follows. Equation (3) allows us to state the following theorem, which is also theorem 2.2 in Park (2002).



Theorem (Park 2002, theorem 2.2, p. 473). If $E^*|\varepsilon^*_t|^r < \infty$ a.s. and
$$n^{1-r/2}E^*|\varepsilon^*_t|^r \xrightarrow{a.s.} 0 \quad (4)$$
for some $r > 2$, then $W^*_n \xrightarrow{d^*} W$ a.s. as $n \to \infty$.

This result comes from the fact that, if condition (4) holds, then equation (3) implies convergence in probability over the bootstrap probability space which, as usual, implies convergence in distribution, that is, $W^*_n \xrightarrow{d^*} W^*$ a.s. Since the distribution of $W^*$ is independent of the set of residuals $\{\hat\varepsilon_t\}_{t=1}^\infty$, we can equivalently say $W^*_n \xrightarrow{d^*} W$ a.s. Hence, whenever condition (4) is met, the invariance principle follows. In other words, Skorohod implies that there exists a process $W^{*\prime}_n$ distributionally equivalent to $W^*_n$ and Sakhanenko implies that it can be chosen so as to satisfy equation (3). Of course, this theorem is only valid for bootstrap samples drawn from sets of iid random variables. Nevertheless, Park (2002) uses it to prove an invariance principle for the AR sieve bootstrap. In the next two sections, we do essentially the same thing for the MA and ARMA sieve bootstraps.

2.2 Invariance principle for MA sieve bootstrap

Let us consider a general linear process:
$$u_t = \pi(L)\varepsilon_t \quad (5)$$
where
$$\pi(z) = \sum_{k=0}^\infty \pi_k z^k$$
and the $\varepsilon_t$s are iid random variables. Moreover, let $\pi(z)$ and $\varepsilon_t$ satisfy the following assumptions:

Assumption 2.1.

(a) The $\varepsilon_t$ are iid random variables such that $E(\varepsilon_t) = 0$ and $E(|\varepsilon_t|^r) < \infty$ for some $r > 4$.

(b) $\pi(z) \neq 0$ for all $|z| \leq 1$ and $\sum_{k=0}^\infty |k|^s |\pi_k| < \infty$ for some $s \geq 1$.

These are usual assumptions in stationary time series analysis. Notice that (a) along with the coefficient summability condition ensures that the process is weakly stationary. On the other hand, the assumption that $\pi(z) \neq 0$ for all $|z| \leq 1$ is necessary for the process to have an AR(∞) form. See Chang and Park (2003) for a discussion of these assumptions.

The MASB consists of approximating equation (5) by a finite order MA(q) model:
$$u_t = \pi_1\varepsilon_{q,t-1} + \pi_2\varepsilon_{q,t-2} + \cdots + \pi_q\varepsilon_{q,t-q} + \varepsilon_{q,t} \quad (6)$$

where $q$ is a function of the sample size. We believe that the analytical indirect inference estimator for MA models introduced by Galbraith and Zinde-Walsh (1994), henceforth GZW (1994), is the most appropriate one for this task. There are several reasons for this. The first is computation speed. Consider that in practice, one often uses information criteria such as the AIC and BIC to choose the order of the MA sieve model. These criteria make use of the value of the loglikelihood at the estimated parameters, which implies that, if we want $q$ to be within a certain range, say $q_1 \leq q \leq q_2$, then we must estimate $q_2 - q_1$ models. With maximum likelihood, this requires us to maximize the log likelihood $q_2 - q_1$ times. With GZW (1994)'s method, we need only estimate one model, namely an AR(f), from which we can deduce all at once the parameters of all the $q_2 - q_1$ MA(q) models. We then only need to evaluate the loglikelihood function at these parameter values and choose the best model accordingly. This is obviously much faster than maximum likelihood.
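The idea of deducing every MA(q) approximation from a single long autoregression can be illustrated with the standard recursion that inverts a fitted AR polynomial into MA coefficients. This simple inversion is only a sketch of the principle, not the actual GZW (1994) analytical indirect inference formulas; the function name is ours.

```python
import numpy as np

def ma_from_ar(phi, q):
    """MA(q) coefficients implied by a fitted AR approximation.

    phi: coefficients of u_t = phi_1 u_{t-1} + ... + phi_f u_{t-f} + eps_t.
    Inverts the AR polynomial term by term: theta_k depends only on
    theta_1..theta_{k-1} and the fixed AR fit, so moving from an MA(q)
    to an MA(q+1) specification leaves the first q coefficients unchanged.
    """
    phi = np.asarray(phi, dtype=float)
    theta = np.zeros(q)
    for k in range(1, q + 1):
        acc = phi[k - 1] if k <= len(phi) else 0.0
        for i in range(1, k):
            if k - i <= len(phi):
                acc += theta[i - 1] * phi[k - i - 1]
        theta[k - 1] = acc
    return theta

# Example: the AR(inf) form of an MA(1) with theta = 0.5 has phi_j = -(-0.5)**j.
phi = [-(-0.5) ** j for j in range(1, 21)]
print(ma_from_ar(phi, 3))   # recovers approximately [0.5, 0.0, 0.0]
```

This robustness-to-$q$ property of a fixed AR(f) fit is exactly the behaviour discussed in the next paragraph.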

Second, the simulations of GZW (1994) indicate that their estimator is more robust to changes in $q$. For example, suppose that the true model is MA(∞) and that we consider approximating it by either a MA(q) or a MA(q+1) model. If we use the GZW (1994) method, for fixed $f$, going from a MA(q) to a MA(q+1) specification does not alter the values of the first $q$ coefficients. On the other hand, these $q$ estimates are likely to change very much if the two models are estimated by maximum likelihood, because this latter method strongly depends on the specification. Therefore, bootstrap samples generated from parameters estimated using the GZW estimator are likely to be more robust to the choice of $q$ than samples generated using maximum likelihood estimates.

Another reason to prefer GZW (1994)'s estimator is that, according to their simulations, it tends to yield fewer non-invertible roots, which are not at all desirable here. Finally, it allows us to determine, through simulations, which sieve bootstrap method yields more precise inference for a given quantity of information (that is, for a given lag length).

Approximating an infinite linear process by a finite dimensional model is an old topic in econometrics. Most of the time, finite f-order autoregressions are used, with $f$ increasing as a function of the sample size. The classical reference on the subject is Berk (1974) who proposes to increase $f$ so that $f^3/n \to 0$ as $n \to \infty$ (that is, $f = o(n^{1/3})$). This assumption is quite restrictive because it does not allow $f$ to increase at the logarithmic rate, which is what happens if we use AIC or BIC. Here, we make the following assumption about $q$ and $f$:

Assumption 2.2.

Let $q$ and $f$ be, respectively, the orders of the approximating MA sieve model and of the AR model used to estimate it via analytical indirect inference. Then, we assume that $q \to \infty$ and $f \to \infty$ as $n \to \infty$, with $q = o\big((n/\log n)^{1/2}\big)$ and $f = o\big((n/\log n)^{1/2}\big)$ and $f > q$.

The reason for this choice is closely related to lemma 3.1 in Park (2002) and the reader is referred to the discussion following it. Here, we limit ourselves to pointing out that this rate is consistent with both AIC and BIC, which are commonly used in practice. The restriction that $f > q$ is necessary for the computation of the GZW (1994) estimator.

The bootstrap samples are generated from the DGP:
$$u^*_t = \hat\pi_{q,1}\varepsilon^*_{t-1} + \cdots + \hat\pi_{q,q}\varepsilon^*_{t-q} + \varepsilon^*_t \quad (7)$$

where the $\hat\pi_{q,i}$, $i = 1, 2, \ldots, q$ are estimates of the true parameters $\pi_i$, $i = 1, 2, \ldots$ and the $\varepsilon^*_t$ are drawn from the EDF of $(\hat\varepsilon_t - \bar{\hat\varepsilon}_n)$, that is, from the EDF of the centered residuals of the MA(q) sieve. We will now establish an invariance principle for the partial sum process of $u^*_t$ by considering its Beveridge-Nelson decomposition and showing that it converges almost surely to the same limit as the corresponding partial sum process built with the original $u_t$. First, consider the decomposition of $u_t$:

$$u_t = \pi(1)\varepsilon_t + (\bar{u}_{t-1} - \bar{u}_t)$$
where
$$\bar{u}_t = \sum_{k=0}^\infty \bar\pi_k \varepsilon_{t-k} \qquad \text{and} \qquad \bar\pi_k = \sum_{i=k+1}^\infty \pi_i.$$

Now, consider the partial sum process
$$V_n(t) = \frac{1}{\sqrt{n}} \sum_{k=1}^{[nt]} u_k = \frac{1}{\sqrt{n}} \sum_{k=1}^{[nt]} \pi(1)\varepsilon_k + \frac{1}{\sqrt{n}} \sum_{k=1}^{[nt]} (\bar{u}_{k-1} - \bar{u}_k),$$
hence,
$$V_n(t) = (\sigma\pi(1))W_n(t) + \frac{1}{\sqrt{n}}(\bar{u}_0 - \bar{u}_{[nt]}).$$



Under assumption 2.1, Phillips and Solo (1992) show that
$$\max_{1 \leq k \leq n} |n^{-1/2}\bar{u}_k| \xrightarrow{p} 0.$$
Therefore, applying the continuous mapping theorem, we have
$$V_n(t) \xrightarrow{d} V = (\sigma\pi(1))W.$$

On the other hand, from equation (7), we see that $u^*_t$ can be decomposed as
$$u^*_t = \hat\pi(1)\varepsilon^*_t + (\bar{u}^*_{t-1} - \bar{u}^*_t)$$
where
$$\hat\pi(1) = 1 + \sum_{k=1}^q \hat\pi_{q,k}, \qquad \bar{u}^*_t = \sum_{k=1}^q \tilde{\bar\pi}_k \varepsilon^*_{t-k+1}, \qquad \tilde{\bar\pi}_k = \sum_{i=k}^q \hat\pi_{q,i}.$$

It therefore follows that we can write:
$$V^*_n(t) = \frac{1}{\sqrt{n}} \sum_{k=1}^{[nt]} u^*_k = (\hat\sigma_n\hat\pi_n(1))W^*_n + \frac{1}{\sqrt{n}}(\bar{u}^*_0 - \bar{u}^*_{[nt]}).$$

In order to establish the invariance principle, we must show that $V^*_n \xrightarrow{d^*} V = (\sigma\pi(1))W$ a.s. To do this, we need three lemmas. The first one shows that $\hat\sigma_n$ and $\hat\pi_n(1)$ converge almost surely to $\sigma$ and $\pi(1)$. The second demonstrates that $W^*_n(t) \xrightarrow{d^*} W$ a.s. Finally, the last one shows that
$$\Pr^*\left\{\max_{1 \leq t \leq n} |n^{-1/2}\bar{u}^*_t| > \delta\right\} \xrightarrow{a.s.} 0 \quad (8)$$
for all $\delta > 0$, which is equivalent to saying that
$$\max_{1 \leq k \leq n} |n^{-1/2}\bar{u}^*_k| \xrightarrow{p^*} 0 \ \text{a.s.}$$
and is therefore the bootstrap equivalent of the result of Phillips and Solo (1992). These three lemmas are closely related to the results of Park (2002) and their counterparts in this paper are identified for reference.



Lemma 2.1 (Park 2002, lemma 3.1, p. 476)

Let assumptions 2.1 and 2.2 hold. Then, for large $n$,
$$\max_{1 \leq k \leq q} |\hat\pi_{q,k} - \pi_k| = o(1) \ \text{a.s.} \quad (9)$$
$$\hat\sigma^2_n = \sigma^2 + o(1) \ \text{a.s.} \quad (10)$$
$$\hat\pi_n(1) = \pi(1) + o(1) \ \text{a.s.} \quad (11)$$
Proof: see the appendix.

Lemma 2.2 (Park 2002, lemma 3.2, p. 477).

Let assumptions 2.1 and 2.2 hold. Then, $E^*|\varepsilon^*_t|^r < \infty$ a.s. and $n^{1-r/2}E^*|\varepsilon^*_t|^r \xrightarrow{a.s.} 0$.

Proof: see the appendix.

Lemma 2.2 proves that $W^*_n(t) \xrightarrow{d^*} W$ a.s. because it shows that condition (4) holds almost surely.

Lemma 2.3 (Park 2002, theorem 3.3, p. 478).

Let assumptions 2.1 and 2.2 hold. Then, equation (8) holds.

Proof: see the appendix.

With these three lemmas, the MA sieve bootstrap invariance principle is established. It is formalized in the next theorem.

Theorem 2.1.

Let assumptions 2.1 and 2.2 hold. Then by lemmas 2.1, 2.2 and 2.3,
$$V^*_n \xrightarrow{d^*} V = (\sigma\pi(1))W \ \text{a.s.}$$

2.3 Invariance principle for ARMA sieve bootstrap

We now establish an invariance principle for the ARMASB. It turns out to be a simple matter of combining the results of the previous section with those of Park (2002). The ARMASB procedure consists of approximating the above general linear process (5) by a finite order ARMA(p,q) model:
$$u_t = \alpha_1 u_{t-1} + \cdots + \alpha_p u_{t-p} + \pi_1\varepsilon_{\ell,t-1} + \cdots + \pi_q\varepsilon_{\ell,t-q} + \varepsilon_{\ell,t} \quad (12)$$
where $\ell = p + q$ denotes the total number of parameters and is, of course, a function of the sample size. As before, and for similar reasons, we propose that the parameters be estimated using an analytical indirect inference method suitable for ARMA models. Such a method has been proposed by Galbraith and Zinde-Walsh (1997). Hence, in addition to $p$ and $q$, we must also specify the order of the approximating autoregression from which the ARMA parameter estimates are deduced. As before, we let $f$ denote this order. Then, we make the following assumptions:

Assumption 2.3

Both $\ell$ and $f$ go to infinity at the rate $o\big((n/\log n)^{1/2}\big)$, with $f > \ell$.

Notice that we do not require that both $p$ and $q$ go to infinity simultaneously. Rather, we require that their sum does. Thus, the results that follow hold even if $p$ or $q$ is held fixed while the sample size increases, as long as the sum increases at the proper rate. As before, the restriction that $f > \ell$ is required for the Galbraith and Zinde-Walsh (1997) estimator to be consistent. The bootstrap samples are generated from the DGP:
$$u^*_t = \hat\alpha_{\ell,1}u^*_{t-1} + \cdots + \hat\alpha_{\ell,p}u^*_{t-p} + \hat\pi_{\ell,1}\varepsilon^*_{t-1} + \cdots + \hat\pi_{\ell,q}\varepsilon^*_{t-q} + \varepsilon^*_t \quad (13)$$

where the $\hat\alpha_{\ell,k}$ and $\hat\pi_{\ell,k}$ can be combined using analytical indirect inference equations to form consistent estimates of the true parameters of either the infinite order AR or MA representation of $u^*_t$, and the $\varepsilon^*_t$ are drawn from the EDF of $(\hat\varepsilon_t - \bar{\hat\varepsilon}_n)_{t=1}^n$, that is, from the empirical distribution of the centered residuals of the fitted ARMA(p,q). Next, we need to build a partial sum process $V^*_n$ of $u^*_t$. The easiest way to do this is to consider either the AR(∞) or the MA(∞) form of the ARMA(p,q) model and build $V^*_n$ based on this representation. Let us consider the MA(∞) form of $u^*_t$, which we define as
$$u^*_t = \hat\theta_{\ell,1}\varepsilon^*_{t-1} + \hat\theta_{\ell,2}\varepsilon^*_{t-2} + \cdots + \varepsilon^*_t \quad (14)$$

where $\hat\theta_{\ell,1} = \hat\pi_{\ell,1} + \hat\alpha_{\ell,1}$, $\hat\theta_{\ell,2} = \hat\theta_{\ell,1}\hat\alpha_{\ell,1} + \hat\alpha_{\ell,2} + \hat\pi_{\ell,2}$ and so forth. Then,
$$V^*_n(t) = \frac{1}{\sqrt{n}} \sum_{k=1}^{[nt]} u^*_k = (\hat\sigma_n\hat\theta_n(1))W^*_n + \frac{1}{\sqrt{n}}(\bar{u}^*_0 - \bar{u}^*_{[nt]}) \quad (15)$$

where
$$\bar{u}^*_t = \sum_{k=0}^\infty \bar\theta_k \varepsilon^*_{t-k} \qquad \text{and} \qquad \bar\theta_k = \sum_{i=k+1}^\infty \hat\theta_{\ell,i},$$
and where $\hat\sigma^2_n$ is the estimated variance of the residuals of the ARMA(p,q) sieve. Then, we need to show that $V^*_n(t) \xrightarrow{d^*} V$ a.s. This, as before, can be done by proving three results, which are simple corollaries of lemmas 2.1, 2.2 and 2.3 of the present paper and lemmas 3.1 and 3.2 as well as theorem 3.3 of Park (2002). Recall that $\pi_k$ denotes the $k$th parameter of the true MA(∞) form of the process $u_t$.

Corollary 2.1.

Under assumptions 2.1 and 2.3, for large $n$,
$$\max_{1 \leq k \leq \ell} |\hat\theta_{\ell,k} - \pi_k| = o(1) \ \text{a.s.} \quad (16)$$
Also,
$$\hat\sigma^2_n = \sigma^2 + o(1) \ \text{a.s.} \quad (17)$$
$$\hat\theta_n(1) = \pi(1) + o(1) \ \text{a.s.} \quad (18)$$
Proof: see appendix.

Corollary 2.2.

Under assumptions 2.1 and 2.3, the ARMA sieve bootstrap errors' partial sum process converges in distribution to the standard Wiener process almost surely over all bootstrap samples as $n$ goes to infinity:
$$W^*_n \xrightarrow{d^*} W \ \text{a.s.}$$
Proof: see appendix.

Corollary 2.3

Under assumptions 2.1 and 2.3,
$$\Pr^*\left\{\max_{1 \leq t \leq n} |n^{-1/2}\bar{u}^*_t| > \delta\right\} \xrightarrow{a.s.} 0$$
for all $\delta > 0$ and $\bar{u}^*_t$ generated from the ARMA(p,q) sieve bootstrap DGP.

Proof: see appendix.



These three results are sufficient to prove the invariance principle of the ARMA sieve bootstrap partial sum process. This is formalized in the next theorem.

Theorem 2.2. Let assumptions 2.1, 2.2 and 2.3 hold. Then by corollaries 2.1, 2.2 and 2.3,
$$V^*_n \xrightarrow{d^*} V = (\sigma\pi(1))W \ \text{a.s.}$$

2.4 Asymptotic Validity of MASB and ARMASB ADF Tests

Consider a time series $y_t$ with the following DGP:
$$y_t = \alpha y_{t-1} + u_t \quad (19)$$
where $u_t$ is the general linear process described in equation (5). We want to test the unit root hypothesis against the stationarity alternative (that is, $H_0: \alpha = 1$ against $H_1: \alpha < 1$). This test is frequently conducted as a t-test in the so-called ADF regression, first proposed by Dickey and Fuller (1979, 1981):
$$y_t = \alpha y_{t-1} + \sum_{k=1}^p \alpha_k \Delta y_{t-k} + e_{p,t} \quad (20)$$

where $p$ is chosen as a function of the sample size. A large literature has been devoted to selecting $p$; see, for example, Ng and Perron (1995, 2001). As noted in the introduction, deterministic parts such as a constant and a time trend are usually added to the regressors of (20). Chang and Park (CP) (2003) have shown that the test based on this regression asymptotically follows the DF distribution when $H_0$ is true under very weak conditions, including assumptions 2.1 and 2.2. Let $y^*_t$ denote the bootstrap process generated by the following DGP:
$$y^*_t = \sum_{k=1}^t u^*_k$$

and the u?k = ∆y?

t are generated by some sieve bootstrap DGP. The bootstrap ADFregression equivalent to regression ( 20) is

y?t = αy?

t−1 +p∑

k=1

αk∆y?t−k + et. (21)

Let us suppose for a moment that $\Delta y^*_t$ has been generated by an AR(p) sieve bootstrap DGP:
$$\Delta y^*_t = \sum_{k=1}^p \hat\alpha_{p,k} \Delta y^*_{t-k} + \varepsilon^*_t.$$



Then, letting $\alpha = 1$ in (21), we see that the true parameters of this equation are the $\hat\alpha_{p,k}$s and that its errors are identical to the errors driving the bootstrap DGP. This is a convenient fact which CP (2003) use to prove the consistency of the ARSB ADF test based on this regression. If however the $y^*_t$ are generated by the MA(q) or ARMA(p,q) sieves described above, then the errors of regression (21) are not identical to the bootstrap errors under the null because the AR(p) approximation captures only part of the correlation structure present in the MA(q) or ARMA(p,q) process. It is nevertheless possible to show that they will be equivalent asymptotically, that is, that $\varepsilon^*_t = e_t + o(1)$ a.s. This is done in lemma A1, which can be found in the appendix.

Let $x^*_{p,t} = (\Delta y^*_{t-1}, \Delta y^*_{t-2}, \ldots, \Delta y^*_{t-p})^\top$ and define:
$$A^*_n = \sum_{t=1}^n y^*_{t-1}\varepsilon^*_t - \left(\sum_{t=1}^n y^*_{t-1}x^{*\top}_{p,t}\right)\left(\sum_{t=1}^n x^*_{p,t}x^{*\top}_{p,t}\right)^{-1}\left(\sum_{t=1}^n x^*_{p,t}\varepsilon^*_t\right)$$
$$B^*_n = \sum_{t=1}^n y^{*2}_{t-1} - \left(\sum_{t=1}^n y^*_{t-1}x^{*\top}_{p,t}\right)\left(\sum_{t=1}^n x^*_{p,t}x^{*\top}_{p,t}\right)^{-1}\left(\sum_{t=1}^n x^*_{p,t}y^*_{t-1}\right)$$

Notice that $A^*_n$ is defined as a function of $\varepsilon^*_t$, not $e_t$. Then, it is easy to see that the t-statistic computed from regression (21) can be written as:
$$\tau^*_n = \frac{\hat\alpha^*_n - 1}{s(\hat\alpha^*_n)} + o(1) \ \text{a.s.}$$
for large $n$, where $\hat\alpha^*_n - 1 = A^*_n B^{*-1}_n$ and $s(\hat\alpha^*_n)^2 = \hat\sigma^2_n B^{*-1}_n$.

The equality is asymptotic and holds almost surely because the residuals of the ADF regression are asymptotically equal to the bootstrap errors, as shown in lemma A1. This also justifies the use of the estimated variance $\hat\sigma^2_n$. Note that in small samples, it may be preferable to use the estimated variance of the residuals from the ADF regression, which is indeed what we do in the simulations. We must now address the issue of how fast $p$ is to increase. For the ADF regression, Said and Dickey (1984) require that $p = o(n^k)$ for some $0 < k \leq 1/3$. But, as argued by CP (2003), these rates do not allow the logarithmic rate. Hence, we state new assumptions about the rate at which $q$, $\ell$ and $p$ (the ADF regression order) increase:

Assumption 2.2'.

$q = c_q n^k$, $\ell = c_\ell n^k$ and $p = c_p n^k$, where $c_q$, $c_\ell$ and $c_p$ are constants and $1/rs < k < 1/2$.

Assumptions 2.2 and 2.3 can be fitted into this assumption for appropriate values of $k$. Also, notice that assumption 2.2' imposes a lower bound on the growth rate of $p$, $\ell$ and $q$. This is necessary to obtain almost sure convergence. See CP (2003) for a weaker assumption that allows for convergence in probability. Several preliminary and quite technical results are necessary to prove that the bootstrap test based on the statistic $\tau^*_n$ is consistent. To avoid rendering the present exposition more laborious than it needs to be, we relegate them to the appendix (lemmas A2 to A5). For now, let it be sufficient to say that they extend to the MA and ARMA sieve bootstrap samples some results established by CP (2003) for the AR sieve bootstrap. In turn, some of CP (2003)'s lemmas are adaptations of identical results in Berk (1974) and An, Chen and Hannan (1982).

2.4.1 Consistency of MASB ADF test

In order to prove that the MA sieve bootstrap ADF test is consistent, we now prove two results on the elements of $A^*_n$ and $B^*_n$. These results are stated in terms of bootstrap stochastic orders, denoted by $O^*_p$ and $o^*_p$, which are defined as follows. Consider a sequence of non-constant numbers $c_n$. Then, we say that $X^*_n = o^*_p(c_n)$ a.s. or in p if $\Pr^*\{|X^*_n/c_n| > \varepsilon\} \to 0$ a.s. or in p for any $\varepsilon > 0$. Similarly, we say that $X^*_n = O^*_p(c_n)$ if for every $\varepsilon > 0$, there exists a constant $M > 0$ such that for all large $n$, $\Pr^*\{|X^*_n/c_n| > M\} < \varepsilon$ a.s. or in p. It follows that if $E^*|X^*_n| \to 0$ a.s., then $X^*_n = o^*_p(1)$ a.s. and that if $E^*|X^*_n| = O(1)$ a.s., then $X^*_n = O^*_p(1)$ a.s. See CP (2003), p. 7 for a slightly more elaborate discussion.

Lemma 2.4. Under assumptions 2.1 and 2.2', we have

    (1/n) Σ_{t=1}^n y*_{t−1} ε*_t = π_n(1) (1/n) Σ_{t=1}^n w*_{t−1} ε*_t + o*_p(1)  a.s.   (22)

    (1/n²) Σ_{t=1}^n (y*_{t−1})² = π_n(1)² (1/n²) Σ_{t=1}^n (w*_{t−1})² + o*_p(1)  a.s.   (23)

where w*_t = Σ_{k=1}^t ε*_k.

Proof: see appendix.

Lemma 2.5. Under assumptions 2.1 and 2.2', we have

    ‖((1/n) Σ_{t=1}^n x*_{p,t} x*_{p,t}ᵀ)⁻¹‖ = O*_p(1)  a.s.   (24)

    ‖Σ_{t=1}^n x*_{p,t} y*_{t−1}‖ = O*_p(n p^{1/2})  a.s.   (25)

    ‖Σ_{t=1}^n x*_{p,t} ε*_t‖ = O*_p(n^{1/2} p^{1/2})  a.s.   (26)

Proof: see appendix.

We can place an upper bound on the absolute value of the second term of A*_n. This is:

    |(Σ_{t=1}^n y*_{t−1} x*_{p,t}ᵀ)(Σ_{t=1}^n x*_{p,t} x*_{p,t}ᵀ)⁻¹(Σ_{t=1}^n x*_{p,t} ε*_t)|
        ≤ ‖Σ_{t=1}^n y*_{t−1} x*_{p,t}ᵀ‖ ‖(Σ_{t=1}^n x*_{p,t} x*_{p,t}ᵀ)⁻¹‖ ‖Σ_{t=1}^n x*_{p,t} ε*_t‖

But by lemma 2.5, the right hand side is O*_p(n p^{1/2}) O*_p(n⁻¹) O*_p(n^{1/2} p^{1/2}) = O*_p(n^{1/2} p), so that n⁻¹ times this term is O*_p(p n^{−1/2}) = o*_p(1) under assumption 2.2'. Then, using this result and lemma 2.4, we have:

    n⁻¹ A*_n = π_n(1) (1/n) Σ_{t=1}^n w*_{t−1} ε*_t + o*_p(1)  a.s.

We can further say that

    n⁻² B*_n = π_n(1)² (1/n²) Σ_{t=1}^n (w*_{t−1})² + o*_p(1)  a.s.

because n⁻² times the second part of B*_n goes to 0 as n → ∞. Therefore, the statistic τ*_n can be seen to be:

    τ*_n = [(1/n) Σ_{t=1}^n w*_{t−1} ε*_t] / [σ ((1/n²) Σ_{t=1}^n (w*_{t−1})²)^{1/2}] + o*_p(1)  a.s.

Recalling that w*_t = Σ_{k=1}^t ε*_k, it is then easy to use the results of chapter 2 along with the continuous mapping theorem to deduce that:

    (1/n) Σ_{t=1}^n w*_{t−1} ε*_t  →^{d*}  ∫_0^1 W_t dW_t  a.s.

    (1/n²) Σ_{t=1}^n (w*_{t−1})²  →^{d*}  ∫_0^1 W_t² dt  a.s.

under assumptions 2.1 and 2.2'. We can therefore state the following theorem.

Theorem 2.3. Under assumptions 2.1 and 2.2', we have

    τ*_n  →^{d*}  ∫_0^1 W_t dW_t / (∫_0^1 W_t² dt)^{1/2}  a.s.

which establishes the asymptotic validity of the MASB ADF test.


2.4.2 Consistency of ARMASB ADF tests

It is now very easy to prove the asymptotic validity of ADF tests based on the ARMASB distribution. It indeed suffices to show that results similar to those presented in the last subsection hold for the ARMASB DGP. In order to do this, we make use of the MA(∞) form of the ARMASB DGP (see equation 14 above) and its Beveridge-Nelson decomposition. Then, we can state the following corollaries of lemmas 2.4 and 2.5.
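As a reminder, the Beveridge-Nelson decomposition is a standard algebraic identity; written with the paper's notation for the MA(∞) lag polynomial θ_n(L) of equation 14, one common form is:

```latex
\theta_n(L) \;=\; \theta_n(1) \;+\; (L-1)\,\tilde{\theta}_n(L),
\qquad
\tilde{\theta}_n(L) \;=\; \sum_{k=0}^{\infty} \tilde{\theta}_{n,k}\,L^{k},
\qquad
\tilde{\theta}_{n,k} \;=\; \sum_{j=k+1}^{\infty} \theta_{n,j}.
```

Applied to Δy*_t = θ_n(L)ε*_t, this splits the bootstrap first difference into the long-run component θ_n(1)ε*_t plus a telescoping remainder; summing over t makes w*_t = Σ_{k=1}^t ε*_k the dominant term, which is exactly the structure appearing in corollary 2.4.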

Corollary 2.4. Under assumptions 2.1 and 2.2', we have

    (1/n) Σ_{t=1}^n y*_{t−1} ε*_t = θ_n(1) (1/n) Σ_{t=1}^n w*_{t−1} ε*_t + o*_p(1)  a.s.   (27)

    (1/n²) Σ_{t=1}^n (y*_{t−1})² = θ_n(1)² (1/n²) Σ_{t=1}^n (w*_{t−1})² + o*_p(1)  a.s.   (28)

Proof: see appendix.

Corollary 2.5. Under assumptions 2.1 and 2.2', we have

    ‖((1/n) Σ_{t=1}^n x*_{p,t} x*_{p,t}ᵀ)⁻¹‖ = O*_p(1)  a.s.   (29)

    ‖Σ_{t=1}^n x*_{p,t} y*_{t−1}‖ = O*_p(n p^{1/2})  a.s.   (30)

    ‖Σ_{t=1}^n x*_{p,t} ε*_t‖ = O*_p(n^{1/2} p^{1/2})  a.s.   (31)

Proof: see appendix.

Theorem 2.4 evidently follows.

Theorem 2.4. Under assumptions 2.1 and 2.2', we have

    τ*_n  →^{d*}  ∫_0^1 W_t dW_t / (∫_0^1 W_t² dt)^{1/2}  a.s.

This establishes the asymptotic validity of the ARMASB ADF test.


3 Simulations

We now present a set of simulations designed to illustrate the extent to which the proposed MASB and ARMASB schemes improve upon the usual ARSB. Recently, Chang and Park (2003) (CP 2003 hereafter) have shown, through Monte Carlo experiments, that the ARSB allows one to reduce the ADF test's ERP, but not to eliminate it altogether. Such results are generally interpreted as evidence of the presence of asymptotic refinements in the sense of Beran (1987), even though no theoretical proof exists to date. Their simulations however show that the ARSB loses some of its accuracy as the dependence of the error process increases. This is, of course, not a surprise because the longer it takes for the dependence to decrease, the larger is the difference between the correlation structure of the estimated AR(p) sieve and that of the true AR(∞) process, and therefore, the greater the difference between the true and the bootstrap DGPs. As we will see shortly, the same fate befalls both the MASB and ARMASB.

As mentioned in the introduction, one may also use a block bootstrap to conduct ADF tests. The very complete simulation study of Palm, Smeekes and Urbain (2006) indicates that the ARSB performs slightly better than several versions of the block bootstrap. Therefore, in order to make this section as concise as possible, we only compare the MASB and ARMASB to the ARSB.

In order to obtain consistent test statistics, one must make sure that the parameters used in the construction of the bootstrap DGP are consistent estimates under the null. This is easily achieved in the present case by estimating the appropriate time series model (i.e. AR, MA or ARMA) of the first difference of y_t using any consistent method. Indeed, whenever α = 1, (19) simply becomes y_t − y_{t−1} = u_t, which is our null hypothesis. Lemma 2.1 and corollary 2.1 discuss consistency of the parameters of the MA and ARMA sieves, while consistency of the parameters of the AR sieve is shown by Berk (1974) and Park (2002). Bootstrap ADF tests are thus conducted as follows:

1. Estimate the ADF test regression on the original data and compute the ADF test statistic. In all that follows, we have used the ADF regression containing a constant and no deterministic time trend. Thus, the statistic we compute is the one commonly called τ_c. We shall denote its realisation by τ̂_c.

2. Estimate the sieve bootstrap model. This is done by fitting either an AR(p), an MA(q) or an ARMA(p,q) to Δy_t. Notice that this implicitly imposes the null hypothesis. In order to draw bootstrap samples, we need to create the residuals series. For the AR sieve, this is:

    ε̂_t = Δy_t − Σ_{k=1}^p α̂_{p,k} Δy_{t−k}

For the MA and ARMA sieves, this is:

    ε̂_t = Ψ̂_tᵀ Δy

where Ψ̂_t is the t-th row of the triangular GLS transformation matrix for MA(q) or ARMA(p,q) processes evaluated at the parameter estimates, and Δy is the vector of observations of Δy_t.

3. Draw bootstrap errors (ε*_t) from the EDF of the recentered residuals (ε̂_t − (1/n) Σ_{t=1}^n ε̂_t). As is well known, OLS residuals under-estimate the error terms, so it may be desirable to rescale the recentered residuals by a factor of (n/(n−ℓ))^{1/2}, where ℓ is the number of parameters.

4. Generate bootstrap samples of y_t. This first requires that we generate the bootstrap first difference process. For the AR sieve bootstrap:

    Δy*_t = Σ_{k=1}^p α̂_{p,k} Δy*_{t−k} + ε*_t,

for the MA sieve bootstrap:

    Δy*_t = Σ_{k=1}^q π̂_{q,k} ε*_{t−k} + ε*_t,

for the ARMA sieve bootstrap:

    Δy*_t = Σ_{k=1}^q π̂_{q,k} ε*_{t−k} + Σ_{k=1}^p α̂_{p,k} Δy*_{t−k} + ε*_t.

This requires some sort of initialisation for the Δy*_t series, which is discussed below. Then, we generate bootstrap samples of y*_t: y*_t = Σ_{j=1}^t Δy*_j.

5. Compute the bootstrap ADF test statistics τ*_ci based on the bootstrap ADF regression:

    y*_t = α_0 + α y*_{t−1} + Σ_{k=1}^p α_k Δy*_{t−k} + e_t

where p is the same as above.

6. Repeat steps 2 to 5 B times to obtain a set of B bootstrap ADF statistics τ*_ci, i = 1, 2, ..., B. The p-value of the test is defined as:

    P* = (1/B) Σ_{i=1}^B I(τ*_ci < τ̂_c)


where τ̂_c is the original ADF test statistic and I(·) is the indicator function, which is equal to 1 every time the bootstrap statistic is smaller than τ̂_c and 0 otherwise. The null hypothesis is rejected at the 5 percent level whenever P* is smaller than 0.05. Repeating steps 2 to 5 B times allows us to obtain a number of test statistics, all computed under the null hypothesis, which can therefore be used to estimate the finite sample distribution of the test and conduct inferences. Of course, the larger B is, the more precise this estimate is. Further, for a test conducted at nominal level α, B should be chosen so that (B+1)α is an integer; see Dufour and Kiviet (1998) or Davidson and MacKinnon (2004), chapter 4.
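To make the algorithm concrete, the AR sieve variant of steps 1 to 6 can be sketched in a few lines of NumPy. This is a minimal sketch under stated simplifications, not the paper's implementation: the function names are ours, the lag order p is fixed by hand, and the refinements discussed in the text (residual rescaling, estimation of the MA and ARMA sieves) are omitted.

```python
import numpy as np

def ols(X, z):
    """OLS coefficients and residuals."""
    beta = np.linalg.lstsq(X, z, rcond=None)[0]
    return beta, z - X @ beta

def adf_stat(y, p):
    """t-statistic on alpha in: dy_t = a0 + alpha*y_{t-1} + sum_{k=1}^p a_k*dy_{t-k} + e_t."""
    dy = np.diff(y)
    z = dy[p:]
    X = np.column_stack([np.ones(len(z)), y[p:-1]] + [dy[p - k:-k] for k in range(1, p + 1)])
    beta, e = ols(X, z)
    s2 = e @ e / (len(z) - X.shape[1])
    return beta[1] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

def ar_sieve_bootstrap_pvalue(y, p, B=499, rng=None):
    """AR sieve bootstrap p-value for the ADF tau_c test (steps 1 to 6)."""
    rng = np.random.default_rng(rng)
    tau_hat = adf_stat(y, p)                          # step 1: observed statistic
    dy = np.diff(y)
    X = np.column_stack([dy[p - k:-k] for k in range(1, p + 1)])
    alpha, resid = ols(X, dy[p:])                     # step 2: AR(p) on dy, null imposed
    eps = resid - resid.mean()                        # step 3: recentred residuals
    ndiff = len(y) - 1
    taus = np.empty(B)
    for i in range(B):
        total = ndiff + 100 + p                       # burn-in, as described in the text
        dystar = np.empty(total)
        dystar[:p] = dy[:p]                           # initial values from the data
        e = rng.choice(eps, size=total - p)           # resample bootstrap errors
        for t in range(p, total):                     # step 4: recursive AR generation
            dystar[t] = alpha @ dystar[t - p:t][::-1] + e[t - p]
        ystar = np.concatenate([[0.0], np.cumsum(dystar[-ndiff:])])
        taus[i] = adf_stat(ystar, p)                  # step 5: bootstrap statistic
    return np.mean(taus < tau_hat)                    # step 6: bootstrap p-value
```

Rejecting whenever the returned p-value falls below 0.05 implements the 5% level test; B = 499 matches the experiments reported below.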

In all the simulations reported here, the AR sieve and the ADF regressions are computed using OLS. Further, all MA and ARMA models were estimated by the analytical indirect inference methods of GZW (1994 and 1997). The bootstrap samples are generated recursively, which requires some starting values for Δy*_t. There are several ways to do this. The important thing is for the bootstrap sample values not to be influenced by whatever starting values we assume. In what follows, we have set the first p or q, whichever is larger, values of Δy*_t equal to the first p or q values of Δy_t and generated samples of n + 100 + p (or q) observations. Then, we have thrown away the first 100 + p (or q) values and used the last n to compute the bootstrap tests. This effectively removes the effect of the initial values and ensures that Δy*_t is a stationary time series.

In order to obtain the residuals of the MA(q) or ARMA(p,q) sieves, we premultiply the vector of observations of Δy_t, which we denote as Δy, by a GLS transformation matrix. This matrix, which we will call Ψ, is defined as the n × n matrix that satisfies the equation ΨΨᵀ = Σ⁻¹, where Σ⁻¹ is the inverse of the covariance matrix of Δy. There are several ways to estimate Ψ. A popular one is to compute Σ⁻¹ and to obtain Ψ using a numerical algorithm (the Cholesky decomposition, for example). However, this method, which is often referred to as exact because it is based directly on the MA(q) or ARMA(p,q) form of the model, requires the inversion of the n × n covariance matrix, which is computationally costly when n is large. To avoid this, one may prefer to decompose the covariance matrix of the inverse process (for example, of an AR(∞) in the case of an MA(1) model). Because it is impossible to correctly estimate the covariance matrix of an AR(∞) model in finite samples, the resulting estimator of Ψ is said to be an approximation. For all the simulations reported here, the transformation matrix was estimated using the method proposed by Galbraith and Zinde-Walsh (1992), who provide algebraic expressions for Ψ rather than for Σ. This is computationally advantageous because it does not require the inversion, the decomposition or the storage of an n × n matrix, since all calculations can be performed through row by row looping. Since it is derived directly from the model of interest itself (for example, from the MA(1) process rather than from its AR(∞) form), this estimator of Ψ falls in the class of exact estimators.
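To fix ideas, here is a brute-force sketch of the exact approach for an MA(1): build Σ from the model, invert it, and take a Cholesky factor Ψ of Σ⁻¹. This is precisely the costly O(n³) route that the Galbraith and Zinde-Walsh (1992) expressions avoid; the function names are ours, and the convention below (ε = ΨᵀΔy with Ψ lower triangular) may differ from the paper's by a transposition of Ψ.

```python
import numpy as np

def ma1_covariance(theta, n, sigma2=1.0):
    """Covariance matrix of an MA(1) process u_t = e_t + theta*e_{t-1}."""
    S = np.zeros((n, n))
    idx = np.arange(n)
    S[idx, idx] = sigma2 * (1.0 + theta ** 2)      # variance on the diagonal
    S[idx[:-1], idx[:-1] + 1] = sigma2 * theta     # first-order autocovariance
    S[idx[:-1] + 1, idx[:-1]] = sigma2 * theta
    return S

def gls_transform(Sigma):
    """Lower-triangular Psi with Psi @ Psi.T = Sigma^{-1}; eps = Psi.T @ dy is then white."""
    return np.linalg.cholesky(np.linalg.inv(Sigma))
```

The whitening property can be verified directly: with Ψ from `gls_transform`, the product ΨᵀΣΨ equals the identity matrix, so the transformed series has unit variance and no serial correlation under the fitted model.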


Simulation studies of unit root test accuracy often use an invertible MA(1) process to generate u_t. The reason is that it is simple to generate and easy to control and plot the degree of correlation of u_t by changing the sole MA parameter. This simple device is obviously inappropriate in the present context because an MA(1) would correctly model the first difference process under the null and would therefore not be a sieve anymore. We therefore endeavoured to use DGPs that could not be modelled properly by any process belonging to the finite order ARMA class. One such model is the fractionally integrated autoregressive moving average model of order 1 (ARFIMA(1,d,1)):

    (1 − L)^d (1 + αL) u_t = (1 + πL) ε_t

where L is the lag operator. An important property of this model is that it yields a stationary process whenever the AR and MA parts are stationary and invertible and d < 0.5. Also, time series generated by ARFIMA processes have long memory, which means that a shock at any time period has repercussions far into the future. This is of course desirable since we want to generate series with strong temporal correlation. Finally, ARFIMA models have proved to provide adequate approximations for several macroeconomic time series. For example, Diebold and Rudebusch (1989, 1991) use the ARFIMA framework to study United States output and consumption, Porter-Hudak (1990) applies it to money supply while Shea (1991) studies interest rate behaviour. In fact, Granger (1980) has shown that such processes arise naturally when a data series is generated by aggregating several serially correlated processes. Since most macroeconomic time series are compiled in such a fashion, it is likely that ARFIMA processes frequently occur in reality. We also use long autoregressions and moving averages with decaying parameters. These models are:

    AR1: u_t = θ[0.99L − 0.96L² + 0.93L³ − … + 0.03L³³] u_t + ε_t

    MA1: u_t = θ[0.99L − 0.96L² + 0.93L³ − … + 0.03L³³] ε_t + ε_t

Strictly speaking, these could be modelled adequately by an AR(33) and an MA(33) respectively. We will consequently restrict ourselves to using approximating orders far from these values so that we can obtain results similar to what we would get if these were indeed infinite order processes. Unreported simulations based on AR(99) and MA(99) DGPs yielded similar results. It is worth noting that the pattern of parameter decay is very close to what we would get if we were to invert an MA(1) or an AR(1) for the models AR1 and MA1 respectively. As a matter of fact, the rates of decay in AR1 and MA1 are most of the time slower than what we would have with inverted MA(1) and AR(1) models. For example, for high enough values of θ, the decay rate in AR1 and MA1 is slower than the decay rate of the parameters of an inverted MA(1) or AR(1) model with parameter 0.99. On the other hand, for a low enough value of θ, the reverse is true. It follows that we can control the degree of correlation by changing the value of θ.
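Both families of DGPs above are easy to reproduce in code: the fractional part of the ARFIMA models expands into the MA(∞) weights of (1 − L)^{−d}, ψ_j = Γ(j + d)/(Γ(d)Γ(j + 1)), while the AR1/MA1 lag polynomial steps down from 0.99 to 0.03 in increments of 0.03 with alternating signs. A minimal sketch, with helper names of our own:

```python
import numpy as np

def frac_diff_weights(d, n):
    """First n MA(inf) weights of (1-L)^(-d): psi_0 = 1, psi_j = psi_{j-1}*(j-1+d)/j."""
    psi = np.empty(n)
    psi[0] = 1.0
    for j in range(1, n):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return psi

def decaying_coefficients(theta, n_lags=33):
    """Coefficients theta*[0.99, -0.96, 0.93, ..., +0.03] on lags 1..n_lags (models AR1/MA1)."""
    mags = 0.99 - 0.03 * np.arange(n_lags)   # magnitudes 0.99, 0.96, ..., 0.03
    signs = (-1.0) ** np.arange(n_lags)      # alternating +, -, +, ...
    return theta * signs * mags
```

Convolving `frac_diff_weights(d, n)` with an innovation series yields a truncated ARFIMA sample; applying `decaying_coefficients(theta)` as weights on lagged u_t or on lagged ε_t (plus ε_t) reproduces AR1 and MA1 respectively.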

21

Page 23: gredi.recherche.usherbrooke.cagredi.recherche.usherbrooke.ca/wpapers/GREDI-0705.pdf · Sieve Bootstrap Unit Root Tests Patrick Richard GREDI and D´epartment de sciences ´economiques

In an attempt to further test the robustness of our sieve bootstraps, we have also considered DGPs for which they are not asymptotically valid. Examples of such DGPs are threshold regime switching models, which are non-linear processes. We have used two regime switching models, both of which had their threshold point at 0. The first one was

    u_t = α u_{t−1} + ε_t,  if u_{t−1} ≤ 0
    u_t = α ε_{t−1} + ε_t,  if u_{t−1} > 0

where we considered several values of α. The second model was:

    u_t = θ₁ ε_{t−1} + ε_t,  if u_{t−1} ≤ 0
    u_t = θ₂ ε_{t−1} + ε_t,  if u_{t−1} > 0

where θ₂ = 0.8 and θ₁ was allowed to vary. We shall call the first DGP T1 and the second T2. We have also used the GARCH(1,1) model:

    u_t = ε_t h_t^{1/2}
    h_t = 1 + α(h_{t−1} + u_{t−1}²)

Our sieve bootstraps clearly are not designed for such models because they do not attempt to preserve the conditional heteroskedasticity structure of the data. We shall refer to this DGP as GARCH.
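For reference, the T1 and GARCH DGPs can be generated as follows (T2 is analogous to T1, with the two MA regimes swapped in). This is a minimal sketch; the function names and the zero initial conditions are our own choices, and burn-in handling is omitted.

```python
import numpy as np

def simulate_t1(alpha, n, rng=None):
    """T1: u_t = alpha*u_{t-1} + e_t if u_{t-1} <= 0, and alpha*e_{t-1} + e_t otherwise."""
    rng = np.random.default_rng(rng)
    e = rng.standard_normal(n)
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = alpha * (u[t - 1] if u[t - 1] <= 0 else e[t - 1]) + e[t]
    return u

def simulate_garch(alpha, n, rng=None):
    """GARCH-type DGP: u_t = e_t * sqrt(h_t), with h_t = 1 + alpha*(h_{t-1} + u_{t-1}^2)."""
    rng = np.random.default_rng(rng)
    e = rng.standard_normal(n)
    u, h = np.zeros(n), np.ones(n)
    for t in range(1, n):
        h[t] = 1.0 + alpha * (h[t - 1] + u[t - 1] ** 2)
        u[t] = e[t] * np.sqrt(h[t])
    return u
```

With α = 0 both branches of T1 collapse to white noise, which provides a simple sanity check on the regime logic.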

We consider simulations for samples of 200 observations, which corresponds to 50 years of quarterly data or close to 17 years of monthly data, a situation that is likely to occur in real life applied macroeconomic work. Thus, the results we present below are of special interest for applied researchers. Unless otherwise stated, all the simulations presented in this section are based on 2000 replications of samples of 200 observations and all bootstrap tests are computed using 499 bootstrap samples. In all simulations, the innovations were homoskedastic and drawn from a normal distribution with variance 1. Since ADF tests are scale invariant, this last assumption does not affect our results.

3.1 Rejection probability under the null

Figure 3.1 shows the ERP of the ADF test as a function of the MA parameter in the ARFIMA(0,d,1) model based on different parameter values. We have set d = 0.45 and the MA parameter takes the values -0.95, -0.9, -0.8, -0.6, -0.2, 0.2, 0.6, 0.8, 0.9 and 0.95. This yields a process with long memory with possible near-cancellation of the autoregressive and MA roots. The test labelled asymp is based on the critical value of the DF distribution at nominal level 5%. Similar results were obtained for the 10% level. The tests labelled AR, MA and ARMA are based on null distributions generated using AR, MA and ARMA sieve bootstrap samples with the specified orders. Notice that the number of lags in the ADF regression is the same as the order of the AR sieve bootstrap DGP, which is not necessarily what would happen if we were using a data dependent method to choose these orders. In particular, the ARSB ADF regression always has the same number of lags as the ARSB DGP. It indeed does not make much sense to use a lag selection method in this case because we know that the DGP of the first difference is an AR(p). The impact of this choice is explored in Richard (2007), where we find that different selection strategies may greatly affect the test's ERP. We have chosen to fix all the orders of our sieve models equal to 10 in order to identify which bootstrap test makes the best use of this given parametric structure.

Figure 3.1. ERP at nominal level 5%, ARFIMA(0,d,1), N=200.

The most striking feature of this figure is the apparent incapacity of the AR sieve bootstrap test to improve significantly upon the asymptotic one. On the other hand, both the MA(10) and ARMA(10,10) significantly reduce the ERP, especially so for the most correlated processes (-0.95, -0.9 and -0.8). The question is therefore to discover what it is that the last two do that the AR sieve does not. We will discuss this point further below. Also, it is interesting to notice that using an ARMA(10,10) instead of an MA(10) does not seem to improve the quality of the inference, except for an MA parameter of -0.8 or -0.6, where the ARMA(10,10) corrects the MA(10)'s slight tendency to under-reject. This issue is also explored later.

These results obviously depend on the choice of approximating orders as well as on the choice of the DGP. In particular, increasing the order of the ARSB would certainly contribute to decreasing its over-rejection problems. We have considered this issue by comparing the curves shown in figure 3.1 to similar curves generated using approximating orders of 15.

Figure 3.2. ERP at nominal level 5%, ARFIMA(0,d,1), N=200.

Figure 3.2 shows the effect of increasing the order of the different sieves on the ERP of the 5% level test. As would be expected, such increases have a beneficial effect on the ERP of all tests. The most noteworthy amelioration is the decrease of the AR sieve's ERP. Note that the ERP functions do not cross; that is, increasing all approximating orders by the same increment does not change the order in which each sieve test performs.

Next, we considered an ARFIMA(1,d,0) model with the AR parameter taking the values -0.95, -0.9, -0.8, -0.6, -0.2, 0.2, 0.6, 0.8, 0.9 and 0.95. Figure 3.3 shows the ERP of the various ADF tests conducted on these data sets at nominal level 5%. A quick comparison with figure 3.1 reveals a completely different situation. Indeed, the severe over-rejection displayed by the ARSB and asymptotic tests in the former figures is now all but gone. This is to be expected since both tests explicitly model the AR root of the DGP. The MA sieve bootstrap still performs quite well. Similarly to what happened with the preceding DGP, it tends to over-reject somewhat for values of the sole parameter (here, the AR parameter) close to -1. Nevertheless, comparing figure 3.1 to figure 3.3 reveals that this problem is smaller with the new DGP. For example, the MA(10) sieve bootstrap test over-rejects by 0.18 when the true DGP is the ARFIMA(0,d,1) with an MA parameter of -0.95 and only by 0.034 when the DGP is the ARFIMA(1,d,0) with an AR parameter of -0.95. Also, it should be noted that the MA(10) sieve bootstrap test now over-rejects slightly for large positive AR parameters (over-rejection of 0.031 at 5% and 0.055 at 10%). Again, this is not too surprising because these parameters yield very persistent MA(∞) models when inverted, which are hard to model properly using a finite order MA(q). Nevertheless, the MA sieve bootstrap still performs very adequately and thus appears to be quite robust to the form of the underlying process, which is more than can be said of either the asymptotic or the AR sieve bootstrap tests.

Figure 3.3. ERP at nominal level 5%, ARFIMA(1,d,0), N=200.

The ARMA sieve also experiences some difficulties around the borders of the parameter space. Figure 3.3 clearly shows that it over-rejects quite severely compared to the other 3 tests when |α| ≥ 0.90. Nevertheless, it can be seen that the test's ERP is actually smaller here for an MA or AR parameter of -0.95 (0.22 in the ARFIMA(0,d,1) case and 0.16 in the ARFIMA(1,d,0) case at 5%). On the other hand, the ERP at 5% goes from 0.009 in the first case with an MA parameter of 0.95 to 0.18 in the second case with an AR parameter of 0.95. Thus, the ARMA sieve bootstrap test appears to be less robust than the MA sieve bootstrap test. Of course, these results depend on the chosen AR and MA orders. We will see later that the ARMASB is greatly influenced by this choice.

Aside from the behaviour of the ARMASB, there is a quite simple explanation for the results of the preceding 3 figures, which is that the ARFIMA(0,d,1) model used to generate figures 3.1 and 3.2 includes DGPs with near cancelling roots when the MA parameter takes values near -1, while the ARFIMA(1,d,0) does not include such DGPs. Indeed, we may write the model having generated y_t as follows:

    (1 − L)^d Θ(L) (1 − L) y_t = Φ(L) ε_t

where Θ(L) and Φ(L) are lag polynomials. In the case of the ARFIMA(0,d,1), Θ(L) = 1 and Φ(L) = (1 + θL), where θ is the sole MA parameter. Dividing both sides by Φ(L), it is easy to see that, when θ is close to -1, Φ(L) and (1 − L) almost cancel each other out. What results therefore looks like a stationary ARFIMA(0,d,0) process, and it may be difficult to distinguish between this false process and the true one. It is, however, necessary to be able to make the distinction between the true non-stationary process and the false stationary process for the purpose of unit root testing. Indeed, if our testing procedure is fooled into believing that the series is driven by the false stationary DGP, it will naturally tend to reject the unit root hypothesis. This is what happens to the asymptotic and AR sieve bootstrap tests. The MA and ARMA sieve bootstrap tests also mistake y_t for a stationary process, as is evident from their higher ERP when θ approaches -1. However, the fact that they over-reject much less severely indicates that they make this mistake less often than the other two. This is also to be expected because they both explicitly model an MA part, so that they are more likely to detect the presence of the strong MA root. On the other hand, when Φ(L) = 1 and Θ(L) = (1 + αL), where α is the sole AR parameter, no such near cancellation occurs, even when α is close to -1.

It is also worth noting that the ARSB test almost always has the same ERP as the asymptotic test and that it brings only small accuracy gains, when it brings any. This implies that the AR sieve bootstrap ADF test distribution is very close to the DF distribution and to the right of the actual test's distribution. Therefore, the 5% critical value of the ARSB test is close to the DF critical value and lower in absolute value than the actual distribution's critical value, and the test over-rejects. As figure 3.4 indicates, solving this problem may require the inclusion of a very large number of lags in the AR sieve model. In Richard (2007), we use several simulations to show that this is due to the fact that the ARSB does not properly replicate the correlation structure present in the original data's ADF regression residuals. We propose some solutions to this problem, including a modified version of the fast double bootstrap of Davidson and MacKinnon (2006). The MASB may also over-reject, but as figure 3.4 illustrates, this problem disappears rather quickly as q increases. Similar results were obtained for different sample sizes.


Figure 3.4. ERP of ARSB and MASB at nominal level 5%, ARFIMA(0,d,1) model, θ=-0.9, N=200.

Figure 3.5 shows the ERP at a nominal level of 5% of the asymptotic, AR, MA and ARMA sieve bootstrap tests with the DGP AR1 for values of θ ranging from -0.99 to 0. As expected, the asymptotic and ARSB tests severely over-reject the null. The shape of their ERP functions is however quite different from what it was with the ARFIMA(0,d,1) DGP. This is due to the fact that the correlation structure decreases much more slowly as a function of θ than it did as a function of the MA parameter in the latter example. It may also be due in part to the near-cancellation of the unit root which may occur with the MA lag polynomial that would be obtained by inverting the model AR1. On the other hand, both the MA and ARMA sieves considered have, in comparison, quite acceptable ERPs. Further, the proportion by which they over-reject is relatively constant through the parameter space, which confirms that the persistence of the correlation is as much a determining factor as its strength. These are yet more arguments in favour of the MASB and ARMASB on the ground that they appear to be much more robust to the underlying DGP than the ARSB or the asymptotic test.


Figure 3.5. ERP at nominal level 5%, AR1 model, N=200.

Figure 3.6. ERP at nominal level 5%, MA1 model, N=200.

Figure 3.6 presents similar results for the MA1 model. We first note that the AR and MA sieve bootstrap tests have similar characteristics for almost all values of θ except when this parameter is close to 1, in which case the MASB experiences some difficulties. This is however not surprising because the process MA1 in such cases is very similar to an inverted AR(1) with a parameter close to 1. It is a well known fact that such processes are difficult to estimate without bias because they lie in a region close to non-stationarity. Thus, the MA sieve may have problems in trying to accurately estimate the DGP in this region. This can also explain why the MA sieve over-rejects more severely for θs close to 1 than for θs close to -1, because precise estimation of the latter usually proves to be more difficult than precise estimation of the former. See figure 1 of MacKinnon and Smith (1998) for a convincing illustration of this fact. The ARMASB, which we will now study with more care, experiences difficulties all over the parameter space.

The ERP functions shown in figure 3.7 belong to the ARMASB with 10 MA parts and a variable number of AR parts. They were generated using the AR1 DGP described above, where the parameter θ was set to -0.9. We have used 3000 Monte Carlo replications of samples of 200 observations and 499 bootstrap samples were used for each replication. This figure is very odd indeed, for it indicates that adding an AR part to the MA sieve, i.e., going from an MA(10) to an ARMA(1,10), increases slightly the test's ERP at 5% and decreases it slightly at 10%. As a matter of fact, the 5% nominal level test's ERP increases with p until p = 7, at which point it starts declining again. The addition of the eleventh and twelfth AR parts is quite beneficial because the ERP at both nominal levels then drops to much lower levels than those of the MA(10) sieve. Thus, these figures indicate that the difference between p and q is a determining factor for the accuracy of the ARMA(p,q) sieve.

Figure 3.7. ERP at nominal level 5% and 10%, AR1 model, N=200.

This result can be readily explained by the fact that the roots of the average estimated AR and MA polynomials often cancel each other out. To illustrate this, we consider three examples, namely the ARMA(1,10), ARMA(7,10) and ARMA(12,10). These roots are shown in table 3.1 where, as usual, i denotes the imaginary number √−1. All the roots in this section were computed using the numerical method of Weierstrass (1903).

Table 3.1. Roots of AR and MA parts.

    Model         Roots (AR)       Modulus   Roots (MA)       Modulus
    ARMA(1,10)    -9.31            9.31      1.17             1.17
                                             1.47             1.47
                                             -0.80 ± 1.68i    1.86
                                             0.29 ± 1.73i     1.75
                                             -1.62 ± 1.04i    1.92
                                             1.21 ± 1.17i     1.68
    ARMA(7,10)    -1.38 ± 0.79i    1.59      1.16 ± 1.09i     1.59
                  -1.42 ± 1.17i    1.83      0.166 ± 1.71i    1.71
                  0.12 ± 1.855i    1.85      -1.03 ± 1.55i    1.86
                  -1.59            1.59      -1.88 ± 0.61i    1.97
                                             1.26 ± 0.09i     1.26
    ARMA(12,10)   0.81 ± 0.99i     1.27      0.17 ± 1.67i     1.67
                  1.12 ± 0.40i     1.18      -1.62 ± 1.59i    2.26
                  0.21 ± 1.32i     1.33      1.20 ± 0.20i     1.21
                  -1.01 ± 0.86i    1.32      1.07 ± 1.14i     1.56
                  -1.22 ± 0.29i    1.25      -0.91 ± 1.47i    1.72
                  -0.47 ± 1.27i    1.35

Each complex entry stands for a conjugate pair of roots; blank cells mean the polynomial has no further roots.

The AR part in the ARMA(1,10) sieve DGP does not come close to cancelling out any of the MA roots. It is therefore logical that the ARMASB tests based on this specification would have ERPs comparable to those based on the MA(10). However, table 3.1 reveals a completely different story for the ARMA(7,10) sieve. Indeed, four complex roots (and their conjugates, for a true total of 8) tend to cancel out, among which three do so almost exactly. For instance, the MA root 1.16 ± 1.09i has modulus 1.591 while the AR root -1.38 ± 0.79i has modulus 1.590. Hence, the net correlation modelled by the ARMA(7,10) must be much smaller than what such a high parametric order would suggest. Accordingly, our simulations indicate that ARMASB tests based on this model reject over 20% of the time at nominal level 5%. Finally, there appear to be fewer such cancellations occurring in the ARMA(12,10). This would explain why the tests based on this sieve model have lower ERP.
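The near-cancellation check is easy to reproduce: the roots of an estimated AR polynomial 1 − Σ α_k z^k and MA polynomial 1 + Σ θ_k z^k can be computed with `numpy.roots` (standing in here for the Weierstrass iteration used in the paper) and compared by modulus. A sketch with hypothetical coefficient values:

```python
import numpy as np

def lag_poly_roots(coeffs, sign=+1):
    """Roots of the lag polynomial 1 + sign*(c_1 z + ... + c_k z^k).
    Use sign=-1 for an AR polynomial written 1 - a_1 z - ... - a_k z^k."""
    poly = np.r_[1.0, sign * np.asarray(coeffs, dtype=float)]  # ascending powers
    return np.roots(poly[::-1])                                # np.roots expects descending
```

For hypothetical estimates α̂₁ = 0.5 and θ̂₁ = -0.5, `lag_poly_roots([0.5], sign=-1)` and `lag_poly_roots([-0.5])` both return the root 2.0: the AR and MA roots cancel exactly, and the fitted ARMA(1,1) carries no net correlation, which is the phenomenon documented in table 3.1.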

It is also interesting to investigate whether increasing the ARMASB approximating order substantially decreases the test's ERP. To do so, we have generated 5 000 samples of the ARFIMA(1,d,0) and ARFIMA(0,d,1) DGPs used above with α = -0.9 and θ = -0.9. For each sample we have estimated several ARMA(ℓ,ℓ) models, with ℓ = 1, 2, ..., 10, and conducted sieve bootstrap ADF tests based on each. Figure 3.8 shows the ERP of these tests at nominal level 5%.

Figure 3.8. ERP at nominal level 5%, N=200.

The figure shows that the test's ERP depends on the sieve order only to a certain degree. In particular, it appears to be almost exactly the same for all the approximating orders considered here when the DGP is the ARFIMA(0,d,1). The same thing can be observed with the ARFIMA(1,d,0) DGP and ℓ ≥ 4. To explain this puzzling feature, we have also computed the average values of the estimated parameters of the ARMA(ℓ,ℓ) models, as well as their standard deviation. Tables 3.2 and 3.3 show the ratio of the average value of each parameter of the MA and AR parts respectively of the ARMA sieves to its standard deviation. These tables are based on simulations carried out using the ARFIMA(0,d,1) DGP with θ=−0.9.

It is clear from these numbers that most of the ARMA sieves used to build figure 3.8 are grossly over-specified, in the sense that higher order parameters do not contribute much to the simulated data's correlation structure. This explains why increasing ℓ does not significantly affect the test's ERP. Thus, the ARMA sieve has one important advantage over the ARSB and MASB: it is much more parsimonious, in the sense that it requires the estimation of many fewer parameters to achieve similar performance.


Table 3.2. T ratio of the parameters of the MA part.

       ℓ=1     ℓ=2     ℓ=3     ℓ=4     ℓ=5     ℓ=6     ℓ=7     ℓ=8     ℓ=9    ℓ=10
θ1   -22.75   -6.91   -5.56   -5.39   -5.22   -5.02   -4.74   -4.49   -4.25   -4.00
θ2       na    1.25    1.28    1.18    1.03    0.93    0.76    0.66    0.55    0.42
θ3       na      na   -0.61   -0.41   -0.18   -0.08    0.07    0.05    0.13    0.08
θ4       na      na      na    0.19    0.04    0.09    0.03    0.10    0.05    0.12
θ5       na      na      na      na    0.05   -0.09    0.03   -0.06    0.06    0.00
θ6       na      na      na      na      na    0.12   -0.01    0.10    0.02    0.05
θ7       na      na      na      na      na      na    0.09   -0.03    0.04    0.00
θ8       na      na      na      na      na      na      na    0.10    0.00    0.06
θ9       na      na      na      na      na      na      na      na    0.10   -0.01
θ10      na      na      na      na      na      na      na      na      na    0.14

Table 3.3. T ratio of the parameters of the AR part.

       ℓ=1     ℓ=2     ℓ=3     ℓ=4     ℓ=5     ℓ=6     ℓ=7     ℓ=8     ℓ=9    ℓ=10
α1    -5.50   -1.65   -0.96   -1.07   -1.24   -1.36   -1.46   -1.54   -1.60   -1.70
α2       na    1.25    0.97    1.01    1.04    0.94    0.83    0.66    0.55    0.40
α3       na      na   -0.47   -0.18   -0.18   -0.10   -0.09   -0.06   -0.07   -0.07
α4       na      na      na   -0.13   -0.15   -0.20   -0.21   -0.22   -0.20   -0.24
α5       na      na      na      na   -0.18   -0.15   -0.20   -0.18   -0.23   -0.24
α6       na      na      na      na      na   -0.24   -0.20   -0.24   -0.25   -0.26
α7       na      na      na      na      na      na   -0.24   -0.22   -0.24   -0.23
α8       na      na      na      na      na      na      na   -0.28   -0.26   -0.27
α9       na      na      na      na      na      na      na      na   -0.29   -0.23
α10      na      na      na      na      na      na      na      na      na   -0.34

Also, a comparison of the roots of the average AR and MA polynomials of the ARMA sieve reveals a clear tendency for larger roots to cancel out. Table 3.4 shows the roots and modulus of two ARMA(p,q) models when the true DGP is the ARFIMA(0,d,1) with θ=−0.9.

Table 3.4. Roots of AR and MA polynomials of the ARMA sieve models.

            Root 1    Mod 1    Root 2        Mod 2    Root 3        Mod 3
(2,2)  AR   -2.00     2.00     4.19          4.19     n.a.          n.a.
       MA    1.16     1.16     4.51          4.51     n.a.          n.a.
(3,3)  AR   -1.86     1.86     2.27+2.70i    3.53     2.27-2.70i    3.53
       MA    1.13     1.13     2.34+3.26i    4.01     2.34-3.26i    4.01


It can be seen that, in all the cases considered in table 3.4, the larger roots in modulus tend to be very similar. For example, in the ARMA(2,2) case, we have positive roots of 4.19 and 4.51 for the AR and MA polynomials respectively. Similarly, in the ARMA(3,3) case, the AR and MA polynomials have pairs of complex conjugate roots with moduli 3.53 and 4.01 respectively. These evidently almost cancel each other out. What results is a process with a correlation structure quite similar to that of the ARMA(1,1), which has a unique root of 1.14, almost identical to the roots of 1.16 and 1.13 reported in table 3.4. Similar patterns were observed for all ARMA(ℓ,ℓ) models for ℓ=1,...,10.
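This near-cancellation can be checked mechanically: compute the roots of the two lag polynomials and flag pairs that lie close together. The sketch below does this for a hypothetical pair of polynomials sharing one factor; the coefficients are illustrative, not the estimates behind table 3.4.

```python
import numpy as np

def lag_poly_roots(c):
    """Roots of the lag polynomial 1 + c[0]*z + ... + c[p-1]*z**p."""
    # np.roots expects coefficients from highest to lowest degree
    return np.roots(list(c[::-1]) + [1.0])

def cancelling_pairs(ar_c, ma_c, tol=0.05):
    """Pairs of AR and MA roots closer to each other than tol."""
    return [(ra, rm)
            for ra in lag_poly_roots(ar_c)
            for rm in lag_poly_roots(ma_c)
            if abs(ra - rm) < tol]

# Hypothetical polynomials sharing the factor (1 - 0.5z), i.e. a root at z = 2:
# AR: (1 - 0.5z)(1 + 0.25z) = 1 - 0.25z - 0.125z^2
# MA: (1 - 0.5z)(1 - 0.20z) = 1 - 0.70z + 0.10z^2
pairs = cancelling_pairs([-0.25, -0.125], [-0.70, 0.10])
```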

Since most higher order parameters appear to be insignificant, and because we have reasons to believe that high orders for the AR and MA parts may result in the two partially cancelling out, the next logical step is to study small order models more closely. Figure 3.9 shows the size function of 3 ARMA(p,q) sieve bootstrap tests conducted using 2 500 Monte Carlo samples of the ARFIMA(1,d,0) DGP with α=−0.9 (a case where the ARMA(10,10) sieve experiences some difficulties) and 499 bootstrap replications each.

Figure 3.9. Size function, ARFIMA(1,d,0), α=-0.9, N=200.


Figure 3.10. Size function, ARFIMA(1,d,0), α=-0.9, N=200.

The most parsimonious model among those considered, namely the ARMA(1,2), has the lowest ERP. In fact, its ERP at nominal level 10% is a mere 0.035, compared to 0.063 for the MA(10) sieve. Figure 3.10 above compares the RP of these two tests for nominal levels from 0.01 to 0.30. While the two procedures yield similar RPs at lower nominal levels, it is obvious that the ARMASB is more accurate at higher levels.

It therefore appears that the ARMASB's accuracy is greatly influenced by its order. We therefore now investigate the performance of the sieve bootstrap tests when data dependent model selection methods are used. In related work (Richard, 2007), we find that the manner in which the orders of the sieve models and of the original and bootstrap ADF regressions are chosen greatly influences the finite sample properties of the ARSB. In particular, even though, as we argued earlier, this may seem inefficient, we find that the ARSB ADF test has lower ERP when each of these orders is chosen independently. An explanation for this is proposed in Richard (2007). The simulations below were carried out using that strategy.
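For concreteness, the mechanics of generating one AR sieve bootstrap sample under the unit root null can be sketched as follows; the OLS fitting routine, the order p = 4 and the sample size are illustrative assumptions, not the exact implementation used in our experiments.

```python
import numpy as np

def ar_sieve_bootstrap_sample(y, p, rng):
    """One AR(p) sieve bootstrap sample imposing the unit root null.

    Fit an AR(p) to u_t = y_t - y_{t-1} by OLS, resample the recentred
    residuals, rebuild u*_t recursively, and cumulate it into y*_t.
    """
    u = np.diff(y)
    n = len(u)
    Y = u[p:]
    X = np.column_stack([u[p - k:n - k] for k in range(1, p + 1)])
    alpha, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ alpha
    resid = resid - resid.mean()            # recentre before resampling
    e_star = rng.choice(resid, size=n, replace=True)
    u_star = np.zeros(n)
    for t in range(n):
        for k in range(1, p + 1):
            if t - k >= 0:
                u_star[t] += alpha[k - 1] * u_star[t - k]
        u_star[t] += e_star[t]
    # impose the null: y*_t is the accumulation of the bootstrap differences
    return np.concatenate(([y[0]], y[0] + np.cumsum(u_star)))

rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal(200))     # a pure random walk, for illustration
y_star = ar_sieve_bootstrap_sample(y, p=4, rng=rng)
```

The bootstrap ADF statistic would then be computed on y_star, and the exercise repeated to build the bootstrap distribution.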

Our simulations are based on the ARFIMA DGPs with d=0.45 and different values of α and θ. We have used 4 000 Monte Carlo samples with 499 bootstrap samples for each replication. The maximum orders were 15 for the ARSB and MASB and 5 for both the AR and MA parts of the ARMASB. We did not impose any minimum order; doing so may change the results reported below. Because it is consistent in the presence of infinite-order processes, we have used the AIC. The ERP functions for the 3 DGPs at nominal sizes between 1% and 30% are shown in figures 3.11 to 3.14 respectively. We have run two sets of experiments. In the first, we compare RPs of tests based on AR(p), MA(q) and ARMA(p,q) sieve models with p and q chosen by AIC. In the second set, we simply let the AIC choose whatever ARMA(p,q) model it deems better fitting; thus, the ARSB and MASB are nested into the ARMASB. Figures 3.11 to 3.13 report the results of the first set while figure 3.14 reports those of the second.
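The AIC-based order selection can be sketched as follows for the AR sieve: fit AR(p) models by OLS for p = 1, ..., pmax and keep the order minimizing AIC(p) = n·log(σ̂²ₚ) + 2p. The fitting routine and the AR(1) data-generating parameters below are illustrative assumptions, not those of the experiments.

```python
import numpy as np

def fit_ar_ols(u, p):
    """OLS fit of an AR(p) to u; returns the residual variance."""
    n = len(u)
    Y = u[p:]
    X = np.column_stack([u[p - k:n - k] for k in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return float(resid @ resid) / len(Y)

def select_ar_order(u, pmax):
    """AR sieve order minimizing AIC(p) = n*log(sigma2_p) + 2p."""
    n = len(u)
    aic = [n * np.log(fit_ar_ols(u, p)) + 2 * p for p in range(1, pmax + 1)]
    return 1 + int(np.argmin(aic))

# Illustrative AR(1) first-difference process (the coefficient 0.8 is ours)
rng = np.random.default_rng(42)
e = rng.standard_normal(400)
u = np.empty(400)
u[0] = e[0]
for t in range(1, 400):
    u[t] = 0.8 * u[t - 1] + e[t]
p_hat = select_ar_order(u, pmax=8)
```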

Figure 3.11. ERP function, ARFIMA(0,d,1), θ=-0.9, N=200.

Figure 3.11 shows the ERP functions of the three tests when the first difference is generated by the ARFIMA(0,d,1). In accordance with the results presented above, the presence of the large negative MA parameter causes the ARSB to strongly over-reject for all nominal sizes considered here. This is an illustration of the fact, pointed out by Ng and Perron (1995), that AIC tends to under-specify AR models when a large negative MA parameter is present in the DGP. Here, the average AR sieve order turned out to be 7.78, with a standard deviation of 2.22. It also comes as no surprise that the MASB has, in comparison, very small ERP. This follows from the fact that it is able to directly model the MA root. It should also be noted that it achieves this while using less information, since AIC selected an average order of 4.8 with a standard deviation of 2.87. The behaviour of the ARMA sieve's ERP is most disappointing. It indeed over-rejects significantly more than the MASB for all nominal sizes. One explanation for this is that, according to our previous simulations, there may be only one specification that yields low ERP. AIC selected an average p of 2.4 and an average q of 1.6 with standard deviations of 1.54 and 1.36 respectively. These estimates may not correspond to the orders that minimize ERP. It would therefore appear that AIC is not the best criterion to select the ARMASB specification.


Figure 3.12. ERP function, ARFIMA(1,d,0), α=-0.9, N=200.

Figure 3.12 shows the ERP function of the three tests when the first difference is generated by an ARFIMA(1,d,0). Again, as expected, the presence of the large negative AR parameter is accompanied by a very adequate performance of the ARSB, which has virtually no ERP. The MASB also performs quite well, though it under-rejects at all nominal sizes considered, which hints at a lack of power. On average, the AIC has selected a lag order of 2.48 for the ARSB, which confirms that the large AR root is the important thing to capture, while the MASB had an average order of 10.32. Hence, it is most probable that the MASB's under-rejection results from excessive over-specification of the first difference model. Once again, the ARMASB severely over-rejects for all considered nominal sizes. The average orders chosen by the AIC are here 2.03 for the AR part and 1.82 for the MA part.

Figure 3.13 shows the ERP function of the three tests when the first difference process is generated by an ARFIMA(1,d,1). The features of this figure are almost identical to those of figure 3.12. Indeed, the ARMASB still over-rejects, probably for the reasons given above. The MASB, which has an average order of 5.02, under-rejects once more, though less severely than in the preceding case. The only notable difference is the ARSB's over-rejection over the whole nominal level range considered. For low nominal levels, it over-rejects by a proportion similar to that by which the MASB under-rejects (for example, RPs of 0.144 and 0.0588 at 10% for the ARSB and MASB respectively). The most likely reason is that the ARSB is slightly under-specified (average order of 3.27) while the MASB is slightly over-specified (average order of 5.02). It is worth noting that the ARSB's over-rejection increases with the nominal level while the MASB's under-rejection remains roughly constant.


Figure 3.13. ERP function, ARFIMA(1,d,1), α=θ=-0.4, N=200.

While it is true that, in practice, one might want to use several sieve models to generate bootstrap samples, it seems logical to give precedence to the results of the one model which best satisfies some selection criterion within a class of models. In the three figures above, we have considered three classes of models separately, namely the AR(p), MA(q) and ARMA(p,q) families. Because these three classes of models are parametric approximations, it is natural to wonder what would happen if we were to select one model among them all. Thus, we have run simulations where the sieve bootstrap tests are conducted using the best fitting ARMA(p,q) model with p ≥ 0 and q ≥ 0 as selected by AIC. Evidently, the ARSB and MASB are thus nested in the ARMASB. We have only used one DGP, namely the ARFIMA(0,d,1) with θ=−0.9. The results are labelled ARMA(general) in figure 3.14 and are compared to the results reported in figure 3.11. It is obvious from the figure that allowing for more flexibility in the model selection process does not alleviate the ARMASB's ERP problem at all.


Figure 3.14. ERP function, ARFIMA(0,d,1), θ=-0.9, N=200.

Since the AIC sometimes over-specifies the MA sieve, under-specifies the AR sieve, and clearly does not choose the right ARMA order to minimise ERP, we believe that it is not a very good selection criterion for our present purposes. This explains at least in part the disappointing results observed in figure 3.13. Because of their great similarity, it is doubtful that the BIC would provide more precise inference. Since it has a more severe penalty function, it should be expected to choose lower average orders. This may or may not be beneficial for the MASB and ARMASB, but certainly not for the ARSB. The question of lag selection is examined more carefully in Richard (2007).

3.2 Other models

The sieve bootstrap is only valid when the data has been generated by a linear process. Since the true DGP is not known and may very well be nonlinear, this is a serious limitation. It is, therefore, interesting to check how our sieve bootstraps fare in the presence of data generated by such models. Figures 3.15 to 3.17 show ERP functions for the DGPs T1, T2 and GARCH. They are all based on 5 000 Monte Carlo samples with 499 bootstrap repetitions each.
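As an illustration of the kind of nonlinear error process involved, the sketch below generates GARCH(1,1) innovations of the generic form e_t = σ_t z_t with σ_t² = ω + α e_{t−1}² + β σ_{t−1}². The parameter values are illustrative assumptions, not necessarily those used in our experiments.

```python
import numpy as np

def garch11(n, omega=0.1, alpha=0.1, beta=0.8, seed=0):
    """Simulate n GARCH(1,1) innovations e_t = sigma_t * z_t with
    sigma_t^2 = omega + alpha * e_{t-1}^2 + beta * sigma_{t-1}^2.
    Parameter values here are illustrative (alpha + beta < 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    e = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)   # unconditional variance
    e[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * e[t - 1] ** 2 + beta * sigma2[t - 1]
        e[t] = np.sqrt(sigma2[t]) * z[t]
    return e, sigma2

e, sigma2 = garch11(200)
```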


Figure 3.15. ERP function, T1, N=200.

Figure 3.16. ERP function, T2, N=200.


Figure 3.17. ERP function, GARCH, N=50.

It is hard to draw any clear conclusions from figure 3.15, other than the fact that each method performs more or less well depending on the DGP. Figure 3.16 shows that a threshold MA model may be a situation where the ARMASB is extremely accurate. Also, the very close correspondence between the asymptotic and ARSB tests appears once more. Of course, these results may be specific to the DGPs considered. We did, however, experiment with different lag orders for the sieve bootstraps and found that the MASB and ARMASB are much more robust to this choice than the ARSB. Finally, figure 3.17 indicates that all three sieve bootstraps significantly improve upon the asymptotic test for most of the parameter space considered. We therefore conclude that, even though they are not valid for such processes, the sieve bootstraps considered here may still be more accurate than the asymptotic test.

3.3 Power

Figure 3.18 compares size-power curves of different bootstrap ADF tests. These curves, which were proposed by Davidson and MacKinnon (1998), are built by estimating the EDF of the p-value of the test under the null (call this EDF Fn) and under a specific alternative (call this EDF Fa) and by plotting the pairs (Fn, Fa) in the unit square. This makes it possible to plot power against true size, which makes power comparisons between two tests straightforward. Indeed, whenever the curve of one test is above that of another at a given point on the horizontal axis, say z, one can say that the former test is more powerful than the latter when the true size of the tests is z. Size-power curves are especially useful when one wants to compare two tests with different ERP under the null because they provide an easy way to obtain size-adjusted power for all sizes at once.
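The construction just described is straightforward to compute from simulated p-values: evaluate the two EDFs on a grid of nominal levels and plot one against the other. A minimal sketch (the toy p-values are illustrative):

```python
import numpy as np

def size_power_curve(pvals_null, pvals_alt, grid=None):
    """Davidson-MacKinnon size-power curve: for each nominal level z,
    return (EDF of null p-values at z, EDF of alternative p-values at z)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    pn = np.asarray(pvals_null)
    pa = np.asarray(pvals_alt)
    Fn = np.array([(pn <= z).mean() for z in grid])  # true size
    Fa = np.array([(pa <= z).mean() for z in grid])  # power
    return Fn, Fa

# Toy illustration: a test with power has Fa above the 45-degree line
Fn, Fa = size_power_curve(
    np.linspace(0.005, 0.995, 100),        # roughly uniform under the null
    np.linspace(0.005, 0.995, 100) ** 2,   # concentrated near 0 under the alternative
)
```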

Figure 3.18. Size-power curves, ARFIMA(0,d,1), θ=-0.9, N=200.

Figure 3.18 was generated using 2 500 Monte Carlo samples with 999 bootstrap repetitions in each. The alternative hypothesis was y_t = 0.9y_{t−1} + u_t, and the same random numbers were used to create the null and alternative data in order to minimize experimental error. It shows that the ADF test based on the ARSB has more power than any of the others. Recall, however, that it over-rejects more than the MASB and ARMASB; see figure 3.2.

The results from a different DGP, for which none of the tests over-reject severely, are presented in figure 3.19. It shows that all sieve bootstrap tests have comparable power, although the MASB, again, appears to be slightly less powerful.


Figure 3.19. Size-power curves, ARFIMA(0,d,1), θ=0.8, N=200.

4 Conclusion

The sieve bootstrap is a method that uses finite order models to approximate infinitely correlated data and perform bootstrap inference. Since its formal introduction by Bühlmann (1997), only autoregressive approximations have been used. We propose to use MA and ARMA approximations. We provide invariance principles that may be used to derive asymptotic properties for the tests performed using them. As an application, we show that ADF unit root tests based on these sieve bootstraps are asymptotically valid. Similar results were provided for the usual ARSB by Park (2002) and Chang and Park (2003).

We provide Monte Carlo evidence to the effect that our MASB and ARMASB may be much more accurate than the ARSB or asymptotic tests in certain circumstances. The accuracy of the different procedures depends on the chosen specification, and our simulations have shown that this may be a serious problem, especially in the case of the ARMASB. Since we consider a rather wide variety of DGPs, even some for which our theory is not valid, we may claim that these results are robust and may be referred to with a certain degree of confidence.


A Appendix: Mathematical Proofs

Proof of Lemma 2.1.

First, consider the following expression from Park (2002, equation 19):

    u_t = α_{f,1} u_{t−1} + α_{f,2} u_{t−2} + ... + α_{f,f} u_{t−f} + e_{f,t}    (32)

where the coefficients α_{f,k} are pseudo-true values defined so that the equality holds and the e_{f,t} are uncorrelated with the u_{t−k}, k = 0, 1, 2, ..., r. Using once again the results of GZW (1994), we define

    π_{q,1} = α_{f,1}
    π_{q,2} = α_{f,2} + α_{f,1} π_{q,1}
    ...
    π_{q,q} = α_{f,q} + α_{f,q−1} π_{q,1} + ... + α_{f,1} π_{q,q−1}

to be the moving average parameters deduced from the pseudo-true parameters of the AR model (32). It is shown in Hannan and Kavalieris (1986) that

    max_{1≤k≤f} |α̂_{f,k} − α_{f,k}| = O((log n/n)^{1/2})  a.s.

where the α̂_{f,k} are OLS or Yule-Walker estimates (an equivalent result is shown to hold in probability in Baxter (1962)). Further, they show that

    Σ_{k=1}^{f} |α_{f,k} − α_k| ≤ c Σ_{k=f+1}^{∞} |α_k| = o(f^{−s}),

where c is a constant. This yields part 1 of lemma 3.1 of Park (2002):

    max_{1≤k≤f} |α̂_{f,k} − α_k| = O((log n/n)^{1/2}) + o(f^{−s})  a.s.

Now, it follows from the equations of GZW (1994) that:

    |π̂_{q,k} − π_k| = | Σ_{j=0}^{k−1} (α̂_{f,k−j} π̂_{q,j} − α_{k−j} π_j) |    (33)

Of course, it is possible to express all the π_{q,k} and π_k as functions of the α_{f,k} and α_k respectively. For example, we have π_1 = α_1, π_2 = α_1² + α_2, π_3 = α_1³ + 2α_1α_2 + α_3, and so forth. It is therefore possible to rewrite (33) for any k as a function of the α_{f,j} and α_j, j = 1, ..., k. To clarify this, let us consider, as an illustration, the case k = 3:

    |π̂_{q,3} − π_3| = | α̂_{f,1}³ + α̂_{f,1}α̂_{f,2} + α̂_{f,1}α̂_{f,2} + α̂_{f,3} − α_1³ − α_1α_2 − α_1α_2 − α_3 |

Using the triangle inequality:

    |π̂_{q,3} − π_3| ≤ |α̂_{f,1}³ − α_1³| + |α̂_{f,1}α̂_{f,2} − α_1α_2| + |α̂_{f,1}α̂_{f,2} − α_1α_2| + |α̂_{f,3} − α_3|

Thus, using the results of lemma 3.1 of Park (2002), |π̂_{q,3} − π_3| can be at most O[(log n/n)^{1/2}] + o(f^{−s}). Similarly, |π̂_{q,4} − π_4| can be at most O[(log n/n)^{1/2}] + o(f^{−s}), and so forth. Generalizing this yields the stated result. The other two results follow in a similar manner.
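The GZW (1994) recursion used above, π_k = α_k + α_{k−1}π_1 + ... + α_1π_{k−1}, can be verified numerically: with π_0 = 1 it must reproduce the MA(∞) (impulse response) coefficients of the corresponding AR process. A sketch with illustrative AR(2) coefficients (our choice, not parameters from the paper):

```python
import numpy as np

def ma_from_ar(alpha, K):
    """GZW recursion pi_k = alpha_k + sum_{j=1}^{k-1} alpha_{k-j} * pi_j,
    with pi_0 = 1; returns pi_0, ..., pi_K."""
    a = np.zeros(K + 1)
    a[1:len(alpha) + 1] = alpha
    pi = np.zeros(K + 1)
    pi[0] = 1.0
    for k in range(1, K + 1):
        pi[k] = sum(a[k - j] * pi[j] for j in range(k))
    return pi

alpha = [0.5, -0.3]                    # illustrative AR(2) coefficients
pi = ma_from_ar(alpha, K=10)

# Cross-check: the impulse response of u_t = 0.5 u_{t-1} - 0.3 u_{t-2} + e_t
irf = np.zeros(11)
irf[0] = 1.0
for t in range(1, 11):
    irf[t] = sum(alpha[k - 1] * irf[t - k] for k in range(1, min(t, 2) + 1))
```

The two sequences coincide, which matches the text's examples π_1 = α_1, π_2 = α_1² + α_2, π_3 = α_1³ + 2α_1α_2 + α_3.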

Proof of Lemma 2.2.

First, note that

    n^{1−r/2} E* |ε*_t|^r = n^{1−r/2} ( (1/n) Σ_{t=1}^{n} | ε̂_{q,t} − (1/n) Σ_{t=1}^{n} ε̂_{q,t} |^r )

because the bootstrap errors are drawn from the series of recentered residuals from the MA(q) model. Therefore, what must be shown is that

    n^{1−r/2} ( (1/n) Σ_{t=1}^{n} | ε̂_{q,t} − (1/n) Σ_{t=1}^{n} ε̂_{q,t} |^r ) → 0  a.s.

as n → ∞. If we add and subtract ε_t and ε_{q,t} (which was defined in equation (6)) inside the absolute value operator, we obtain:

    n^{1−r/2} ( (1/n) Σ_{t=1}^{n} | ε̂_{q,t} − (1/n) Σ_{t=1}^{n} ε̂_{q,t} |^r ) ≤ c (A_n + B_n + C_n + D_n)

where c is a constant and

    A_n = (1/n) Σ_{t=1}^{n} |ε_t|^r
    B_n = (1/n) Σ_{t=1}^{n} |ε_{q,t} − ε_t|^r
    C_n = (1/n) Σ_{t=1}^{n} |ε̂_{q,t} − ε_{q,t}|^r
    D_n = | (1/n) Σ_{t=1}^{n} ε̂_{q,t} |^r

To get the desired result, one must show that n^{1−r/2} times each of A_n, B_n, C_n and D_n goes to 0 almost surely.

1. n^{1−r/2} A_n → 0 a.s.

This holds by the strong law of large numbers, which states that A_n → E|ε_t|^r a.s., which has been assumed to be finite. Since r > 4, 1 − r/2 < −1, from which the result follows.

2. n^{1−r/2} B_n → 0 a.s.

This is proved by showing that

    E |ε_{q,t} − ε_t|^r = o(q^{−rs})    (34)

holds uniformly in t, where s is as specified in assumption 2.1 part b. We begin by recalling that from equation (6) we have

    ε_{q,t} = u_t − Σ_{k=1}^{q} π_k ε_{q,t−k}.

Writing this using an infinite AR form:

    ε_{q,t} = u_t − Σ_{k=1}^{∞} ᾱ_k u_{t−k}

where the parameters ᾱ_k are functions of the first q true parameters π_k in the usual manner (see the proof of lemma 2.1). We also have:

    ε_t = u_t − Σ_{k=1}^{∞} π_k ε_{t−k},

which we also write in AR(∞) form:

    ε_t = u_t − Σ_{k=1}^{∞} α_k u_{t−k}.

Evidently, ᾱ_k = α_k for all k = 1, ..., q. Subtracting the second of these two expressions from the first, we obtain:

    ε_{q,t} − ε_t = Σ_{k=q+1}^{∞} (α_k − ᾱ_k) u_{t−k}    (35)

Using Minkowski's inequality, the triangle inequality and the stationarity of u_t,

    E |ε_{q,t} − ε_t|^r ≤ E |u_t|^r ( Σ_{k=q+1}^{∞} |α_k − ᾱ_k| )^r    (36)

The second element of the right hand side can be rewritten as

    ( Σ_{k=q+1}^{∞} | Σ_{ℓ=1}^{q} π_ℓ ᾱ_{k−ℓ} − Σ_{ℓ=1}^{k} π_ℓ α_{k−ℓ} − π_k | )^r = ( Σ_{k=q+1}^{∞} | −Σ_{ℓ=1}^{k−q} π_ℓ α_{k−ℓ} − π_k | )^r

that is, we have a sequence of the sort −(π_{q+1} + π_{q+2} + π_1α_{q+1} + π_{q+3} + π_1α_{q+2} + π_2α_{q+1} + ...). Then, it follows from assumptions 2.1 b and 2.2 that

    E |ε_{q,t} − ε_t|^r = o(q^{−rs})    (37)

because E|u_t|^r < ∞ (see Park 2002, equation 25, p. 483). The equality (37) together with the inequality (36) imply that equation (34) holds. In turn, equation (34) implies that n^{1−r/2} B_n → 0 a.s., provided that q increases at a proper rate, such as the one specified in assumption 2.2.

3. n^{1−r/2} C_n → 0 a.s.

We start from the AR(∞) expression for the residuals to be resampled:

    ε̂_{q,t} = u_t − Σ_{k=1}^{∞} α̂_{q,k} u_{t−k}    (38)

where α̂_{q,k} denotes the parameters corresponding to the estimated MA(q) parameters π̂_{q,k}. Then, adding and subtracting Σ_{k=1}^{∞} α_{q,k} u_{t−k}, where the parameters π_{q,k} were defined in the proof of lemma 2.1, and using once more the AR(∞) form of (6), equation (38) becomes:

    ε̂_{q,t} = ε_{q,t} − Σ_{k=1}^{∞} (α̂_{q,k} − α_{q,k}) u_{t−k} − Σ_{k=1}^{∞} (α_{q,k} − α_k) u_{t−k}    (39)

It then follows that

    |ε̂_{q,t} − ε_{q,t}|^r ≤ c ( | Σ_{k=1}^{∞} (α̂_{q,k} − α_{q,k}) u_{t−k} |^r + | Σ_{k=1}^{∞} (α_{q,k} − α_k) u_{t−k} |^r )

for c = 2^{r−1}. Let us define

    C_{1n} = (1/n) Σ_{t=1}^{n} | Σ_{k=1}^{∞} (α̂_{q,k} − α_{q,k}) u_{t−k} |^r
    C_{2n} = (1/n) Σ_{t=1}^{n} | Σ_{k=1}^{∞} (α_{q,k} − α_k) u_{t−k} |^r

then showing that n^{1−r/2} C_{1n} → 0 a.s. and n^{1−r/2} C_{2n} → 0 a.s. will give us our result. First, let us note that C_{1n} is majorized by:

    ( max_{1≤k≤∞} |α̂_{q,k} − α_{q,k}|^r ) (1/n) Σ_{t=1}^{n} Σ_{k=1}^{∞} |u_{t−k}|^r    (40)

By lemma 2.1 and equation (20) of Park (2002), we have

    max_{1≤k≤∞} |α̂_{q,k} − α_{q,k}| = O((log n/n)^{1/2})  a.s.

Thus, the first part of (40) goes to 0. On the other hand, the second part is bounded away from infinity by a law of large numbers and equation (25) in Park (2002). This proves the first result. If we apply Minkowski's inequality to the absolute value part of C_{2n}, we obtain

    E | Σ_{k=1}^{∞} (α_{q,k} − α_k) u_{t−k} |^r ≤ E |u_t|^r ( Σ_{k=1}^{∞} |α_{q,k} − α_k| )^r    (41)

of which the right hand side goes to 0 by the boundedness of E|u_t|^r, the definition of the α's, lemma 2.1 and equation (21) of Park (2002), where it is shown that Σ_{k=1}^{p} |α_{p,k} − α_k| = o(p^{−s}) for some p → ∞, which implies a similar result between the π_{q,k} and the π_k, which in turn implies a similar result between the α_{q,k} and the α_k. This proves the result.

4. n^{1−r/2} D_n → 0 a.s.

In order to prove this, we show that

    (1/n) Σ_{t=1}^{n} ε̂_{q,t} = (1/n) Σ_{t=1}^{n} ε_{q,t} + o(1) a.s. = (1/n) Σ_{t=1}^{n} ε_t + o(1) a.s.

Recalling equations (35) and (39), this will be true if

    (1/n) Σ_{t=1}^{n} Σ_{k=q+1}^{∞} (α_k − ᾱ_k) u_{t−k} → 0  a.s.    (42)
    (1/n) Σ_{t=1}^{n} Σ_{k=1}^{∞} (α_{q,k} − α_k) u_{t−k} → 0  a.s.    (43)
    (1/n) Σ_{t=1}^{n} Σ_{k=1}^{∞} (α̂_{q,k} − α_{q,k}) u_{t−k} → 0  a.s.    (44)

where the first equation serves to prove the second asymptotic equality and the other two serve to prove the first asymptotic equality. Proving those three results requires some work. Just like Park (2002), p. 485, let us define

    S_n(i,j) = Σ_{t=1}^{n} ε_{t−i−j}

and

    T_n(i) = Σ_{t=1}^{n} u_{t−i}

so that

    T_n(i) = Σ_{j=0}^{∞} π_j S_n(i,j),

and remark that, by Doob's inequality,

    | max_{1≤m≤n} S_m(i,j) |^r ≤ z |S_n(i,j)|^r

where z = 1/(1 − 1/r). Taking expectations and applying Burkholder's inequality,

    E( max_{1≤m≤n} |S_m(i,j)|^r ) ≤ c_1 z E( Σ_{t=1}^{n} ε_t² )^{r/2}

where c_1 is a constant depending only on r. By the law of large numbers, the right hand side is equal to c_1 z (nσ²)^{r/2} = c_1 z n^{r/2} σ^r. Thus, we have

    E( max_{1≤m≤n} |S_m(i,j)|^r ) ≤ c n^{r/2}

uniformly over i and j, where c = c_1 z σ^r. Define

    L_n = Σ_{k=q+1}^{∞} (α_k − ᾱ_k) T_n(k) = Σ_{k=q+1}^{∞} (α_k − ᾱ_k) Σ_{t=1}^{n} u_{t−k}.

It must therefore follow that

    [ E( max_{1≤m≤n} |L_m|^r ) ]^{1/r} ≤ Σ_{k=q+1}^{∞} |α_k − ᾱ_k| [ E( max_{1≤m≤n} |T_m(k)|^r ) ]^{1/r} ≤ Σ_{k=q+1}^{∞} |α_k − ᾱ_k| c n^{1/2}

where the constant c is redefined accordingly. But

    Σ_{k=q+1}^{∞} |α_k − ᾱ_k| = o(q^{−s})

by assumption 2.1 b and the construction of the ᾱ_k (recall part 2 of the present proof). Thus,

    E[ max_{1≤m≤n} |L_m|^r ] ≤ c q^{−rs} n^{r/2}.

Then it follows from the result in Móricz (1976, theorem 6) that, for any δ > 0,

    L_n = o( q^{−s} n^{1/2} (log n)^{1/r} (log log n)^{(1+δ)/r} ) = o(n)  a.s.

This last equation proves (42).

Now, if we let

    M_n = Σ_{k=1}^{∞} (α_{q,k} − α_k) T_n(k) = Σ_{k=1}^{∞} (α_{q,k} − α_k) Σ_{t=1}^{n} u_{t−k},

then we find by the same device that

    [ E( max_{1≤m≤n} |M_m|^r ) ]^{1/r} ≤ Σ_{k=1}^{∞} |α_{q,k} − α_k| [ E( max_{1≤m≤n} |T_m(k)|^r ) ]^{1/r},

the right hand side of which is smaller than or equal to c q^{−s} n^{1/2}; see the discussion under equation (41). Consequently, using Móricz's result once again, we have M_n = o(n) a.s. and equation (43) is demonstrated. Finally, define

    N_n = Σ_{k=1}^{∞} (α̂_{q,k} − α_{q,k}) T_n(k) = Σ_{k=1}^{∞} (α̂_{q,k} − α_{q,k}) Σ_{t=1}^{n} u_{t−k};

further, let

    Q_n = Σ_{k=1}^{∞} | Σ_{t=1}^{n} u_{t−k} |.

Then, N_n is majorized by Q_n max_{1≤k≤∞} |α̂_{q,k} − α_{q,k}|. By assumption 2.1 b and the result under (40), we know that the second part goes to 0. Then, using again Doob's and Burkholder's inequalities,

    E[ max_{1≤m≤n} |Q_m|^r ] ≤ c q^r n^{r/2}.

Therefore, we can deduce that for any δ > 0,

    Q_n = o( q n^{1/2} (log n)^{1/r} (log log n)^{(1+δ)/r} )  a.s.

and

    N_n = O( (log n/n)^{1/2} ) Q_n = o( q (log n)^{(r+2)/2r} (log log n)^{(1+δ)/r} ) = o(n).

Hence, equation (44) is proved.

The proof of the lemma is complete. Consequently, the conditions required for theorem 2.2 of Park (2002) are satisfied and we may conclude that W*_n →^{d*} W a.s.

Proof of Lemma 2.3.

We begin by noting that

    Pr*[ max_{1≤t≤n} |n^{−1/2} u*_t| > δ ] ≤ Σ_{t=1}^{n} Pr*[ |n^{−1/2} u*_t| > δ ] = n Pr*[ |n^{−1/2} u*_t| > δ ] ≤ (1/δ^r) n^{1−r/2} E* |u*_t|^r

where the first inequality is trivial, the second equality follows from the invertibility of u*_t conditional on the realization of ε̂_{q,t} (which implies that the AR(∞) form of the MA(q) sieve is stationary) and the last inequality is an application of the Tchebyshev inequality. Recall that

    u*_t = Σ_{k=1}^{q} ( Σ_{i=k}^{q} π̂_{q,i} ) ε*_{t−k+1}.

Then, by Minkowski's inequality and assumption 2.1, we have:

    E* |u*_t|^r ≤ ( Σ_{k=1}^{q} k |π̂_{q,k}| )^r E* |ε*_t|^r

But by lemma 2.1, the estimates π̂_{q,i} are consistent for the π_k. Hence, by assumption 2.1, the first part must be bounded as n → ∞. Also, we have shown in lemma 2.2 that n^{1−r/2} E* |ε*_t|^r → 0 a.s. The result thus follows.

Proof of Theorem 2.1.

The result follows directly from combining lemmas 2.1 to 2.3. Note that, had we proved lemmas 2.1 to 2.3 in probability, the result of theorem 2.1 would also hold in probability.

Proof of Corollary 2.1.

The proof of this corollary is a direct extension of the proof of lemma 2.1. It suffices to point out the fact that the θ̂_{ℓ,k} can be written as functions of the π̂_{ℓ,j} and α̂_{ℓ,j}, which can themselves be expressed as functions of the parameters of a long autoregression in the context of the analytical indirect inference estimation method of GZW (1997). We know, by lemma 3.1 of Park (2002), that the estimated parameters of this long autoregression are consistent estimators of the parameters of the AR(∞) form of u_t. By the same arguments that were used in the proof of lemma 2.1, we can therefore conclude that the π̂_{ℓ,j} and α̂_{ℓ,j} are consistent estimators of the true parameters of the ARMA form of u_t, which in turn implies that the θ̂_{ℓ,k} are consistent.


Proof of Corollary 2.2.

Recall that ℓ = p + q and assume that ℓ → ∞ at the rate specified in assumption 2.3. Then, for any value of p ∈ [0,∞) and q → ∞, the stated result follows from lemma 2.2 and corollary 2.1. To see this, define ν_t as the original process u_t with the AR(p) part filtered out using the p consistent parameter estimates. Then, ν_t is an invertible general linear process to which we can apply lemma 2.2.

By the same logic, letting q ∈ [0,∞) and p → ∞, and defining ν_t as the original process with the MA(q) part filtered out, it is easily seen that the result of lemma 3.2 in Park (2002) can be applied. Since these two situations cover every possible case, the result is shown.

Equivalently, we could have reached the same conclusion by using the AR(∞) form of the ARMASB rather than that of the MASB in the proof of lemma 2.2.

Proof of Corollary 2.3.

The proof is identical to that of corollary 2.2, except that we use lemma 2.3 for the case where q → ∞ and the proof of theorem 3.3 in Park (2002) for the case where p → ∞.

Proof of Theorem 2.2.

The result follows directly from combining corollaries 2.1 to 2.3. Note that, had we proved corollaries 2.1 to 2.3 in probability, the result of theorem 2.2 would also hold in probability.

Lemma A1.

Under assumptions 2.1 and 2.2, e_t = ε*_t, where e_t is the error term from the bootstrap ADF regression and ε*_t is the bootstrap error term.

Proof of Lemma A1.

Let us first rewrite the ADF regression (21) under the null as follows:

$$\Delta y_t^* = \sum_{k=1}^{p} \alpha_{p,k}\left(\sum_{j=k+1}^{k+q} \pi_{q,j-k}\,\varepsilon_{t-j}^* + \varepsilon_{t-k}^*\right) + e_t$$

where we have substituted the MASB DGP for $\Delta y_{t-k}^*$. This can be rewritten as

$$\Delta y_t^* = \sum_{i=1}^{p}\sum_{j=0}^{q} \alpha_{p,i}\,\pi_{q,j}\,\varepsilon_{t-i-j}^* + e_t \qquad (45)$$

where $\pi_{q,0}$ is constrained to be equal to 1, as usual. Then, from equations (45) and (7),

$$e_t = \sum_{j=1}^{q} \pi_{q,j}\,\varepsilon_{t-j}^* + \varepsilon_t^* - \sum_{i=1}^{p}\sum_{j=0}^{q} \alpha_{p,i}\,\pi_{q,j}\,\varepsilon_{t-i-j}^*.$$

Let us rewrite this result as follows:

$$e_t = A_t + B_t + \varepsilon_t^*$$

where

$$A_t = (\pi_{q,1} - \alpha_{p,1})\varepsilon_{t-1}^* + (\pi_{q,2} - \alpha_{p,1}\pi_{q,1} - \alpha_{p,2})\varepsilon_{t-2}^* + \dots + (\pi_{q,q} - \alpha_{p,1}\pi_{q,q-1} - \dots - \alpha_{p,q})\varepsilon_{t-q}^*$$

and

$$B_t = \sum_{j=q+1}^{p+q} \varepsilon_{t-j}^*\left(\sum_{i=0}^{q} \pi_{q,i}\,\alpha_{p,j-i}\right)$$

where $\pi_{q,0} = 1$ and $\alpha_{p,i} = 0$ whenever $i > p$. First, we note that the coefficients appearing in $A_t$ are the formulas linking the parameters of an AR(p) regression to those of an MA(q) process (see Galbraith and Zinde-Walsh, 1994). Hence, no matter how the $\pi_{q,j}$ are estimated, these coefficients are all equal to zero asymptotically under assumption 2.2. Since we assume that $q \to \infty$, we may conclude that $A_t \to 0$. Further, by lemma 2.1 and lemma 3.1 of Park (2002), this convergence is almost sure.

On the other hand, applying Minkowski's inequality to $B_t$:

$$E^*|B_t|^r \le E^*|\varepsilon_t^*|^r\left(\sum_{j=q+1}^{p+q}\sum_{i=0}^{q} |\pi_{q,i}\,\alpha_{p,j-i}|\right)^r.$$

But for each $j$, $\sum_{i=0}^{q} \pi_{q,i}\,\alpha_{p,j-i}$ is equal to $-\alpha_j$, the $j$th parameter of the approximating autoregression (see Galbraith and Zinde-Walsh, 1994). Hence, we can write:

$$E^*|B_t|^r \le E^*|\varepsilon_t^*|^r\left(\sum_{j=q+1}^{p+q} |\alpha_j|\right)^r \quad a.s.$$

by lemma 2.1. But, as $p$ and $q$ go to infinity with $p > q$, $\left(\sum_{j=q+1}^{p+q} |\alpha_j|\right)^r = o(q^{-rs})$ a.s. under assumption 2.1.

For the ARMASB, the DGP is

$$\Delta y_t^* = \sum_{j=1}^{p} \alpha_{p,j}\,\Delta y_{t-j}^* + \sum_{j=1}^{q} \pi_{q,j}\,\varepsilon_{t-j}^* + \varepsilon_t^*$$

so that the bootstrap ADF regression can be written as follows:

$$\Delta y_t^* = \sum_{i=1}^{\ell} \gamma_{\ell,i}\left(\sum_{j=0}^{q} \pi_{q,j}\,\varepsilon_{t-i-j}^* + \sum_{j=1}^{p} \alpha_{p,j}\,\Delta y_{t-i-j}^*\right) + e_t$$

thus,

$$e_t = \sum_{j=1}^{p} \alpha_{p,j}\,\Delta y_{t-j}^* + \sum_{j=1}^{q} \pi_{q,j}\,\varepsilon_{t-j}^* - \sum_{i=1}^{\ell}\sum_{j=0}^{q} \gamma_{\ell,i}\,\pi_{q,j}\,\varepsilon_{t-i-j}^* - \sum_{i=1}^{\ell}\sum_{j=1}^{p} \gamma_{\ell,i}\,\alpha_{p,j}\,\Delta y_{t-i-j}^* + \varepsilon_t^*.$$

By calculations similar to those used above, we have:

$$e_t = A_t + B_t + \varepsilon_t^*$$

where

$$B_t = \sum_{j=\ell+1}^{\infty} \Delta y_{t-j}^*\left(\sum_{i=1}^{q} \pi_{q,i}\,\gamma_{j-i}\right)$$

and

$$A_t = \sum_{j=1}^{\ell} \psi_j\,\Delta y_{t-j}^*$$

where

$$\psi_1 = \pi_{q,1} + \alpha_{p,1} - \gamma_{\ell,1}, \qquad \psi_2 = \alpha_{p,2} + \pi_{q,2} - \pi_{q,1}\gamma_1 - \gamma_{\ell,2}$$

or, more generally,

$$\psi_j = -\sum_{i=1}^{\min(j,q)} \pi_{q,i}\,\gamma_{j-i} + \alpha_{p,j} - \gamma_{\ell,j} \quad \text{for } j \le p,$$

$$\psi_j = -\sum_{i=1}^{\min(j,q)} \pi_{q,i}\,\gamma_{j-i} - \gamma_{\ell,j} \quad \text{for } j > p,$$

where $\gamma_k$ denotes the true value of the $k$th parameter of the AR(∞) form of $\Delta y_t^*$. These are just the analytical indirect inference binding functions linking the parameters of a long autoregressive approximation to those of an ARMA(p,q) model. Because the analytical indirect inference estimator is consistent, $A_t$ goes to 0 asymptotically under assumption 3.2. Now, applying Minkowski's inequality to $B_t$ yields:

$$E^*|B_t|^r \le E^*|\Delta y_t^*|^r\left(\sum_{j=\ell+1}^{\infty}\sum_{i=1}^{q} |\pi_{q,i}\,\gamma_{j-i}|\right)^r$$

where we have used the fact that the bootstrap DGP is stationary. For each $j$, $\sum_{i=1}^{q} \pi_{q,i}\,\gamma_{j-i}$ is equal to $\gamma_j$, the $j$th parameter of the AR(∞) form of $\Delta y_t^*$, which in turn is a consistent estimate of $\gamma_j^0$, the $j$th parameter of the AR(∞) form of $\Delta y_t$.

Thus,

$$E^*|B_t|^r \le E^*|\Delta y_t^*|^r\left(\sum_{j=\ell+1}^{\infty} |\gamma_j^0|\right)^r \quad a.s.$$

The right hand side evidently goes to 0 by assumption 2.1. This concludes the proof.
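The AR(∞) inversion that underlies both parts of this proof can be illustrated numerically. The sketch below is not part of the argument: it uses an arbitrary invertible MA(2) polynomial (hypothetical values $\pi_1 = 0.5$, $\pi_2 = 0.2$) and checks that the recursively computed AR(∞) coefficients are, term by term, the power series inverse of the MA polynomial.

```python
import numpy as np

# MA(q) polynomial pi(L) = 1 + pi_1 L + ... + pi_q L^q (invertible: roots outside unit circle)
pi_poly = np.array([1.0, 0.5, 0.2])  # hypothetical values for illustration

# Coefficients c_j of 1/pi(L), i.e. the AR(infinity) form, via the standard recursion
N = 60
c = np.zeros(N)
c[0] = 1.0
for j in range(1, N):
    c[j] = -sum(pi_poly[i] * c[j - i] for i in range(1, min(j, len(pi_poly) - 1) + 1))

# The convolution pi(L) * (1/pi(L)) must return the identity polynomial 1 + 0*L + 0*L^2 + ...
conv = np.convolve(pi_poly, c)[:N]
print(np.max(np.abs(conv[1:])))
```

The maximum printed is numerically zero, confirming that the truncated AR representation filters the MA process back to its innovations, which is exactly what makes $A_t$ and $B_t$ vanish asymptotically.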

Lemma A2 (CP lemma A1).

Under assumptions 2.1 and 3.2', we have $\sigma_*^2 \overset{a.s.}{\to} \sigma^2$ and $\Gamma_0^* \overset{a.s.}{\to} \Gamma_0$ as $n \to \infty$, where $E^*|\varepsilon_t^*|^2 = \sigma_*^2$ and $E^*|u_t^*|^2 = \Gamma_0^*$.

Proof of Lemma A2.

Consider the MASB DGP (7) once more:

$$u_t^* = \sum_{k=1}^{q} \pi_{q,k}\,\varepsilon_{t-k}^* + \varepsilon_t^* \qquad (46)$$

Under assumption 2.1 and given lemma 2.1, this process admits an AR(∞) representation. Let this be:

$$u_t^* + \sum_{k=1}^{\infty} \hat{\psi}_{q,k}\,u_{t-k}^* = \varepsilon_t^* \qquad (47)$$

where we write the $\hat{\psi}_{q,k}$ parameters with a hat and a subscript $q$ to emphasize that they come from the estimation of a finite order MA(q) model. We can rewrite equation (47) as follows:

$$u_t^* = -\sum_{k=1}^{\infty} \hat{\psi}_{q,k}\,u_{t-k}^* + \varepsilon_t^*. \qquad (48)$$

Multiplying by $u_t^*$ and taking expectations under the bootstrap DGP, we obtain

$$\Gamma_0^* = -\sum_{k=1}^{\infty} \hat{\psi}_{q,k}\,\Gamma_k^* + \sigma_*^2.$$

Dividing both sides by $\Gamma_0^*$ and rearranging,

$$\Gamma_0^* = \frac{\sigma_*^2}{1 + \sum_{k=1}^{\infty} \hat{\psi}_{q,k}\,\rho_k^*} \qquad (49)$$

where the $\rho_k^*$ are the autocorrelations of the bootstrap process. Note that these are functions of the parameters $\hat{\psi}_{q,k}$, and it can easily be shown that they satisfy the homogeneous system of linear difference equations

$$\rho_h^* + \sum_{k=1}^{\infty} \hat{\psi}_{q,k}\,\rho_{h-k}^* = 0$$

for all $h > 0$. Thus, the autocorrelations $\rho_h^*$ are implicitly defined as functions of the $\hat{\psi}_{q,k}$. On the other hand, let us now consider the model:

$$u_t = \sum_{k=1}^{q} \pi_{q,k}\,\varepsilon_{q,t-k} + \varepsilon_{q,t} \qquad (50)$$

which is simply the result of the computation of the parameter estimates $\pi_{q,k}$. This, of course, also has an AR(∞) representation:

$$u_t = -\sum_{k=1}^{\infty} \hat{\psi}_{q,k}\,u_{t-k} + \varepsilon_{q,t},$$

where the parameters $\hat{\psi}_{q,k}$ are exactly the same as in equation (48). Applying the same steps to this new expression, we obtain:

$$\Gamma_{0,n} = \frac{\sigma_n^2}{1 + \sum_{k=1}^{\infty} \hat{\psi}_{q,k}\,\rho_{k,n}} \qquad (51)$$

where $\Gamma_{0,n}$ is the sample autocovariance of $u_t$ when we have $n$ observations, $\sigma_n^2 = (1/n)\sum_{t=1}^{n} \varepsilon_{q,t}^2$, and $\rho_{k,n}$ is the $k$th autocorrelation of $u_t$. Since the autocorrelation parameters are the same in equations (49) and (51), we can write:

$$\Gamma_0^* = (\sigma_*^2/\sigma_n^2)\,\Gamma_{0,n}.$$

The strong law of large numbers implies that $\Gamma_{0,n} \overset{a.s.}{\to} \Gamma_0$. Therefore, we only need to show that $\sigma_*^2/\sigma_n^2 \overset{a.s.}{\to} 1$ to obtain the second result (that is, to show that $\Gamma_0^* \overset{a.s.}{\to} \Gamma_0$). By the consistency results in lemma 2.1, we have that $\sigma_n^2 \overset{a.s.}{\to} \sigma^2$. Also, recall that the $\varepsilon_t^*$ are drawn from the EDF of $\varepsilon_{q,t} - (1/n)\sum_{t=1}^{n} \varepsilon_{q,t}$. Therefore,

$$\sigma_*^2 = \frac{1}{n}\sum_{t=1}^{n}\left(\varepsilon_{q,t} - \frac{1}{n}\sum_{t=1}^{n}\varepsilon_{q,t}\right)^2.$$

Expanding the square, it follows that:

$$\sigma_*^2 = \sigma_n^2 - \left(\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{q,t}\right)^2. \qquad (52)$$

But we have shown in lemma 2.2 that $\left((1/n)\sum_{t=1}^{n} \varepsilon_{q,t}\right)^2 = o(1)$ a.s. (to see this, take the result for $n^{1-r/2}D_n$ with $r = 2$). Therefore, we have that $\sigma_*^2 - \sigma_n^2 \overset{a.s.}{\to} 0$, and thus $\sigma_*^2/\sigma_n^2 \overset{a.s.}{\to} 1$. It therefore follows that $\Gamma_0^* \overset{a.s.}{\to} \Gamma_0$. Moreover, since $\sigma_n^2 \overset{a.s.}{\to} \sigma^2$, this also implies $\sigma_*^2 \overset{a.s.}{\to} \sigma^2$.

It is fairly obvious that the results of this lemma can be extended to the ARMASB case. To do this, it suffices to replace equations (46) and (50) by the ARMASB DGP and the ARMA estimating equation, and to use the corollaries to lemmas 2.1 and 2.2.
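The autocovariance relation at the heart of this lemma can be verified directly in a simple case. The sketch below is illustrative only: the MA(1) parameter $\theta = 0.6$ and $\sigma^2 = 1$ are arbitrary choices. It checks that $\Gamma_0 = \sigma^2/(1 + \sum_k \psi_k \rho_k)$ for an MA(1), for which $\psi_k = (-\theta)^k$ are the AR(∞) coefficients and $\rho_1 = \theta/(1+\theta^2)$ is the only nonzero autocorrelation.

```python
import numpy as np

theta, sigma2 = 0.6, 1.0                 # hypothetical invertible MA(1): u_t = e_t + theta*e_{t-1}

gamma0 = sigma2 * (1.0 + theta**2)       # direct autocovariance at lag 0
rho1 = theta / (1.0 + theta**2)          # lag-1 autocorrelation; rho_k = 0 for k >= 2

# AR(infinity) form: u_t + sum_k psi_k u_{t-k} = e_t with psi_k = (-theta)^k
K = 200
psi = np.array([(-theta) ** k for k in range(1, K)])
rho = np.zeros(K - 1)
rho[0] = rho1

gamma0_from_relation = sigma2 / (1.0 + np.dot(psi, rho))
print(gamma0, gamma0_from_relation)
```

Both quantities equal $1.36$, confirming the relation in equation (49) for this special case.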

Lemma A3 (CP lemma A2; Berk, theorem 1, p. 493).

Let $f$ and $f^*$ be the spectral densities of $u_t$ and $u_t^*$ respectively. Then, under assumptions 2.1 and 3.2',

$$\sup_{\lambda} |f^*(\lambda) - f(\lambda)| = o(1) \quad a.s.$$

for large $n$. Also, letting $\Gamma_k$ and $\Gamma_k^*$ be the autocovariance functions of $u_t$ and $u_t^*$ respectively, we have

$$\sum_{k=-\infty}^{\infty} \Gamma_k^* = \sum_{k=-\infty}^{\infty} \Gamma_k + o(1) \quad a.s.$$

for large $n$. Notice that our result and that of CP (2003) are almost sure, whereas Berk's is only in probability.

Proof of Lemma A3.

Let us first derive the result for the MASB. The spectral density of the bootstrap data is

$$f^*(\lambda) = \frac{\sigma_*^2}{2\pi}\left|1 + \sum_{k=1}^{q} \pi_{q,k}\,e^{ik\lambda}\right|^2.$$

Further, let us define

$$\hat{f}(\lambda) = \frac{\sigma_n^2}{2\pi}\left|1 + \sum_{k=1}^{q} \pi_{q,k}\,e^{ik\lambda}\right|^2.$$

Recall that

$$\sigma_*^2 = \frac{1}{n}\sum_{t=1}^{n}\left(\varepsilon_{q,t} - \frac{1}{n}\sum_{t=1}^{n}\varepsilon_{q,t}\right)^2.$$

From lemma 2.2 (proof of the fourth part) and lemma A2 (equation 52), we have

$$\sigma_*^2 = \frac{1}{n}\sum_{t=1}^{n} \varepsilon_{q,t}^2 + o_p(1).$$

Thus,

$$f^*(\lambda) = \hat{f}(\lambda) + o_p(1).$$

Therefore, the desired result follows if we show that

$$\sup_{\lambda}\left|\hat{f}(\lambda) - f(\lambda)\right| = o(1) \quad a.s.$$

Now, denote by $f_n(\lambda)$ the spectral density function evaluated at the pseudo-true parameters introduced in the proof of lemma 2.1:

$$f_n(\lambda) = \frac{\sigma_n^2}{2\pi}\left|1 + \sum_{k=1}^{q} \pi_{q,k}\,e^{ik\lambda}\right|^2$$

where $\sigma_n^2$ is the minimum value of

$$\int_{-\pi}^{\pi} f(\lambda)\left|1 + \sum_{k=1}^{q} \pi_{q,k}\,e^{ik\lambda}\right|^{-2} d\lambda$$

and $\sigma_n^2 \to \sigma^2$ as shown in Baxter (1962). Obviously,

$$\sup_{\lambda}\left|\hat{f}(\lambda) - f_n(\lambda)\right| = o(1) \quad a.s.$$

by lemma 2.1 and equation (20) of Park (2002). Also,

$$\sup_{\lambda}\left|f_n(\lambda) - f(\lambda)\right| = o(1) \quad a.s.,$$

where

$$f(\lambda) = \frac{\sigma^2}{2\pi}\left|1 + \sum_{k=1}^{\infty} \pi_k\,e^{ik\lambda}\right|^2,$$

by the same argument we used at the end of part 3 of the proof of lemma 2.2. The first part of the present lemma therefore follows. If we consider that

$$\sum_{k=-\infty}^{\infty} \Gamma_k = 2\pi f(0) \quad \text{and} \quad \sum_{k=-\infty}^{\infty} \Gamma_k^* = 2\pi f^*(0),$$

the second part follows directly.
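The identity $\sum_k \Gamma_k = 2\pi f(0)$ invoked here is straightforward to check for a finite MA. The sketch below is illustrative only: it uses an arbitrary MA(2) (hypothetical values $\pi_1 = 0.5$, $\pi_2 = 0.2$, $\sigma^2 = 1$) and compares $\sum_{k=-q}^{q} \Gamma_k$ with $\sigma^2(\sum_{k=0}^{q} \pi_k)^2$, which is $2\pi f(0)$ for the spectral density written above.

```python
import numpy as np

pi = np.array([1.0, 0.5, 0.2])   # pi_0 = 1 plus hypothetical MA(2) coefficients
sigma2 = 1.0
q = len(pi) - 1

# Autocovariances of an MA(q): Gamma_h = sigma2 * sum_k pi_k pi_{k+|h|}
def gamma(h):
    h = abs(h)
    return sigma2 * sum(pi[k] * pi[k + h] for k in range(0, q - h + 1))

lrv = sum(gamma(h) for h in range(-q, q + 1))   # sum of all nonzero autocovariances
two_pi_f0 = sigma2 * pi.sum() ** 2              # 2*pi*f(0) = sigma2 * (sum_k pi_k)^2
print(lrv, two_pi_f0)
```

Both quantities equal $2.89$ for these parameter values, which is the long-run variance matching used in the second part of the lemma.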

For the ARMASB, the spectral density is

$$f^*(\lambda) = \frac{\sigma_*^2}{2\pi}\left|1 + \sum_{k=1}^{\infty} \theta_{\ell,k}\,e^{ik\lambda}\right|^2$$

where $\theta_{\ell,k}$ is the $k$th parameter of the MA(∞) form of the ARMASB DGP. Since lemmas 2.1, 2.2 and A2 also hold for the ARMASB, the proof follows the same lines as that of the MASB and we therefore do not write it down. More specifically, it is easy to see that the result $n^{1-r/2}D_n \overset{a.s.}{\to} 0$ in part four of lemma 2.2, which is of crucial importance here, holds if we replace the parameters of the AR(∞) form of the MASB by those corresponding to the AR(∞) form of the ARMASB.

Lemma A4.

Under assumptions 2.1 and 2.2, we have

$$E^*|\varepsilon_t^*|^4 = O(1) \quad a.s.$$

Proof of Lemma A4.

From the proof of lemma 2.2, we have $E^*|\varepsilon_t^*|^4 \le c\,(A_n + B_n + C_n + D_n)$, where $c$ is a constant. The relevant results are:

1. $A_n = O(1)$ a.s.

2. $E(B_n) = o(q^{-rs})$ (equation 34)

3. $C_n \le 2^{r-1}(C_{1n} + C_{2n})$, where $C_{1n} = o(1)$ a.s. (equation above 41) and $E(C_{2n}) = o(q^{-rs})$ (equation 41)

4. $D_n = o(1)$ a.s.

Under assumption 2.2', we have that $B_n = o(1)$ a.s. and $C_{2n} = o(1)$ a.s. because $o(q^{-rs}) = o((cn^k)^{-rs}) = o(n^{-krs}) = o(n^{-1-\delta})$ for some $\delta > 0$. The result therefore follows for both the MASB and the ARMASB.

Lemma A5 (CP lemma A4; Berk, proof of lemma 3).

Define

$$M_n^*(i,j) = E^*\left[\sum_{t=1}^{n}\left(u_{t-i}^*u_{t-j}^* - \Gamma_{i-j}^*\right)\right]^2.$$

Then, under assumptions 2.1 and 2.2, we have $M_n^*(i,j) = O(n)$ a.s.

Proof of Lemma A5.

For general linear models, Hannan (1960, p. 39) and Berk (1974, p. 491) have shown that

$$M_n^*(i,j) \le n\left(2\sum_{k=-\infty}^{\infty} \Gamma_k^* + |K_4^*|\left(\sum_{k=0}^{\infty} \pi_{q,k}^2\right)^2\right)$$

for all $i$ and $j$, where $K_4^*$ is the fourth cumulant of $\varepsilon_t^*$. Since our MASB and ARMASB certainly fit into the class of linear models, this result applies here. But $K_4^*$ can be written as a polynomial of degree 4 in the first 4 moments of $\varepsilon_t^*$. Therefore, $|K_4^*|$ must be $O(1)$ a.s. by lemma A4. The result now follows from lemma A3 and the fact that $\sum_{k=-\infty}^{\infty} \Gamma_k = O(1)$.

Before going on, it is appropriate to note that the proofs of lemmas 2.4 and 2.5 are almost identical to the proofs of lemmas 3.2 and 3.3 of CP (2003). We present them here for the sake of completeness.

Proof of Lemma 2.4.

First, we prove equation (22). Using the Beveridge-Nelson decomposition of $u_t^*$ and the fact that $y_t^* = \sum_{k=1}^{t} u_k^*$, we can write:

$$\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^*\,\varepsilon_t^* = \pi_n(1)\,\frac{1}{n}\sum_{t=1}^{n} w_{t-1}^*\,\varepsilon_t^* + u_0^*\,\frac{1}{n}\sum_{t=1}^{n} \varepsilon_t^* - \frac{1}{n}\sum_{t=1}^{n} u_{t-1}^*\,\varepsilon_t^*.$$

Therefore, to prove the first result, it suffices to show that

$$E^*\left|u_0^*\,\frac{1}{n}\sum_{t=1}^{n} \varepsilon_t^* - \frac{1}{n}\sum_{t=1}^{n} u_{t-1}^*\,\varepsilon_t^*\right| = o(1) \quad a.s. \qquad (53)$$

Since the $\varepsilon_t^*$ are iid by construction, we have:

$$E^*\left(\sum_{t=1}^{n}\varepsilon_t^*\right)^2 = n\sigma_*^2 = O(n) \quad a.s. \qquad (54)$$

and

$$E^*\left(\sum_{t=1}^{n} u_{t-1}^*\,\varepsilon_t^*\right)^2 = n\sigma_*^2\,\Gamma_0^* = O(n) \quad a.s. \qquad (55)$$

where $\Gamma_0^* = E^*(u_t^*)^2$. But the terms in equation (53) are $1/n$ times the square roots of (54) and (55). Hence, equation (53) follows. Now, to prove equation (23), recall

that $w_t^* = \sum_{k=1}^{t} \varepsilon_k^*$, and consider again the Beveridge-Nelson decomposition of $u_t^*$:

$$y_t^{*2} = \pi_n(1)^2\,w_t^{*2} + (u_0^* - u_t^*)^2 + 2\pi_n(1)\,w_t^*(u_0^* - u_t^*)$$
$$= \pi_n(1)^2\,w_t^{*2} + (u_0^*)^2 + (u_t^*)^2 - 2u_t^*u_0^* + 2\pi_n(1)\,w_t^*(u_0^* - u_t^*).$$

Thus, $\frac{1}{n^2}\sum_{t=1}^{n} y_t^{*2}$ is equal to

$$\pi_n(1)^2\frac{1}{n^2}\sum_{t=1}^{n} w_t^{*2} + \frac{1}{n}(u_0^*)^2 + \frac{1}{n^2}\sum_{t=1}^{n}(u_t^*)^2 - \frac{2}{n^2}\,u_0^*\sum_{t=1}^{n} u_t^* + 2\pi_n(1)\,u_0^*\,\frac{1}{n^2}\sum_{t=1}^{n} w_t^* - 2\pi_n(1)\,\frac{1}{n^2}\sum_{t=1}^{n} w_t^*u_t^*.$$

By lemma 2.3, every term but the first of this expression is $o(1)$ a.s. The result follows.
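The exactness of the Beveridge-Nelson decomposition used in this proof can be checked by direct simulation. The sketch below is illustrative only: it builds a finite MA(2) with arbitrary coefficients ($\pi_1 = 0.5$, $\pi_2 = 0.2$), forms $y_t = \sum_{s \le t} u_s$, and verifies that $y_t = \pi(1)w_t + \tilde{u}_0 - \tilde{u}_t$ holds exactly, where $\tilde{u}_t = \sum_j \tilde{\pi}_j \varepsilon_{t-j}$ with $\tilde{\pi}_j = \sum_{k>j} \pi_k$.

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([1.0, 0.5, 0.2])          # pi_0 = 1; hypothetical MA(2) coefficients
q = len(pi) - 1
n = 500

eps = rng.standard_normal(n + q)        # eps[q + t - 1] holds e_t; first q entries are presample
u = np.array([pi @ eps[q + t - 1 - np.arange(q + 1)] for t in range(1, n + 1)])
y = np.cumsum(u)                        # y_t = u_1 + ... + u_t
w = np.cumsum(eps[q:])                  # w_t = e_1 + ... + e_t

pi_tilde = np.array([pi[j + 1:].sum() for j in range(q)])  # pi~_j = sum_{k > j} pi_k
def u_tilde(t):                         # u~_t = sum_j pi~_j e_{t-j}
    return pi_tilde @ eps[q + t - 1 - np.arange(q)]

bn = np.array([pi.sum() * w[t - 1] + u_tilde(0) - u_tilde(t) for t in range(1, n + 1)])
print(np.max(np.abs(y - bn)))           # numerically zero
```

The discrepancy is at machine precision for every $t$, which is why the decomposition can be summed term by term in the proof.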

Proof of Lemma 2.5.

Using the definition of bootstrap stochastic orders, we prove the results by showing that:

$$E^*\left\|\left(\frac{1}{n}\sum_{t=1}^{n} x_{p,t}^*\,x_{p,t}^{*\top}\right)^{-1}\right\| = O_p(1) \qquad (56)$$

$$E^*\left\|\sum_{t=1}^{n} x_{p,t}^*\,y_{t-1}^*\right\| = O_p(np^{1/2}) \quad a.s. \qquad (57)$$

$$E^*\left\|\sum_{t=1}^{n} x_{p,t}^*\,\varepsilon_t^*\right\| = O_p(n^{1/2}p^{1/2}) \quad a.s. \qquad (58)$$

The proofs below rely on the fact that, under the null, the ADF regression is a finite order autoregressive approximation to the bootstrap DGP, which admits an AR(∞) form.

Proof of (56): First, let us define the long run covariance matrix of the vector $x_{p,t}^*$ as $\Omega_{pp}^* = (\Gamma_{i-j}^*)_{i,j=1}^{p}$. Then, recalling the result of lemma A5, we have that

$$E^*\left\|\frac{1}{n}\sum_{t=1}^{n} x_{p,t}^*\,x_{p,t}^{*\top} - \Omega_{pp}^*\right\|^2 = O_p(n^{-1}p^2) \quad a.s. \qquad (59)$$

This is because equation (59) is squared with a factor of $1/n$ and the dimension of $x_{p,t}^*$ is $p$. Also,

$$\left\|\Omega_{pp}^{*-1}\right\| \le \left[2\pi\left(\inf_{\lambda} f^*(\lambda)\right)\right]^{-1} = O(1) \quad a.s. \qquad (60)$$

because, under lemma A4, we can apply the result from Berk (1974, equation 2.14). In that paper, Berk considers the problem of approximating a general linear process, of which our bootstrap DGP can be seen as a special case, with a finite order AR(p) model, which is what our ADF regression does under the null hypothesis. To see how his results apply, consider assumption 2.1(b) on the parameters of the original data's DGP. Using this and the results of lemma 2.1, we may say that $\sum_{k=0}^{\infty} |\pi_{q,k}| < \infty$. Therefore, as argued by Berk (1974, p. 493), the polynomial $1 + \sum_{k=1}^{q} \pi_{q,k}\,e^{ik\lambda}$ is continuous and nonzero over $\lambda$, so that $f^*(\lambda)$ is also continuous and there are constant values $F_1$ and $F_2$ such that $0 < F_1 < f^*(\lambda) < F_2$. This further implies that (Grenander and Szego 1958, p. 64) $2\pi F_1 \le \lambda_1 < \dots < \lambda_p \le 2\pi F_2$, where $\lambda_i$, $i = 1,\dots,p$, are the eigenvalues of the theoretical covariance matrix of the bootstrap DGP. To get the result, it suffices to consider the definition of the matrix norm. For a given matrix $C$, we have that $\|C\| = \sup \|Cx\|$ for $\|x\| \le 1$, where $x$ is a vector and $\|x\|^2 = x^\top x$. Thus, $\|C\|^2 \le \sum_{i,j} c_{i,j}^2$, where $c_{i,j}$ is the element in position $(i,j)$ of $C$. Therefore, the matrix norm $\|C\|$ is dominated by the largest modulus of the eigenvalues of $C$. This in turn implies that the norm of the inverse of $C$ is dominated by the inverse of the smallest modulus of the eigenvalues of $C$. This reasoning is similar to that of Berk (1974, p. 492-93). Hence, equation (60) follows.

matrix norm ‖C‖ is dominated by the largest modulus of the eigenvalues of C. Thisin turn implies that the norm of the inverse of C is dominated by the inverse of thesmallest modulus of the eigenvalues of C. This reasonning is similar to that of Berk(1974, p. 492-93). Hence, equation ( 60) follows.

Then, we write the following inequality:

$$\left|E^*\left\|\left(\frac{1}{n}\sum_{t=1}^{n} x_{p,t}^*\,x_{p,t}^{*\top}\right)^{-1}\right\| - \left\|\Omega_{pp}^{*-1}\right\|\right| \le E^*\left\|\left(\frac{1}{n}\sum_{t=1}^{n} x_{p,t}^*\,x_{p,t}^{*\top}\right)^{-1} - \Omega_{pp}^{*-1}\right\|$$

where we used the fact that $E^*\|\Omega_{pp}^{*-1}\| = \|\Omega_{pp}^{*-1}\|$. By equation (59), the right hand side goes to 0 as $n$ increases. Equation (56) then follows from equation (60).

Proof of (57): Our proof is almost exactly the same as the proof of lemma 3.2 in Chang and Park (2002), except that we consider bootstrap quantities. Like them, we let $y_t^* = 0$ for all $t \le 0$ and, for $1 \le j \le p$, we write

$$\sum_{t=1}^{n} y_{t-1}^*\,u_{t-j}^* = \sum_{t=1}^{n} y_{t-1}^*\,u_t^* + R_n^* \qquad (61)$$

where

$$R_n^* = \sum_{t=1}^{n} y_{t-1}^*\,u_{t-j}^* - \sum_{t=1}^{n} y_{t-1}^*\,u_t^*$$

and $u_t^*$ is the bootstrap first difference process. First of all, we note that $\sum_{t=1}^{n} x_{p,t}^*\,y_{t-1}^*$ is a $p \times 1$ vector whose $j$th element is $\sum_{t=1}^{n} y_{t-1}^*\,u_{t-j}^*$. Therefore, by the definition of the Euclidean vector norm, equation (57) will be proved if we show that $R_n^* = O_p^*(n)$ uniformly in $j$ for $j$ from 1 to $p$. We begin by noting that

$$\sum_{t=1}^{n} y_{t-1}^*\,u_t^* = \sum_{t=1}^{n} y_{t-j-1}^*\,u_{t-j}^* + \sum_{t=n-j+1}^{n} y_{t-1}^*\,u_t^*$$

for each $j$. This allows us to write:

$$R_n^* = \sum_{t=1}^{n}\left(y_{t-1}^* - y_{t-j-1}^*\right)u_{t-j}^* - \sum_{t=n-j+1}^{n} y_{t-1}^*\,u_t^*.$$
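The decomposition of $R_n^*$ is a finite-sample identity and can be checked mechanically. The sketch below is illustrative only: it uses arbitrary simulated data with the convention $y_t = 0$ and $u_t = 0$ for $t \le 0$, and verifies that $\sum_t y_{t-1}u_{t-j} - \sum_t y_{t-1}u_t$ equals $\sum_t (y_{t-1} - y_{t-j-1})u_{t-j}$ minus the end-effect sum.

```python
import numpy as np

rng = np.random.default_rng(1)
n, j = 200, 3                             # hypothetical sample size and lag

u = rng.standard_normal(n)                # u_1,...,u_n stored in u[0..n-1]
y = np.cumsum(u)                          # y_t = u_1 + ... + u_t

def yv(t):                                # y_t with y_t = 0 for t <= 0
    return y[t - 1] if t >= 1 else 0.0

def uv(t):                                # u_t with u_t = 0 for t <= 0
    return u[t - 1] if t >= 1 else 0.0

# R_n = sum y_{t-1} u_{t-j} - sum y_{t-1} u_t ...
r_n = sum(yv(t - 1) * uv(t - j) for t in range(1, n + 1)) \
    - sum(yv(t - 1) * uv(t) for t in range(1, n + 1))

# ... equals sum (y_{t-1} - y_{t-j-1}) u_{t-j} minus the end-effect term
decomp = sum((yv(t - 1) - yv(t - j - 1)) * uv(t - j) for t in range(1, n + 1)) \
       - sum(yv(t - 1) * uv(t) for t in range(n - j + 1, n + 1))

print(abs(r_n - decomp))                  # numerically zero
```

The two expressions agree to machine precision, so the subsequent order calculations can be carried out on $R_{1n}^*$ and $R_{2n}^*$ separately.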

Let us call $R_{1n}^*$ and $R_{2n}^*$ the first and second elements of the right hand side of this last equation. Then, because $y_t^*$ is integrated,

$$R_{1n}^* = \sum_{t=1}^{n}\left(y_{t-1}^* - y_{t-j-1}^*\right)u_{t-j}^* = \sum_{t=1}^{n}\left(\sum_{i=1}^{j} u_{t-i}^*\right)u_{t-j}^* = n\sum_{i=1}^{j} \Gamma_{i-j}^* + \sum_{i=1}^{j}\left[\sum_{t=1}^{n}\left(u_{t-i}^*u_{t-j}^* - \Gamma_{i-j}^*\right)\right]$$

where we have added and subtracted $\Gamma_{i-j}^*$. By lemma A3, the first part is $O(n)$ a.s. because $\sum_{k=-\infty}^{\infty} \Gamma_k = O(1)$. Similarly, we can use lemma A5 to show that the second part is $O_p^*(n^{1/2}p)$ a.s., where the change to $O_p^*$ comes from the fact that the result of lemma A5 is for the expectation under the bootstrap DGP, the $n^{1/2}$ from the fact that lemma A5 considers the square of the present term, and the $p$ from the fact that $j$ goes from 1 to $p$. Thus, $R_{1n}^* = O(n) + O_p^*(n^{1/2}p)$ a.s.

Now, let us consider $R_{2n}^*$:

$$R_{2n}^* = \sum_{t=n-j+1}^{n} y_{t-1}^*\,u_t^* = \sum_{t=n-j+1}^{n}\left(\sum_{i=1}^{t-1} u_{t-i}^*\right)u_t^* = \sum_{t=n-j+1}^{n}\sum_{i=1}^{n-j} u_t^*u_{t-i}^* + \sum_{t=n-j+2}^{n}\sum_{i=n-j+1}^{t-1} u_t^*u_{t-i}^*$$

Letting $R_{2n}^{*a}$ and $R_{2n}^{*b}$ denote the first and second parts of this last equation, we have

$$R_{2n}^{*a} = j\sum_{i=1}^{n-j} \Gamma_i^* + \sum_{t=n-j+1}^{n}\sum_{i=1}^{n-j}\left(u_t^*u_{t-i}^* - \Gamma_i^*\right) = O(p) + O_p^*(n^{1/2}p) \quad a.s.$$

uniformly in $j$, where the last line comes from lemmas A3 and A5 again. The order of the first term is obvious, while the order of the second term is explained like that of the second term of $R_{1n}^*$. Similarly, we also have

$$R_{2n}^{*b} = (j-1)\sum_{i=n-j+1}^{t-1} \Gamma_i^* + \sum_{t=n-j+2}^{n}\sum_{i=n-j+1}^{t-1}\left(u_t^*u_{t-i}^* - \Gamma_i^*\right) = O(p) + O_p(p^{3/2}) \quad a.s.$$

uniformly in $j$ under lemmas A3 and A5, where the order of the first term is obvious and the order of the second term comes from the fact that $j$ appears in the starting index of both summands. Hence, $R_n^*$ is $O_p^*(n)$ a.s. uniformly in $j$. Also, under assumptions 2.1 and 2.2, and by lemma 2.1, $\sum_{t=1}^{n} y_{t-1}^*u_t^* = O_p^*(n)$ a.s. uniformly in $j$. Therefore, equation (61) is also $O_p^*(n)$ a.s. uniformly in $j$, $1 \le j \le p$. The result follows because the left hand side of (57) is the Euclidean vector norm of $p$ elements that are all $O_p^*(n)$ a.s.

Proof of (58): We begin by noting that for all $k$ such that $1 \le k \le p$,

$$E^*\left(\sum_{t=1}^{n} u_{t-k}^*\,\varepsilon_t^*\right)^2 = n\sigma_*^2\,\Gamma_0^*$$

which means that

$$E^*\left\|\sum_{t=1}^{n} x_{p,t}^*\,\varepsilon_t^*\right\|^2 = np\,\sigma_*^2\,\Gamma_0^*.$$

But it has been shown in lemma A2 that $\sigma_*^2$ and $\Gamma_0^*$ are $O(1)$ a.s. The result is then obvious.

Proof of Theorem 2.3.

The theorem follows directly from lemmas 2.4 and 2.5.

Proof of Corollary 2.4.

The results are easily obtained by using the Beveridge-Nelson decomposition of the infinite MA form of the ARMASB DGP and by following the same line of reasoning as in lemma 2.4.

Proof of Corollary 2.5.

The proof of lemma 2.5 can easily be adapted to the ARMASB since lemmas 2.1, 2.2 and A1 to A5 have been shown to hold for this case. Hence, corollary 2.5 naturally follows.

Proof of Theorem 2.4.

The proof follows directly from corollaries 2.4 and 2.5.

References

Abadir, K. M. (1995). "The limiting distribution of the t ratio under a unit root," Econometric Theory, 11, 775-93.

Agiakoglou, C. and P. Newbold (1992). "Empirical evidence on Dickey-Fuller type tests," Journal of Time Series Analysis, 13, 471-83.

An, H. Z., G. Z. Chan and E. J. Hannan (1982). "Autocorrelation, autoregression and autoregressive approximation," Annals of Statistics, 10, 926-36.

Andrews, D. W. K. (2004). "The block-block bootstrap: improved asymptotic refinements," Econometrica, 72, 673-700.

Baxter, G. (1962). "An asymptotic result for the finite predictor," Mathematica Scandinavica, 10, 137-44.

Berk, K. N. (1974). "Consistent autoregressive spectral estimates," Annals of Statistics, 2, 489-502.

Bickel, P. J. and P. Buhlmann (1999). "A new mixing notion and functional central limit theorems for a sieve bootstrap in time series," Bernoulli, 5, 413-46.

Bierens, H. J. (2001). "Unit roots," Ch. 29 in A Companion to Econometric Theory, ed. B. Baltagi, Oxford, Blackwell Publishers, 610-33.

Buhlmann, P. (1997). "Sieve bootstrap for time series," Bernoulli, 3, 123-48.

Buhlmann, P. (1998). "Sieve bootstrap for smoothing in nonstationary time series," Annals of Statistics, 26, 48-83.

Chang, Y. and J. Y. Park (2002). "On the asymptotics of ADF tests for unit roots," Econometric Reviews, 21, 431-47.

Chang, Y. and J. Y. Park (2003). "A sieve bootstrap for the test of a unit root," Journal of Time Series Analysis, 24, 379-400.

Davidson, J. E. H. (2006). "Asymptotic methods and functional central limit theorems," ch. in Palgrave Handbooks of Econometrics, eds. T. C. Mills and K. Patterson, Palgrave-Macmillan.

Davidson, R. and J. G. MacKinnon (1998). "Graphical methods for investigating the size and power of hypothesis tests," The Manchester School, 66, 1-26.

Davidson, R. and J. G. MacKinnon (2004). Econometric Theory and Methods, Oxford, Oxford University Press.

Davidson, R. and J. G. MacKinnon (2006). "Improving the reliability of bootstrap tests," working paper, Queen's and McGill Universities.

Dickey, D. A. and W. A. Fuller (1979). "Distribution of the estimators for autoregressive time series with a unit root," Journal of the American Statistical Association, 74, 427-31.

Dickey, D. A. and W. A. Fuller (1981). "Likelihood ratio statistics for autoregressive time series with a unit root," Econometrica, 49, 1057-72.

Diebold, F. X. and G. Rudebusch (1989). "Long memory and persistence in aggregate output," Journal of Monetary Economics, 24, 189-209.

Diebold, F. X. and G. Rudebusch (1991). "Is consumption too smooth? Long memory and the Deaton paradox," Review of Economics and Statistics, 74, 1-9.

Dufour, J.-M. and J. Kiviet (1998). "Exact inference methods for first order autoregressive distributed lag models," Econometrica, 66, 79-104.

Galbraith, J. W. and V. Zinde-Walsh (1992). "The GLS transformation matrix and a semi-recursive estimator for the linear regression model with ARMA errors," Econometric Theory, 8, 143-55.

Galbraith, J. W. and V. Zinde-Walsh (1994). "A simple noniterative estimator for moving average models," Biometrika, 81, 143-55.

Galbraith, J. W. and V. Zinde-Walsh (1997). "On some simple, autoregression-based estimation and identification techniques for ARMA models," Biometrika, 84, 685-96.

Goncalves and Vogelsang (2006). "Block bootstrap puzzles in HAC robust testing: the sophistication of the naive bootstrap," working paper, Universite de Montreal.

Granger, C. W. J. (1980). "Long memory relationships and the aggregation of dynamic models," Journal of Econometrics, 14, 227-38.

Grenander, U. and G. Szego (1958). Toeplitz Forms and Their Applications, Berkeley, University of California Press.

Hall, P. and J. L. Horowitz (1996). "Bootstrap critical values for tests based on generalised method of moments estimators with dependent data," Econometrica, 64, 891-916.

Hall, P., J. L. Horowitz and B. Y. Jing (1995). "On blocking rules for the bootstrap with dependent data," Biometrika, 82, 561-74.

Hannan, E. J. and L. Kavalieris (1986). "Regression, autoregression models," Journal of Time Series Analysis, 7, 27-49.

Kreiss, J. P. (1992). "Bootstrap procedures for AR(∞) processes," in Bootstrapping and Related Techniques, Lecture Notes in Economics and Mathematical Systems 376, eds. K. H. Jockel, G. Rothe and W. Sendler, Heidelberg, Springer.

MacKinnon, J. G. and A. A. Smith Jr. (1998). "Approximate bias correction in econometrics," Journal of Econometrics, 85, 205-30.

Moricz, F. (1976). "Moment inequalities and the strong law of large numbers," Zeitschrift fur Wahrscheinlichkeitstheorie und Verwandte Gebiete, 35, 298-314.

Ng, S. and P. Perron (1995). "Unit root tests in ARMA models with data dependent methods for the selection of the truncation lag," Journal of the American Statistical Association, 90, 268-81.

Ng, S. and P. Perron (2001). "Lag length selection and the construction of unit root tests with good size and power," Econometrica, 69, 1519-54.

Palm, F. C., S. Smeekes and J. P. Urbain (2006). "Bootstrap unit root tests: comparison and extensions," working paper, Universiteit Maastricht.

Park, J. Y. (2002). "An invariance principle for sieve bootstrap in time series," Econometric Theory, 18, 469-90.

Park, J. Y. (2003). "Bootstrap unit root tests," Econometrica, 71, 1845-95.

Parker, C., E. Paparoditis and D. N. Politis (2006). "Unit root testing via the stationary bootstrap," Journal of Econometrics, forthcoming.

Paparoditis, E. and D. N. Politis (2001a). "Tapered block bootstrap," Biometrika, 88, 1105-19.

Paparoditis, E. and D. N. Politis (2001b). "The tapered block bootstrap for general statistics from stationary sequences," Econometrics Journal, 5, 131-48.

Paparoditis, E. and D. N. Politis (2003). "Residual based block bootstrap for unit root testing," Econometrica, 71, 813-55.

Phillips, P. C. B. and V. Solo (1992). "Asymptotics for linear processes," Annals of Statistics, 20, 971-1001.

Porter-Hudak, S. (1990). "An application of the seasonal fractional difference model to the monetary aggregates," Journal of the American Statistical Association, 85, 338-44.

Psaradakis, Z. (2001). "Bootstrap tests for an autoregressive unit root in the presence of weakly dependent errors," Journal of Time Series Analysis, 22, 577-94.

Richard, P. (2007). "A modified fast double bootstrap for unit root tests," manuscript, GREDI, Universite de Sherbrooke.

Said, E. S. and D. A. Dickey (1984). "Testing for unit roots in autoregressive-moving average models of unknown order," Biometrika, 71, 599-607.

Said, E. S. and D. A. Dickey (1985). "Hypothesis testing in ARIMA(p,1,q) models," Journal of the American Statistical Association, 80, 369-74.

Sakhanenko, A. I. (1980). "On unimprovable estimates of the rate of convergence in invariance principle," in Nonparametric Statistical Inference, Colloquia Mathematica Societatis Janos Bolyai, 32, 779-83.

Swensen, A. R. (2003). "Bootstrapping unit root tests for integrated processes," Journal of Time Series Analysis, 24, 99-126.

Weierstrass, K. (1903). "Neuer Beweis des Satzes, dass jede ganze rationale Function einer Veranderlichen dargestellt werden kann als ein Product aus linearen Functionen derselben Veranderlichen," Gesammelte Werke, 3, 251-69.
