Source: people.bu.edu/qu/EC709-2012/chapter01.pdf


Chapter 1. GMM: Basic Concepts

Contents

1 Motivating Examples
  1.1 Instrumental variable estimator
  1.2 Estimating parameters in monetary policy rules
  1.3 Estimating the parameter of risk aversion
2 Definition
3 Global and local identification
  3.1 Global identification
  3.2 Local identification
  3.3 Identified, but only weakly
4 Estimation and inference in well identified models
  4.1 The asymptotic distribution
  4.2 Efficient GMM
  4.3 Two-step and continuous updating GMM
5 Testing parametric restrictions in well identified models
  5.1 Wald Test
  5.2 Gradient test
  5.3 Distance test
6 Model diagnostics
  6.1 Formulating model diagnostics as testing for parametric restrictions
  6.2 Testing overidentifying restrictions
  6.3 Hausman's Specification Test


1. Motivating Examples

1.1. Instrumental variable estimator

Consider the following linear model with endogeneity

y_t = x_t' \beta_0 + \varepsilon_t, with E(x_t \varepsilon_t) \neq 0. (1)

Suppose zt is a set of valid instruments:

E(z_t \varepsilon_t) = 0.

For now, assume dim(z_t) = dim(x_t). Multiplying both sides of the regression equation by z_t gives

z_t y_t = z_t x_t' \beta_0 + z_t \varepsilon_t.

Taking expectation:

E(z_t y_t) = E(z_t x_t') \beta_0 (because E(z_t \varepsilon_t) = 0).

Assuming E(z_t x_t') is invertible, then

\beta_0 = [E(z_t x_t')]^{-1} E(z_t y_t).

Replacing the two expectations with their sample estimates gives

\hat\beta_{IV} = (\sum_{t=1}^T z_t x_t')^{-1} \sum_{t=1}^T z_t y_t.

In matrix notation

\hat\beta_{IV} = (Z'X)^{-1} Z'y.

Question: What are examples of y_t, x_t and z_t in macroeconomics? Can the above idea be generalized to nonlinear models allowing dim(z_t) \neq dim(x_t)? We first illustrate these two issues using examples, then present a formal framework.
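The derivation can be checked numerically. A minimal sketch (illustrative, not from the original notes): simulate an endogenous regressor and verify that the IV formula (Z'X)^{-1} Z'y recovers \beta_0 while OLS does not.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50_000
beta0 = 2.0
z = rng.normal(size=T)                      # exogenous instrument
u = rng.normal(size=T)                      # common shock creating endogeneity
eps = u + rng.normal(size=T)                # error term, correlated with x via u
x = 0.8 * z + 0.5 * u + rng.normal(size=T)  # endogenous regressor
y = x * beta0 + eps

Z = z.reshape(-1, 1)
X = x.reshape(-1, 1)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # inconsistent under endogeneity
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # (Z'X)^{-1} Z'y, consistent
```

With this design, the OLS estimate is biased upward by roughly Cov(x, eps)/Var(x), while the IV estimate concentrates around \beta_0.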


1.2. Estimating parameters in monetary policy rules

Clarida, Gali and Gertler (2000) estimated a forward-looking monetary policy reaction function for the postwar United States economy. We use their study to illustrate how moment conditions naturally arise in rational expectations models.

Let r_t^* denote the target rate for the nominal Federal Funds rate in period t. The target rate in each period is a function of the gaps between expected inflation and output and their respective target levels. Specifically,

r_t^* = r^* + \beta (E[\pi_{t+1} | \Omega_t] - \pi^*) + \gamma E[x_{t+1} | \Omega_t], (2)

where

- \pi_{t+1} denotes inflation, i.e., the percentage change in the price level between time t and t+1;
- \pi^* is the target rate for inflation;
- x_{t+1} is the output gap, defined as the percent deviation between actual GDP and the corresponding target;
- \Omega_t is the information set of the agent at time t when the interest rate is set;
- r^* is the desired interest rate when inflation and output are at their target levels.

Define the ex ante real interest rate as

rr_t^* = r_t^* - E[\pi_{t+1} | \Omega_t].

Its target rate is

rr^* = r^* - \pi^*.

Then, the reaction function (2) can be represented as

rr_t^* = rr^* + (\beta - 1)(E[\pi_{t+1} | \Omega_t] - \pi^*) + \gamma E[x_{t+1} | \Omega_t].


In the above, \beta < 1 implies that the ex ante real rate falls with higher expected inflation; \beta > 1 implies the opposite. The latter policy rule is often said to be stabilizing. Distinguishing between these two cases is of substantial importance.

In practice, it may take more than one period for the interest rate to adjust toward its target. This is called interest rate smoothing. Clarida, Gali and Gertler (2000) modeled this as

r_t = \rho r_{t-1} + (1 - \rho) r_t^*.

This leads to the following policy reaction function:

r_t = (1 - \rho)[rr^* - (\beta - 1)\pi^* + \beta \pi_{t+1} + \gamma x_{t+1}] + \rho r_{t-1} + \epsilon_t,

where

\epsilon_t = -(1 - \rho)\{\beta[\pi_{t+1} - E(\pi_{t+1} | \Omega_t)] + \gamma[x_{t+1} - E(x_{t+1} | \Omega_t)]\}.

The term in curly brackets is a linear combination of forecast errors and is thus orthogonal to any variable in \Omega_t. This orthogonality is what delivers the desired moment restrictions, as seen below.

Let z_t denote a vector of instruments known at time t (i.e., contained in \Omega_t). The above two equations then imply the following set of orthogonality conditions:

E\{[r_t - (1 - \rho)[rr^* - (\beta - 1)\pi^* + \beta \pi_{t+1} + \gamma x_{t+1}] - \rho r_{t-1}] z_t\} = 0. (3)

In Clarida, Gali and Gertler (2000), z_t includes the Funds rate, inflation, the output gap, M2 growth, and the spread between the long-term bond rate and the three-month Treasury Bill rate. Clearly, such a choice involves some arbitrariness; this is hard to avoid in practice.

Note that rr^* and (\beta - 1)\pi^* are not separately identifiable. Clarida, Gali and Gertler (2000) assume rr^* is known and set it to the observed sample average. Equation (3) then has four unknown parameters: \rho, \beta, \pi^* and \gamma.
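The sample analog of the orthogonality conditions (3) can be coded directly. A sketch on simulated data (function and variable names are illustrative; rr_star plays the role of the calibrated rr^*, and the simulated rule holds up to a mean-zero shock standing in for the forecast errors):

```python
import numpy as np

def cgg_moments(params, r, infl, gap, Z, rr_star):
    # Sample analog of (3): one entry per instrument (column of Z)
    rho, beta, pi_star, gamma = params
    resid = (r[1:-1]
             - (1 - rho) * (rr_star - (beta - 1) * pi_star
                            + beta * infl[2:] + gamma * gap[2:])
             - rho * r[:-2])
    return (Z[1:-1] * resid[:, None]).mean(axis=0)

rng = np.random.default_rng(1)
T = 40_000
rho, beta, pi_star, gamma, rr_star = 0.7, 1.5, 2.0, 0.5, 2.5
infl = rng.normal(pi_star, 1.0, T)
gap = rng.normal(0.0, 1.0, T)
e = rng.normal(0.0, 0.1, T)                 # plays the role of eps_t
r = np.empty(T)
r[0] = rr_star + pi_star
for t in range(1, T - 1):
    r[t] = ((1 - rho) * (rr_star - (beta - 1) * pi_star
                         + beta * infl[t + 1] + gamma * gap[t + 1])
            + rho * r[t - 1] + e[t])
r[-1] = r[-2]
Z = np.column_stack([np.ones(T), gap])      # instruments known at time t
g_true = cgg_moments((rho, beta, pi_star, gamma), r, infl, gap, Z, rr_star)
g_off = cgg_moments((rho, beta, pi_star + 1.0, gamma), r, infl, gap, Z, rr_star)
```

At the true parameters the sample moments are near zero; perturbing \pi^* shifts the constant-instrument moment by approximately (1 - \rho)(\beta - 1) times the perturbation.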

Clarida, Gali and Gertler (2000) find that \beta is greater than one for the Volcker-Greenspan period but less than one for the pre-Volcker period (see Table II in their paper). They conclude that monetary policy was better managed in the Volcker-Greenspan period. However, their study has subsequently been sharply criticized, most notably by Cochrane (2011). The latter paper argues that the Taylor rule is in general not identifiable if one allows for multiple equilibria, making conventional inferential procedures invalid.

1.3. Estimating the parameter of risk aversion

Suppose a representative agent solves the following problem:

max_{\{C_t\}} E[ \sum_{t=0}^{\infty} \delta^t U(C_t) ] (4)

under the budget constraint

C_t + P_t Q_t \le R_t Q_{t-M} + W_t,

where

- C_t: consumption in period t;
- \delta: discount factor;
- U(\cdot): utility function;
- Q_t: quantity of the asset held at the end of period t;
- P_t: price of the asset at t;
- R_t: date-t payoff from holding a unit of an M-period asset purchased at date t - M;
- W_t: (real) labor income at date t.

Maximizing (4) leads to

P_t U'(C_t) = \delta^M E_t[R_{t+M} U'(C_{t+M})] for all t, (5)

where E_t[\cdot] is the conditional expectation. Equivalently,

E_t[ \delta^M (R_{t+M}/P_t) (U'(C_{t+M}) / U'(C_t)) - 1 ] = 0. (6)


Suppose we can observe R_{t+M}, P_t, C_t and are willing to accept that the utility function is given by

U(C_t) = C_t^{\gamma} / \gamma.

Then, (6) can be written as

E_t[ \delta^M (R_{t+M}/P_t) (C_{t+M}^{\gamma-1} / C_t^{\gamma-1}) - 1 ] = 0.

For any information variable observable to the agent at time t, say z_t,

E\{ [ \delta^M (R_{t+M}/P_t) (C_{t+M}^{\gamma-1} / C_t^{\gamma-1}) - 1 ] z_t \} = 0.

2. Definition

We now define GMM in a general framework. Consider the following moment restriction:

E[m(X_t, \theta_0)] = 0,

where X_t is a random vector. In general, m(X_t, \theta) is a vector-valued function of X_t. If the dimension of m(X_t, \theta_0) is k, we say there are k moment restrictions.

Suppose we have q parameters to estimate. Then the GMM estimator can be constructed as follows. First, evaluate the function m(X_t, \theta) at the observations:

m(X_t, \theta), t = 1, ..., T.

Next, compute the sample average

m_T(\theta) = T^{-1} \sum_{t=1}^T m(X_t, \theta).

Finally, solve

m_T(\theta) = 0. (7)

The idea is very simple: the GMM estimator is obtained by matching the sample and population moments. However, if k > q, then (7) in general has no solution. The idea is then to take a weighted average of the k equations and make it as close to zero as possible, leading to the following general definition of the GMM estimator.


Definition 1. (GMM estimator) Let W_T be a k by k symmetric positive definite matrix such that W_T \to_p W_0 as the sample size T approaches infinity, where W_0 is non-random and positive definite. The GMM estimator of \theta, denoted \hat\theta(W_T), is given by

\hat\theta(W_T) = \arg\min_\theta m_T(\theta)' W_T m_T(\theta). (8)

Clearly, the estimator is a function of W_T. We return to the choice of W_T later. For now, assume W_T is already specified.
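Definition (8) translates directly into code. A minimal sketch (names illustrative; it uses scipy's general-purpose Nelder-Mead minimizer rather than any estimator-specific routine), applied to the just-identified problem of estimating a mean and variance from E[X - \mu] = 0 and E[(X - \mu)^2 - \sigma^2] = 0:

```python
import numpy as np
from scipy.optimize import minimize

def gmm_estimate(moment, data, theta_start, W):
    # hat(theta)(W_T) = argmin_theta m_T(theta)' W_T m_T(theta), as in (8)
    def objective(theta):
        mT = moment(data, theta).mean(axis=0)   # m_T(theta), a k-vector
        return mT @ W @ mT
    return minimize(objective, theta_start, method="Nelder-Mead").x

def moment(x, theta):
    mu, sig2 = theta
    return np.column_stack([x - mu, (x - mu) ** 2 - sig2])

rng = np.random.default_rng(3)
x = rng.normal(1.0, 2.0, 20_000)                # true mu = 1, sig2 = 4
theta_hat = gmm_estimate(moment, x, np.array([0.0, 1.0]), np.eye(2))
```

Here k = q = 2, so the minimized objective is essentially zero and the estimates match the sample mean and variance.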

3. Global and local identification

We consider conditions that ensure E[m(X_t, \theta_0)] = 0 has a unique solution.

3.1. Global identification

Definition 2. The parameter vector \theta is globally identified at \theta_0 based on the moment function m(\cdot) if

E[m(X_t, \theta)] = 0 if and only if \theta = \theta_0.

A necessary condition for global identification is the "order condition".

Order Condition: If k \ge q, then we say the order condition for identification is satisfied.

1. If k = q, the model is just identified;
2. If k > q, the model is overidentified.

The order condition is necessary but not sufficient for identification. Necessary and sufficient conditions are hard to find for nonlinear models. In practice, a weaker concept, local identification, is often considered.


3.2. Local identification

Definition 3. The parameter vector \theta is locally identified at \theta_0 by the moment function m(\cdot) if there exists a neighborhood of \theta_0, B(\theta_0), such that inside this neighborhood

E[m(X_t, \theta)] = 0 if and only if \theta = \theta_0.

Rank Condition: We say the rank condition for local identification is satisfied if the k \times q matrix of derivatives

\partial E[m(X_t, \theta)] / \partial \theta' (9)

is continuous and has full column rank q at \theta_0.

Lemma 1. Suppose the rank of \partial E[m(X_t, \theta)] / \partial \theta' is constant in a neighborhood of \theta_0. Then the parameter vector \theta is locally identified at \theta_0 by the moment function m(\cdot) if and only if the rank condition is satisfied.

Remark 1. If the constant rank requirement is dropped, then the rank condition is sufficient but not necessary. (That is, there are situations where the rank of \partial E[m(X_t, \theta)] / \partial \theta' is less than q, but \theta is still locally identified.)

Remark 2. In the context of a linear model, for example the IV regression in (1), we have

\partial E[m(X_t, \theta_0)] / \partial \theta' = \partial E[(y_t - x_t'\beta_0) z_t] / \partial \beta_0' = -E(z_t x_t').

Hence the rank condition is equivalent to the requirement that E(z_t x_t') has rank q.
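Remark 2 suggests a simple numerical diagnostic: estimate E(z_t x_t') by its sample analog and inspect its singular values, since full column rank q corresponds to all q singular values being bounded away from zero. A sketch on simulated data (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
T, q = 10_000, 2
x = rng.normal(size=(T, q))
z_good = x + rng.normal(size=(T, q))       # relevant instruments: E(z x') = I
z_bad = rng.normal(size=(T, q))            # irrelevant instruments: E(z x') = 0

G_good = z_good.T @ x / T                  # sample analog of E(z_t x_t')
G_bad = z_bad.T @ x / T
sv_good = np.linalg.svd(G_good, compute_uv=False)
sv_bad = np.linalg.svd(G_bad, compute_uv=False)
```

In a finite sample G_bad is still numerically full rank, but its singular values are only sampling noise of order 1/\sqrt{T}; this "near-singular" case foreshadows the weak-identification discussion in Section 3.3.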

Proof of the Lemma. We use the arguments in Theorem 1 of Rothenberg (1971) to prove the result.

Suppose \theta_0 is not locally identified. Then there exists an infinite sequence of vectors \{\theta_s\}_{s=1}^{\infty} approaching \theta_0 such that, for each s,

E[m(X_t, \theta_0)] = E[m(X_t, \theta_s)].

By the mean value theorem and the differentiability of E[m(X_t, \theta)] in \theta,

0 = E[m_j(X_t, \theta_0)] - E[m_j(X_t, \theta_s)] = (\partial E[m_j(X_t, \tilde\theta(j))] / \partial \theta') (\theta_s - \theta_0),


where the subscript j denotes the j-th element of the vector, and \tilde\theta(j) lies between \theta_s and \theta_0 and in general depends on j. Let

d_s = (\theta_s - \theta_0) / ||\theta_s - \theta_0||;

then

(\partial E[m_j(X_t, \tilde\theta(j))] / \partial \theta') d_s = 0 for every s.

The sequence \{d_s\} is an infinite sequence on the unit sphere and therefore has a limit point d (note that d does not depend on j). As \theta_s \to \theta_0, d_s approaches d, and by the continuity of \partial E[m(X_t, \theta)] / \partial \theta' we have

\lim_{s \to \infty} (\partial E[m_j(X_t, \tilde\theta(j))] / \partial \theta') d_s = (\partial E[m_j(X_t, \theta_0)] / \partial \theta') d = 0.

Because this holds for an arbitrary j, it holds for the full vector:

(\partial E[m(X_t, \theta_0)] / \partial \theta') d = 0,

which implies

rank( \partial E[m(X_t, \theta_0)] / \partial \theta' ) < q.

To show the converse, suppose that \partial E[m(X_t, \theta)] / \partial \theta' has constant rank \rho < q in a neighborhood of \theta_0 denoted by B(\theta_0). Consider the characteristic vector c(\theta) associated with one of its zero roots. We have

(\partial E[m(X_t, \theta)] / \partial \theta') c(\theta) = 0 (10)

for all \theta \in B(\theta_0). Because the gradient is continuous and has constant rank in B(\theta_0), the vector c(\theta) is continuous in B(\theta_0). Consider the curve \bar\theta defined by the function \theta(v) which solves, for 0 \le v \le \bar v, the differential equation

\partial \theta(v) / \partial v = c(\theta(v)), \quad \theta(0) = \theta_0.


Then,

\partial E[m(X_t, \theta(v))] / \partial v = (\partial E[m(X_t, \theta(v))] / \partial \theta(v)') (\partial \theta(v) / \partial v) = (\partial E[m(X_t, \theta(v))] / \partial \theta(v)') c(\theta(v)) = 0

for all 0 \le v \le \bar v, where the last equality uses (10). Thus, E[m(X_t, \theta)] is constant on the curve \bar\theta. This implies that \theta_0 is not locally identified, which completes the proof.

3.3. Identified, but only weakly

Local and global identification are properties of the population. If \theta is unidentified, then it is impossible to pin down its value even with an infinite sample size.

In practice, the situation can be worse because we only observe a finite sample. Even if the parameters are globally identified, the GMM criterion function m_T(\theta)' W_T m_T(\theta) can still be flat or nearly flat around \theta_0 for such a sample size. This poses a substantial challenge for inference, and has led to a sizeable literature that is often referred to as "inference under weak identification".

In the remainder of this chapter, we assume the parameters are strongly identified. We return to weak identification later.

4. Estimation and inference in well identified models

If the parameter is strongly identified, then under some additional regularity conditions the GMM estimator is consistent and converges at rate \sqrt{T} to a Normal distribution.

4.1. The asymptotic distribution

Proposition 1. Suppose \theta is globally identified at \theta_0. Also, assume the following conditions are satisfied:

1. (LLN)

\partial m_T(\theta) / \partial \theta' \to_p G(\theta) = \partial E[m(X_t, \theta)] / \partial \theta',

where the convergence holds uniformly in a compact neighborhood of \theta_0. Assume G(\theta) is continuous and write G_0 \equiv G(\theta_0).


2. (CLT)

\sqrt{T} m_T(\theta_0) \to_d N(0, S_0),

where S_0 is non-random and positive definite.

Then,

\sqrt{T} (\hat\theta(W_T) - \theta_0) \to_d N(0, V(W_0))

with

V(W_0) = (G_0' W_0 G_0)^{-1} (G_0' W_0 S_0 W_0 G_0) (G_0' W_0 G_0)^{-1}.

Proof: Ruud (2000), pp. 546-547.

Remark 3. The LLN and CLT require \partial m_T(\theta) / \partial \theta' and m_T(\theta) to be free of trends (time trends or stochastic trends). This requirement is non-trivial. For example, variables such as GDP or price indices tend to grow over time. In practice, two methods are often used to eliminate such trends. The first is to run the data through some filter. The second is to normalize the variables so that their trends cancel out. We will provide illustrations in the next chapter.

Remark 4. The limiting variance depends on the matrix W_0. This implies that more efficient estimators can be obtained by appropriate choices of W_T.

4.2. Efficient GMM

The limiting variance of GMM is minimized if W_T is chosen such that (prove it!)

W_0 = plim_{T \to \infty} W_T = S_0^{-1}.

Recall that S_0 is the limiting variance of \sqrt{T} m_T(\theta_0); hence efficiency is achieved by assigning more weight to moments that have smaller variances.

In this case, the limiting distribution is given by

\sqrt{T} (\hat\theta - \theta_0) \to_d N(0, V_0) with V_0 = (G_0' S_0^{-1} G_0)^{-1}.

Remark 5. In subsequent discussions, we let \hat\theta denote the efficient GMM estimator unless stated otherwise.
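The efficiency claim can be checked numerically: for any positive definite W_0, the sandwich variance V(W_0) exceeds (G_0' S_0^{-1} G_0)^{-1} in the positive semidefinite sense. A sketch with hypothetical G_0 and S_0 (illustrative values, not from the notes):

```python
import numpy as np

def sandwich(G, W, S):
    # V(W) = (G'WG)^{-1} (G'WSWG) (G'WG)^{-1}
    A = np.linalg.inv(G.T @ W @ G)
    return A @ (G.T @ W @ S @ W @ G) @ A

G = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])                 # k = 3 moments, q = 2 parameters
S = np.diag([1.0, 2.0, 4.0])               # variances of the three moments
V_eff = sandwich(G, np.linalg.inv(S), S)   # collapses to (G'S^{-1}G)^{-1}
V_id = sandwich(G, np.eye(3), S)           # identity weighting
gap_eigs = np.linalg.eigvalsh(V_id - V_eff)  # all >= 0 iff V_id - V_eff is psd
```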


4.3. Two-step and continuous updating GMM

The efficient weighting matrix S_0 depends on the unknown parameter \theta_0. Two ways to address this issue lead to the following two asymptotically equivalent estimators.

1. Obtain a preliminary estimate of \theta using the identity matrix as the weighting matrix. Denote the estimate by \hat\theta_1. Compute S_T(\hat\theta_1) and solve

\hat\theta_{GMM} = \arg\min_\theta m_T(\theta)' S_T(\hat\theta_1)^{-1} m_T(\theta).

This is often referred to as the "two-step GMM estimator". It is often the default choice in practice.

2. Obtain the estimate in one step, i.e., treat S_T(\theta) as a function of \theta and solve

\hat\theta_{CUGMM} = \arg\min_\theta m_T(\theta)' S_T(\theta)^{-1} m_T(\theta).

This is often referred to as the "continuous updating GMM estimator" (CUGMM). It is used less often than the two-step estimator.
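For linear moment conditions m(X_t, b) = z_t(y_t - x_t'b), each step has a closed form, so the two-step estimator can be sketched without a numerical optimizer. The simulated overidentified data and the iid formula for S_T are assumptions of this illustration:

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    # closed-form argmin of m_T(b)'W m_T(b) when m(X_t, b) = z_t(y_t - x_t'b)
    A = X.T @ Z @ W @ Z.T @ X
    return np.linalg.solve(A, X.T @ Z @ W @ Z.T @ y)

def two_step_gmm(y, X, Z):
    T = len(y)
    b1 = linear_gmm(y, X, Z, np.eye(Z.shape[1]))   # step 1: identity weighting
    u = y - X @ b1
    S = (Z * u[:, None] ** 2).T @ Z / T            # S_T(b1) for iid data
    return linear_gmm(y, X, Z, np.linalg.inv(S))   # step 2: efficient weighting

rng = np.random.default_rng(5)
T = 20_000
u = rng.normal(size=T)
Z = rng.normal(size=(T, 3))                        # k = 3 instruments, q = 1
x = Z @ np.array([1.0, 0.5, 0.5]) + 0.5 * u + rng.normal(size=T)
y = 2.0 * x + u
beta_hat = two_step_gmm(y, x.reshape(-1, 1), Z)
```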

5. Testing parametric restrictions in well identified models

We consider testing restrictions of the form

R(\theta_0) = 0, (11)

where R(\cdot) is an s-vector of differentiable functions with s < q.

It is useful to restate the general model. It is specified by the moment conditions

E[m(X_t, \theta_0)] = 0.

Given a sample of size T, the sample moments are given by

m_T(\theta) = T^{-1} \sum_{t=1}^T m(X_t, \theta).

Let

Q_T(\theta) = m_T(\theta)' S_T^{-1} m_T(\theta), (12)


where S_T is a consistent estimate of the limiting variance of \sqrt{T} m_T(\theta_0). The efficient GMM estimator then solves

\hat\theta = \arg\min_\theta Q_T(\theta).

Let

- \hat\theta: unrestricted GMM estimate;
- \tilde\theta: restricted GMM estimate.

We present the GMM counterparts of the Wald, score and likelihood ratio tests.

5.1. Wald Test

The Wald statistic evaluates the restriction at the unrestricted estimate:

W = \sqrt{T} R(\hat\theta)' [ (\partial R(\hat\theta)/\partial \theta') V_T(\hat\theta) (\partial R(\hat\theta)'/\partial \theta) ]^{-1} \sqrt{T} R(\hat\theta). (13)

Recall that the GMM estimator satisfies \sqrt{T}(\hat\theta - \theta_0) \to_d N(0, V_0). Applying the Delta method,

\sqrt{T} (R(\hat\theta) - R(\theta_0)) \to_d N( 0, (\partial R(\theta_0)/\partial \theta') V_0 (\partial R(\theta_0)'/\partial \theta) ).

This implies

\sqrt{T} (R(\hat\theta) - R(\theta_0))' [ (\partial R(\theta_0)/\partial \theta') V_0 (\partial R(\theta_0)'/\partial \theta) ]^{-1} \sqrt{T} (R(\hat\theta) - R(\theta_0)) \to_d \chi^2_s.

Because R(\theta_0) = 0 under the null hypothesis, we have

\sqrt{T} R(\hat\theta)' [ (\partial R(\theta_0)/\partial \theta') V_0 (\partial R(\theta_0)'/\partial \theta) ]^{-1} \sqrt{T} R(\hat\theta) \to_d \chi^2_s.

Finally, because \hat\theta converges in probability to \theta_0, we have

(\partial R(\hat\theta)/\partial \theta') V_T(\hat\theta) (\partial R(\hat\theta)'/\partial \theta) \to_p (\partial R(\theta_0)/\partial \theta') V_0 (\partial R(\theta_0)'/\partial \theta).

Therefore, the Wald statistic converges to \chi^2_s under the null hypothesis.


Example 1. Consider the following model with endogeneity,

y = X\beta + \varepsilon.

We want to test the linear restrictions

R\beta - r = 0.

Suppose there are k instruments, summarized by the matrix Z, and assume the errors are iid. Then the 2SLS estimator is

\hat\beta = (X' P_Z X)^{-1} X' P_Z y, where P_Z = Z(Z'Z)^{-1}Z'.

Its limiting distribution is given by

\sqrt{T} (\hat\beta - \beta_0) \to_d N(0, V_0)

with

V_0 = \sigma^2 plim_{T \to \infty} (T^{-1} X' P_Z X)^{-1}.

The Wald test is

W = (R\hat\beta - r)' [ R (X' P_Z X)^{-1} R' ]^{-1} (R\hat\beta - r) / \hat\sigma^2.
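A sketch of Example 1 on simulated data (variable names illustrative; the restriction is imposed to hold in the simulation, and a deliberately false restriction is also evaluated for contrast). X'P_Z X is computed as (Z'X)'(Z'Z)^{-1}(Z'X) to avoid forming the T by T projection matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
T = 10_000
Z = rng.normal(size=(T, 3))
u = rng.normal(size=T)
X = np.column_stack([Z[:, 0] + 0.3 * u + rng.normal(size=T),
                     Z[:, 1] + 0.3 * u + rng.normal(size=T)])
y = X @ np.array([1.0, -1.0]) + u

ZX, Zy = Z.T @ X, Z.T @ y
XPX = ZX.T @ np.linalg.solve(Z.T @ Z, ZX)       # X'P_Z X
b = np.linalg.solve(XPX, ZX.T @ np.linalg.solve(Z.T @ Z, Zy))  # 2SLS
s2 = ((y - X @ b) ** 2).mean()                  # iid error variance estimate

R = np.array([[1.0, 1.0]])                      # restriction: beta_1 + beta_2 = r

def wald(r):
    # W = (Rb - r)'[R (X'P_Z X)^{-1} R']^{-1}(Rb - r) / s2
    d = R @ b - r
    return float(d @ np.linalg.solve(R @ np.linalg.inv(XPX) @ R.T, d) / s2)

W_true = wald(np.array([0.0]))                  # H0 true in the simulation
W_false = wald(np.array([1.0]))                 # a false restriction
```

W_true is compared against chi-squared critical values with one degree of freedom (3.84 at the 5% level); the false restriction produces a statistic that is orders of magnitude larger.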

5.2. Gradient test

The Gradient test is the GMM counterpart of the score test. It only requires estimating the model under the null hypothesis. The test computes the derivative of the GMM criterion function (12) at the restricted estimate (multiplied by \sqrt{T}):

\sqrt{T} \partial Q_T(\tilde\theta)/\partial \theta = 2\sqrt{T} (\partial m_T(\tilde\theta)'/\partial \theta) S_T^{-1} m_T(\tilde\theta) = 2\sqrt{T} G_T(\tilde\theta)' S_T^{-1} m_T(\tilde\theta). (14)

The idea is that if the restrictions are true, then the above quantity should be close to zero. As in the score test, we form a quadratic form with a metric to judge the significance of the deviations from zero. The metric used here is the limiting variance of (14) under the null hypothesis, which is given by

G_0' S_0^{-1} G_0 = V_0^{-1}.


The Gradient test statistic is therefore

G = (\sqrt{T} \partial Q_T(\tilde\theta)/\partial \theta') V_T(\tilde\theta) (\sqrt{T} \partial Q_T(\tilde\theta)/\partial \theta), (15)

where V_T(\tilde\theta) is a consistent estimate of V_0 under the null hypothesis.

The null limiting distribution is \chi^2_s, where s is the number of restrictions.

5.3. Distance test

Recall that the LR test examines the difference in the likelihood functions with and without imposing the restrictions. The same idea can be used in the GMM framework, leading to the following test:

D = T [ Q_T(\tilde\theta) - Q_T(\hat\theta) ].

The null limiting distribution is \chi^2_s, where s is the number of restrictions.
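A sketch of the distance test in a small example (illustrative, not from the notes): two moments, E[x - \mu] = 0 and E[(x - \mu)^2 - \sigma^2] = 0, testing H_0: \sigma^2 = 1, which is true for the simulated data. The restricted fit is computed by a crude grid search, and the same S_T is used for both fits, as required:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 20_000
x = rng.normal(1.0, 1.0, T)

def moments(mu, sig2):
    return np.column_stack([x - mu, (x - mu) ** 2 - sig2])

# Unrestricted fit is just identified, so Q_T(theta_hat) = 0 at the sample moments
mu_hat, sig2_hat = x.mean(), x.var()
M = moments(mu_hat, sig2_hat)
S = M.T @ M / T                             # S_T, used for BOTH fits

def Q(mu, sig2):
    mT = moments(mu, sig2).mean(axis=0)
    return mT @ np.linalg.solve(S, mT)

# Restricted fit: impose sig2 = 1 and minimize over mu by grid search
grid = np.linspace(mu_hat - 0.1, mu_hat + 0.1, 2001)
Q_res = min(Q(mu, 1.0) for mu in grid)
D = T * (Q_res - Q(mu_hat, sig2_hat))       # distance statistic, one restriction
```

D is nonnegative by construction and, under H_0, a draw from (approximately) a \chi^2_1 distribution.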

6. Model diagnostics

We are interested in testing the specification of the model through testing the validity of the moment restrictions.

6.1. Formulating model diagnostics as testing for parametric restrictions

Example 2. Consider the following model with endogeneity,

y = X\beta + \varepsilon.

Suppose we have k potential instruments, summarized by the matrix Z. We conjecture that some of the instruments may be correlated with the errors, and hence not valid. Suppose we have partitioned Z into Z_1 and Z_2, with Z_2 containing the questionable instruments. Then the problem reduces to testing the restriction

E[(y_t - x_t'\beta_0) z_{2,t}] = 0.


Consider a more general setup than the above example. Suppose we have partitioned the moments into two subsets, i.e.,

m(X_t, \theta_0) = [ m_1(X_t, \theta_0) ; m_2(X_t, \theta_0) ],

where m_1 is k_1 \times 1 and m_2 is (k - k_1) \times 1. We believe

E[m_1(X_t, \theta_0)] = 0,

but think the second set of conditions,

E[m_2(X_t, \theta_0)] = 0,

is questionable. In other words, we want to test

H_0: E[m_2(X_t, \theta_0)] = 0 (16)

against the hypothesis

H_1: E[m_2(X_t, \theta_0)] \neq 0.

The problem (16) can be reformulated as testing parametric restrictions, to which the trinity of test procedures applies.

Rewrite the hypothesis of interest as

E[m_2(X_t, \theta_0) - \tau_0] = 0,

where \tau_0 = 0 under the null hypothesis and is nonzero under the alternative. This leads to the following augmented moment functions:

m^a(X_t, \theta, \tau) = [ m_1(X_t, \theta) ; m_2(X_t, \theta) - \tau ]

and the augmented moment restrictions:

E[m^a(X_t, \theta_0, \tau_0)] = 0. (17)

By this simple transformation, the problem reduces to testing the parametric restrictions specified by \tau = 0 based on the moment conditions (17).


The augmented sample moments are given by

m_T^a(\theta, \tau) = T^{-1} \sum_{t=1}^T m^a(X_t, \theta, \tau).

The unrestricted estimates (\hat\theta, \hat\tau) are given by

(\hat\theta, \hat\tau) = \arg\min_{\theta, \tau} m_T^a(\theta, \tau)' S_T^{-1} m_T^a(\theta, \tau).

The restricted estimates are (\tilde\theta, 0), with

\tilde\theta = \arg\min_\theta m_T^a(\theta, 0)' S_T^{-1} m_T^a(\theta, 0).

Note that the same weighting matrix is used for the restricted and unrestricted estimates.

Wald test. The Wald test can be constructed using the formula (13) but with \theta replaced by (\theta, \tau). The restrictions are linear and given by

R(\theta, \tau) = [ 0_{(k-k_1) \times q} , I_{k-k_1} ] (\theta', \tau')' = 0.

The details are omitted.

The Gradient test. The Gradient (LM) test can be constructed using the formula (15) but with \tilde\theta replaced by (\tilde\theta, 0).

The Distance test is

D = T [ m_T^a(\tilde\theta, 0)' S_T^{-1} m_T^a(\tilde\theta, 0) - m_T^a(\hat\theta, \hat\tau)' S_T^{-1} m_T^a(\hat\theta, \hat\tau) ],

or equivalently,

D = T [ m_T^a(\tilde\theta, 0)' S_T^{-1} m_T^a(\tilde\theta, 0) - m_{1,T}^a(\hat\theta)' S_{11,T}^{-1} m_{1,T}^a(\hat\theta) ],

where S_{11,T} consists of the entries in S_T corresponding to m_1(X_t, \theta).


6.2. Testing overidentifying restrictions

The specification tests discussed above require separating the moment conditions into two subsets. This may have undesirable consequences. In particular, if the FIRST subset in fact contains false moment restrictions, we may end up rejecting moment conditions in the SECOND subset even if they are valid. This suggests testing the moment restrictions without having to divide them up.

The idea is to look at the magnitude of the GMM criterion function when all moments are used; a large value indicates that some moment conditions may be invalid. The resulting procedure is usually referred to as testing the "overidentifying restrictions", because testing is possible only if k > q: if k = q, the objective function equals zero when evaluated at \hat\theta and hence gives no information about the validity of the moment conditions.

The test statistic, J, and its limiting distribution are given by

J = T m_T(\hat\theta)' S_T^{-1} m_T(\hat\theta) \to_d \chi^2_{k-q}.

The limiting distribution has k - q degrees of freedom, intuitively because q moment conditions are used to estimate the q parameters (hence they are not "free"). Of course, such a test leaves open which moments are invalid should the test statistic appear statistically significant.
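A sketch of the J test on simulated linear IV data with k = 3 and q = 1, so J is compared with \chi^2_2 critical values under correct specification (names illustrative; the third instrument is deliberately made invalid in the second call for contrast):

```python
import numpy as np

def j_stat(y, x, Z):
    T = len(y)
    X = x.reshape(-1, 1)
    b1 = np.linalg.solve(X.T @ Z @ Z.T @ X, X.T @ Z @ Z.T @ y)   # step 1, W = I
    r = y - X @ b1
    S = (Z * r[:, None] ** 2).T @ Z / T                          # S_T, iid case
    W = np.linalg.inv(S)
    b2 = np.linalg.solve(X.T @ Z @ W @ Z.T @ X, X.T @ Z @ W @ Z.T @ y)
    mT = Z.T @ (y - X @ b2) / T                                  # m_T(theta_hat)
    return float(T * mT @ W @ mT)

rng = np.random.default_rng(8)
T = 20_000
u = rng.normal(size=T)
Z = rng.normal(size=(T, 3))
x = Z @ np.array([1.0, 0.5, 0.5]) + 0.5 * u + rng.normal(size=T)
y = 2.0 * x + u

J_valid = j_stat(y, x, Z)                        # moderate: consistent with chi2_2
Z_bad = np.column_stack([Z[:, :2], Z[:, 2] + u]) # third instrument now invalid
J_invalid = j_stat(y, x, Z_bad)                  # O(T): rejects decisively
```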

6.3. Hausman's Specification Test

Hausman (1978) proposed a general testing methodology which, in particular, can be employed to test the validity of moment restrictions. The methodology differs from the aforementioned ones because it is based on the sampling behavior of different estimators of the parameters rather than on population moments or parameters.

In general, the Hausman test works as follows. To test the null hypothesis, we need two estimators:

- \tilde\theta: efficient under the null hypothesis but inconsistent under the alternative hypothesis;
- \hat\theta: consistent under both the null and the alternative hypothesis, but less efficient under the null hypothesis.

Then,

H = (\tilde\theta - \hat\theta)' [ V(\hat\theta - \tilde\theta) ]^{-} (\tilde\theta - \hat\theta).

It has a chi-square limiting distribution with q degrees of freedom, where q is the dimension of \theta.

We illustrate the test with the following example.

Example 3. (Testing for endogeneity) Consider the model

y = X\beta + \varepsilon,

where the errors are iid. We suspect some variables are correlated with the errors. Suppose a set of valid instruments is available. We have:

- If there is no endogeneity, then the OLS estimator is consistent and BLUE; 2SLS is also consistent, but less efficient.
- If there is endogeneity, then the OLS estimator is not consistent, while 2SLS remains consistent.

Let \hat\beta_{OLS} be the OLS estimator and \hat\beta_{2SLS} the 2SLS estimator. Then the Hausman test is defined as

H = T (\hat\beta_{OLS} - \hat\beta_{2SLS})' [ V(\sqrt{T}(\hat\beta_{OLS} - \hat\beta_{2SLS})) ]^{-} (\hat\beta_{OLS} - \hat\beta_{2SLS}),

where V(\sqrt{T}(\hat\beta_{OLS} - \hat\beta_{2SLS})) is an estimate of the limiting variance of \sqrt{T}(\hat\beta_{OLS} - \hat\beta_{2SLS}) under the null hypothesis, and (\cdot)^{-} denotes the generalized inverse, because the variance matrix may be singular. Because \hat\beta_{OLS} is BLUE under the null hypothesis, we have

V(\sqrt{T}(\hat\beta_{OLS} - \hat\beta_{2SLS})) = V(\sqrt{T}(\hat\beta_{2SLS} - \beta_0)) - V(\sqrt{T}(\hat\beta_{OLS} - \beta_0)),

where V(\sqrt{T}(\hat\beta_{2SLS} - \beta_0)) and V(\sqrt{T}(\hat\beta_{OLS} - \beta_0)) are the limiting variances of \sqrt{T}(\hat\beta_{2SLS} - \beta_0) and \sqrt{T}(\hat\beta_{OLS} - \beta_0).


Suppose the null hypothesis is true, i.e., there is no endogeneity. Then the test has a chi-square limiting distribution by construction. If the null hypothesis is false, then \hat\beta_{OLS} is biased and inconsistent while \hat\beta_{2SLS} is still consistent, so the difference \hat\beta_{OLS} - \hat\beta_{2SLS} will tend to be large. This forces the test to take on a large value.
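A sketch of the test in Example 3 on simulated data (illustrative; np.linalg.pinv supplies the generalized inverse, and the iid-error variance formulas from the example are assumed). The same routine is applied once to an exogenous regressor and once to an endogenous one:

```python
import numpy as np

def hausman(y, X, Z):
    T = len(y)
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    ZX = Z.T @ X
    XPX = ZX.T @ np.linalg.solve(Z.T @ Z, ZX)            # X'P_Z X without forming P_Z
    b_iv = np.linalg.solve(XPX, ZX.T @ np.linalg.solve(Z.T @ Z, Z.T @ y))
    s2 = ((y - X @ b_iv) ** 2).mean()                    # consistent under H0 and H1
    V_iv = s2 * T * np.linalg.inv(XPX)                   # var of sqrt(T)(b_iv - b0)
    V_ols = s2 * T * np.linalg.inv(X.T @ X)              # var of sqrt(T)(b_ols - b0)
    d = b_ols - b_iv
    return float(T * d @ np.linalg.pinv(V_iv - V_ols) @ d)

rng = np.random.default_rng(9)
T = 20_000
Z = rng.normal(size=(T, 2))
u = rng.normal(size=T)
x_exo = Z[:, 0] + rng.normal(size=T)                     # uncorrelated with u
x_end = Z[:, 0] + 0.5 * u + rng.normal(size=T)           # correlated with u
H_exo = hausman(2.0 * x_exo + u, x_exo.reshape(-1, 1), Z)
H_end = hausman(2.0 * x_end + u, x_end.reshape(-1, 1), Z)
```

Under exogeneity H behaves like a \chi^2_1 draw; under endogeneity it grows with T, as the text describes.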


References

[1] Clarida, R., Galí, J. and Gertler, M. (2000): "Monetary Policy Rules and Macroeconomic Stability: Evidence and Some Theory," The Quarterly Journal of Economics, 115, 147-180.

[2] Cochrane, J.H. (2011): "Determinacy and Identification with Taylor Rules," Journal of Political Economy, 119, 565-615.

[3] Rothenberg, T.J. (1971): "Identification in Parametric Models," Econometrica, 39, 577-591.

[4] Ruud, P.A. (2000): An Introduction to Classical Econometric Theory. Oxford University Press.
