
Statistics 910, #7

    Eigenvectors and Eigenvalues of Stationary Processes

Overview

1. Toeplitz matrices

2. Szegő's theorem

3. Circulant matrices

4. Matrix norms

5. Approximation via circulants

    Toeplitz and circulant matrices

Toeplitz matrix  A banded, square matrix \Gamma_n (subscript n for the n \times n matrix) with elements [\Gamma_n]_{jk} = \gamma_{j-k},

  \Gamma_n = \begin{pmatrix}
    \gamma_0     & \gamma_1     & \gamma_2 & \cdots & \gamma_{n-1} \\
    \gamma_1     & \gamma_0     & \gamma_1 & \cdots & \gamma_{n-2} \\
    \gamma_2     & \gamma_1     & \gamma_0 & \ddots & \vdots       \\
    \vdots       &              & \ddots   & \ddots & \gamma_1     \\
    \gamma_{n-1} & \gamma_{n-2} & \cdots   & \gamma_1 & \gamma_0
  \end{pmatrix}   (1)
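A matrix like (1) is easy to build directly from a vector of covariances; a minimal sketch using scipy.linalg.toeplitz, with covariance values that are illustrative only:

    # Build the n x n Toeplitz covariance matrix (1) from gamma_0, ..., gamma_{n-1}.
    import numpy as np
    from scipy.linalg import toeplitz

    gamma = np.array([1.36, 0.6, 0.0, 0.0, 0.0])   # illustrative covariances, n = 5
    Gamma_n = toeplitz(gamma)                      # symmetric: [Gamma_n]_{jk} = gamma_{|j-k|}
    print(Gamma_n)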

Symmetry  Toeplitz matrices don't have to be symmetric or real-valued, but ours will be, since we'll set \gamma_h = \gamma_{-h} = Cov(X_{t+h}, X_t) for some stationary process X_t. From now on, \Gamma_n is the covariance matrix of a stationary process.

Assumptions  We will assume that the covariances are absolutely summable, \sum_h |\gamma_h| < \infty.

Breadth  The results shown here are in the form that's most accessible, without searching for too much generality. All of them extend to other types of sequences, such as those that are square summable, and to other matrices that need not be symmetric.


Szegő's Theorem

Fourier transform  Since the covariances are absolutely summable, it is (relatively) easy to show that we can define a continuous function from the \gamma_h,

  f(\omega) = \sum_{h=-\infty}^{\infty} \gamma_h e^{-i 2\pi \omega h}, \qquad -\tfrac{1}{2} < \omega \le \tfrac{1}{2}.   (2)

If the \gamma_h are the covariances of a stationary process, then f(\omega) is known as the spectral density function or spectrum of the process. (See Sections 4.1-4.3 in SS.)

Inverse transform  The Fourier transform is invertible in the sense that we can recover the sequence of covariances from the spectral density function f(\omega) by integration,

  \gamma_h = \int_{-1/2}^{1/2} f(\omega) e^{i 2\pi \omega h} \, d\omega.   (3)

Heuristically, the expression for the variance, \gamma_0 = \int_{-1/2}^{1/2} f(\omega) \, d\omega, suggests that the spectral density decomposes the variance of the process into a continuous mix of frequencies.
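As a small numerical illustration of the transform pair (2)-(3), the sketch below assumes MA(1) covariances (\gamma_0 = 1 + \theta^2, \gamma_1 = \theta, zero otherwise) and recovers each \gamma_h by integrating the spectral density; the model and parameter value are illustrative only.

    # The Fourier pair (2)-(3): covariances <-> spectral density.
    import numpy as np
    from scipy.integrate import quad

    theta = 0.6
    f = lambda w: (1 + theta**2) + 2 * theta * np.cos(2 * np.pi * w)   # MA(1) spectral density

    for h in (0, 1, 2):
        # (3); the sine part integrates to zero because f is even in w.
        gh = quad(lambda w: f(w) * np.cos(2 * np.pi * w * h), -0.5, 0.5)[0]
        print(h, gh)    # 1.36, 0.6, 0.0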

Szegő's theorem  Define the eigenvalues of \Gamma_n as

  \lambda_{n,0}, \lambda_{n,1}, \ldots, \lambda_{n,n-1}.

These are all positive if \Gamma_n is positive definite (as we often require or assume). Szegő's theorem shows that we can use the spectrum to approximate various functions of the eigenvalues. (An interesting question in the analysis of Toeplitz matrices in general is what happens when \Gamma_n is not full rank.)

Let G denote an arbitrary continuous function. Then

  \lim_{n \to \infty} \frac{1}{n} \sum_{h=0}^{n-1} G(\lambda_{n,h}) = \int_{-1/2}^{1/2} G(f(\omega)) \, d\omega.   (4)

That is, we get to replace a sum by an integral. The sum resembles a Riemann approximation to an integral, as if \lambda_{n,h} = f(h/n).
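A quick numerical check of (4): the sketch below assumes an MA(1) process X_t = w_t + \theta w_{t-1} with unit innovation variance, so \gamma_0 = 1 + \theta^2, \gamma_1 = \theta, all other covariances are zero, and f(\omega) = 1 + \theta^2 + 2\theta\cos(2\pi\omega); the choices of \theta, n, and G are illustrative.

    # Szego's theorem (4): average of G(eigenvalues) vs. integral of G(f).
    import numpy as np
    from scipy.linalg import toeplitz
    from scipy.integrate import quad

    theta, n = 0.6, 400
    gamma = np.zeros(n)
    gamma[0], gamma[1] = 1 + theta**2, theta         # MA(1) autocovariances
    Gamma_n = toeplitz(gamma)                        # n x n Toeplitz covariance matrix

    f = lambda w: 1 + theta**2 + 2 * theta * np.cos(2 * np.pi * w)   # spectral density
    G = lambda x: x**2                               # any continuous G will do

    lhs = np.mean(G(np.linalg.eigvalsh(Gamma_n)))    # (1/n) sum_h G(lambda_{n,h})
    rhs = quad(lambda w: G(f(w)), -0.5, 0.5)[0]      # integral of G(f(w)) over (-1/2, 1/2]
    print(lhs, rhs)                                  # nearly equal for large n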


    Basic examples

Trace  The trace of a square matrix is the sum of its diagonal values, which equals the sum of its eigenvalues,

  \frac{1}{n} \mathrm{trace}(\Gamma_n) = \frac{1}{n} \sum_h \lambda_{n,h} \to \int_{-1/2}^{1/2} f(\omega) \, d\omega.

Though it follows directly from (3) that \gamma_0 = \int_{-1/2}^{1/2} f(\omega) \, d\omega, it is also a consequence of (4).

Determinant  The product of the eigenvalues of a matrix,

  \log |\Gamma_n|^{1/n} = \frac{1}{n} \sum_h \log \lambda_{n,h} \to \int_{-1/2}^{1/2} \log f(\omega) \, d\omega.
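Both limits are easy to check numerically; this sketch reuses the illustrative MA(1) covariances (unit innovation variance) from the previous snippet, for which \gamma_0 = 1.36 and \int \log f(\omega)\,d\omega = 0.

    # Trace and determinant limits via the eigenvalues of Gamma_n.
    import numpy as np
    from scipy.linalg import toeplitz
    from scipy.integrate import quad

    theta, n = 0.6, 400
    gamma = np.zeros(n); gamma[0], gamma[1] = 1 + theta**2, theta
    eig = np.linalg.eigvalsh(toeplitz(gamma))
    f = lambda w: 1 + theta**2 + 2 * theta * np.cos(2 * np.pi * w)

    print(np.mean(eig), quad(f, -0.5, 0.5)[0])          # both equal gamma_0 = 1.36
    print(np.mean(np.log(eig)),                         # log |Gamma_n|^{1/n}
          quad(lambda w: np.log(f(w)), -0.5, 0.5)[0])   # integral of log f, here 0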

Prediction application  Here's a surprising application of Szegő's theorem to prove a theorem of Kolmogorov. If X_t is a stationary process, how well is it possible to predict X_{n+1} linearly from its n predecessors? In particular, what is

  V_n = \min_a E\left( X_{n+1} - a_0 X_n - a_1 X_{n-1} - \cdots - a_{n-1} X_1 \right)^2 ?

In the Gaussian case, the problem is equivalent to finding the conditional variance of X_{n+1} given X_{1:n}. Szegő's theorem provides an elegant answer:

  \lim_{n \to \infty} V_n = \exp\left( \int_{-1/2}^{1/2} \log f(\omega) \, d\omega \right).   (5)

The answer follows from applying (4) to the ratio of determinants, once you note that

  V_n = \frac{|\Gamma_{n+1}|}{|\Gamma_n|}.   (6)

(Long aside: To see that (6) holds, covariance manipulations of the multivariate normal distribution show you that the variance of the scalar r.v. Y given the vector r.v. X is Var(Y|X) = \sigma_{yy} - \sigma_{yx} \Sigma_{xx}^{-1} \sigma_{xy}, where the variance matrix is partitioned as

  \Sigma = \mathrm{Var}\begin{pmatrix} Y \\ X \end{pmatrix}
         = \begin{pmatrix} \sigma_{yy} & \sigma_{yx} \\ \sigma_{xy} & \Sigma_{xx} \end{pmatrix}.


To find the determinant |\Sigma|, postmultiply by a matrix with 1's on the diagonal, obtaining

  |\Sigma| = \left| \Sigma \begin{pmatrix} 1 & 0 \\ -\Sigma_{xx}^{-1}\sigma_{xy} & I \end{pmatrix} \right|
           = (\sigma_{yy} - \sigma_{yx} \Sigma_{xx}^{-1} \sigma_{xy}) \, |\Sigma_{xx}|.

The expression (6) follows. Note that the regression vector is \Sigma_{xx}^{-1}\sigma_{xy}.)

Now back to the prediction problem. From Szegő's theorem, \frac{1}{n+1}\sum_j \log \lambda_{n+1,j} \approx \frac{1}{n}\sum_j \log \lambda_{n,j} since both approximate the same integral. Now plug in |\Gamma_{n+1}|^{1/(n+1)} \approx |\Gamma_n|^{1/n} and (5) follows.
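For a concrete check of (5)-(6), the sketch below again assumes the invertible MA(1) model with unit innovation variance, for which \exp(\int \log f(\omega)\,d\omega) = 1, and computes V_n as the determinant ratio; the values of \theta and n are illustrative.

    # Kolmogorov's prediction-variance formula via the determinant ratio (6).
    import numpy as np
    from scipy.linalg import toeplitz

    theta = 0.6
    def log_det_Gamma(n):
        gamma = np.zeros(n); gamma[0], gamma[1] = 1 + theta**2, theta
        return np.linalg.slogdet(toeplitz(gamma))[1]    # log |Gamma_n|

    for n in (2, 5, 25, 100):
        V_n = np.exp(log_det_Gamma(n + 1) - log_det_Gamma(n))
        print(n, V_n)                                   # decreases toward exp(int log f) = 1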

    Circulant matrices

Circulant matrix  A special type of Toeplitz matrix constructed by rotating a vector c_0, c_1, \ldots, c_{n-1}, say, cyclically by one position to fill successive rows of a matrix,

  C_n = \begin{pmatrix}
    c_0     & c_1     & c_2    & \cdots  & c_{n-1} \\
    c_{n-1} & c_0     & c_1    & \cdots  & c_{n-2} \\
    c_{n-2} & c_{n-1} & c_0    & \ddots  & \vdots  \\
    \vdots  &         & \ddots & \ddots  & c_1     \\
    c_1     & c_2     & \cdots & c_{n-1} & c_0
  \end{pmatrix}   (7)

Circulant matrices are an important type of Toeplitz matrix for our purpose (they have others!) because we can easily find their eigenvectors and eigenvalues.

Eigenvectors  These are relatively easy to find from the basic definition of an eigenvector and a great guess for the answer. An eigenvector u of C_n satisfies C_n u = \lambda u, which gives a system of n equations:

  c_0 u_0 + c_1 u_1 + \cdots + c_{n-1} u_{n-1} = \lambda u_0
  c_{n-1} u_0 + c_0 u_1 + \cdots + c_{n-2} u_{n-1} = \lambda u_1
  \vdots
  c_1 u_0 + c_2 u_1 + \cdots + c_0 u_{n-1} = \lambda u_{n-1}


The equation in the r-th row is, in modular form,

  row r:  \sum_{j=0}^{n-r-1} c_j u_{j+r} + \sum_{j=n-r}^{n-1} c_j u_{(j+r)|n} = \lambda u_r

or in more conventional form,

  row r:  \sum_{j=0}^{n-r-1} c_j u_{j+r} + \sum_{j=n-r}^{n-1} c_j u_{j+r-n} = \lambda u_r .

Guess u_j = \rho^j (maybe reasonable to guess this if you have been studying differential equations). Then

  \sum_{j=0}^{n-r-1} c_j \rho^{j+r} + \sum_{j=n-r}^{n-1} c_j \rho^{j+r-n} = \lambda \rho^r .

If \rho^n = 1, it works! These are the n roots of unity; any \rho = \exp(-i 2\pi j/n) works for j = 0, 1, \ldots, n-1. The eigenvectors have the form u = (\rho^0, \rho^1, \rho^2, \ldots, \rho^{n-1}):

  u_r = (1, e^{-i 2\pi r/n}, e^{-i 2\pi 2r/n}, \ldots, e^{-i 2\pi (n-1)r/n})

as seen in the discrete Fourier transform.

Eigenvalues  These are the discrete Fourier transforms of the sequence that defines the circulant,

  \lambda_{n,k} = \sum_{j=0}^{n-1} c_j e^{-i 2\pi jk/n} .

If the c_j = \gamma_j, then we've got the first n terms of the sum that defines the spectral density (2).
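Both facts are easy to verify numerically. The sketch below uses an arbitrary illustrative vector c (not covariances); note that scipy.linalg.circulant places c in the first column, so a transpose is needed to match the first-row convention of (7).

    # Eigenvalues and eigenvectors of a circulant via the DFT.
    import numpy as np
    from scipy.linalg import circulant

    c = np.array([4.0, 2.0, 1.0, 0.5, 0.25, 3.0])     # arbitrary vector c_0, ..., c_{n-1}
    n = len(c)
    C = circulant(c).T                                # first row is c_0, c_1, ..., c_{n-1}, as in (7)

    lam = np.fft.fft(c)                               # lambda_{n,k} = sum_j c_j e^{-i 2 pi jk/n}
    k = 2
    u = np.exp(-2j * np.pi * k * np.arange(n) / n)    # (1, rho, rho^2, ...), rho = e^{-i 2 pi k/n}
    print(np.allclose(C @ u, lam[k] * u))             # True: C_n u = lambda_{n,k} u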

Implications for covariance matrices  We can now anticipate the results. Asymptotically in n, the vectors that define the discrete Fourier transform are eigenvectors of every covariance matrix. The eigenvalues are then the transform coefficients of the covariances, namely values of the spectral density function. If we define a unitary matrix U = (u_0, u_1, \ldots, u_{n-1})/\sqrt{n} from the eigenvectors u_j, then we obtain

  U^* \Gamma_n U \approx \mathrm{diag}(f(\omega_j)), \qquad \omega_j = j/n .
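The heuristic can be seen directly: with the illustrative MA(1) covariances again, the normalized DFT matrix nearly diagonalizes \Gamma_n and the diagonal entries track f(j/n). This is only a finite-n sketch of the asymptotic statement above.

    # The DFT vectors approximately diagonalize the Toeplitz covariance matrix.
    import numpy as np
    from scipy.linalg import toeplitz

    theta, n = 0.6, 256
    gamma = np.zeros(n); gamma[0], gamma[1] = 1 + theta**2, theta
    Gamma_n = toeplitz(gamma)
    f = lambda w: 1 + theta**2 + 2 * theta * np.cos(2 * np.pi * w)

    idx = np.arange(n)
    U = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)   # unitary DFT matrix
    D = U.conj().T @ Gamma_n @ U                                    # approximately diagonal

    print(np.max(np.abs(np.diag(D).real - f(idx / n))))   # diagonal is close to f(j/n)
    off = D - np.diag(np.diag(D))
    print(np.sqrt(np.sum(np.abs(off)**2) / n))            # weak norm of off-diagonal part is small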


To confirm these guesses, we need to show that a circulant matrix provides a good approximation to the covariance matrix, good enough so that we can use the results for circulants when describing covariance matrices. To describe what we mean by a good approximation, we need a way to measure the distance between matrices, a norm.

    Matrix norms

Norms  A norm on a vector space V is any function \|x\| for x \in V for which (\alpha \in R)

  1. \|x\| > 0 for x \ne 0, and \|0\| = 0.
  2. \|\alpha x\| = |\alpha| \, \|x\|
  3. \|x + y\| \le \|x\| + \|y\|   (triangle inequality)

Operator norm  Define the operator norm of a square matrix T as the maximum ratio of quadratic forms,

  \|T\| = \max_x \frac{x' T x}{x' x} = \max_j \lambda_j

where the \lambda_j are the eigenvalues (singular values) of T.

Hilbert-Schmidt or weak norm  For an n \times n matrix T,

  |T|^2 = \frac{1}{n} \sum_{j,k} |t_{jk}|^2 = \frac{1}{n} \mathrm{trace}(T^* T) = \frac{1}{n} \sum_j \lambda_j^2 .

Connections  It is indeed a weaker norm,

  |T|^2 \le \|T\|^2 = \lambda_{\max}^2 .

The two norms allow you to handle products,

  |T S| \le \|T\| \, |S| .
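Both norms are available through numpy, so the two inequalities can be spot-checked; the random matrices below are purely illustrative.

    # Operator norm vs. the weak (Hilbert-Schmidt) norm.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    A = rng.standard_normal((n, n)); T = (A + A.T) / 2        # symmetric test matrix
    S = rng.standard_normal((n, n))

    weak = lambda M: np.linalg.norm(M, 'fro') / np.sqrt(n)    # |M| = sqrt((1/n) sum |m_jk|^2)
    op = np.linalg.norm(T, 2)                                 # ||T|| = largest |eigenvalue|
    print(weak(T) <= op)                                      # weak norm is weaker: True
    print(weak(T @ S) <= op * weak(S))                        # |TS| <= ||T|| |S|: True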


    Approximation via circulants

An approximation  Consider the following circulant that approximates \Gamma_n, obtained by flipping the covariances and running them both ways:

  G_n = \begin{pmatrix}
    \gamma_0                & \gamma_1 + \gamma_{n-1} & \gamma_2 + \gamma_{n-2} & \cdots & \gamma_{n-1} + \gamma_1 \\
    \gamma_1 + \gamma_{n-1} & \gamma_0                & \gamma_1 + \gamma_{n-1} & \cdots & \gamma_{n-2} + \gamma_2 \\
    \gamma_2 + \gamma_{n-2} & \gamma_1 + \gamma_{n-1} & \gamma_0                & \ddots & \vdots                  \\
    \vdots                  &                         & \ddots                  & \ddots & \gamma_1 + \gamma_{n-1} \\
    \gamma_{n-1} + \gamma_1 & \gamma_{n-2} + \gamma_2 & \cdots                  & \gamma_1 + \gamma_{n-1} & \gamma_0
  \end{pmatrix}   (8)

The norm of the difference \Gamma_n - G_n is a familiar type of sum,

  |\Gamma_n - G_n|^2 = \frac{2}{n} \sum_{h=1}^{n-1} (n-h) \gamma_{n-h}^2
                     = \frac{2}{n} \sum_{h=1}^{n-1} h \, \gamma_h^2          (reverse the sum)
                     = 2 \sum_{h=1}^{n-1} \frac{h}{n} \gamma_h^2 .

Since we assume that \sum_j |\gamma_j| < \infty, the sum converges to 0 (each term is bounded by the summable \gamma_h^2 while h/n \to 0 for each fixed h),

  \lim_{n \to \infty} |\Gamma_n - G_n| = 0 .
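The shrinking difference is easy to see numerically; the sketch below builds G_n from the illustrative MA(1) covariances and reports the weak norm of \Gamma_n - G_n for a few values of n.

    # Weak norm of the Toeplitz-minus-circulant difference shrinks with n.
    import numpy as np
    from scipy.linalg import toeplitz, circulant

    theta = 0.6
    for n in (10, 50, 250):
        gamma = np.zeros(n); gamma[0], gamma[1] = 1 + theta**2, theta
        Gamma_n = toeplitz(gamma)
        first_row = gamma + np.r_[0.0, gamma[1:][::-1]]    # gamma_h + gamma_{n-h}
        G_n = circulant(first_row).T                       # the circulant (8)
        weak = np.linalg.norm(Gamma_n - G_n, 'fro') / np.sqrt(n)
        print(n, weak)                                     # equals sqrt(2)*theta/sqrt(n) here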

Eigenvalues of close matrices  If two matrices are close in the weak norm, then the averages of their eigenvalues are close. In particular, given two matrices A and B with eigenvalues \alpha_j and \beta_j, then

  \left| \frac{1}{n} \sum_j \alpha_j - \frac{1}{n} \sum_j \beta_j \right| \le |A - B| .

Proof: If D = A - B, then

  \sum_j \alpha_j - \sum_j \beta_j = \mathrm{trace}(A) - \mathrm{trace}(B) = \mathrm{trace}(D).

Now use Cauchy-Schwarz in the form

  \left( \sum_j a_j \right)^2 = \left( \sum_j a_j \cdot 1 \right)^2 \le \left( \sum_j a_j^2 \right) n   (9)


to show that

  |\mathrm{trace}(D)|^2 = \left| \sum_j d_{jj} \right|^2 \le n \sum_j d_{jj}^2 \le n \sum_{j,k} d_{jk}^2 = n^2 |D|^2 .

    The loose bounds in the proof suggest you can do a lot better. In fact,

    you can move the absolute value inside the sum (so that the actual

    eigenvalues are getting close, not just on average).
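As a sanity check on that stronger claim, the sketch below compares the sorted eigenvalues of the illustrative MA(1) \Gamma_n and its circulant approximation G_n: not just the averages but the individual sorted values agree closely.

    # Sorted eigenvalues of Gamma_n and of its circulant approximation G_n.
    import numpy as np
    from scipy.linalg import toeplitz, circulant

    theta, n = 0.6, 200
    gamma = np.zeros(n); gamma[0], gamma[1] = 1 + theta**2, theta
    Gamma_n = toeplitz(gamma)
    G_n = circulant(gamma + np.r_[0.0, gamma[1:][::-1]]).T

    lam = np.sort(np.linalg.eigvalsh(Gamma_n))
    mu = np.sort(np.linalg.eigvalsh(G_n))
    print(abs(lam.mean() - mu.mean()))     # averages essentially identical
    print(np.max(np.abs(lam - mu)))        # individual sorted eigenvalues are close too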

Powers of eigenvalues  A messier argument produces a similar result. Given two matrices A and B with |A - B| \to 0 and eigenvalues \alpha_j and \beta_j, then for any power s,

  \lim_{n \to \infty} \frac{1}{n} \sum_j (\alpha_j^s - \beta_j^s) = 0 .

Extension  Now that we've found that powers of the eigenvalues converge for close matrices in the limit, it's not hard to get the result for a continuous function. The argument is a common one: polynomials can approximate any continuous function g. The result implies that

  \lim_{n \to \infty} \frac{1}{n} \sum_j \left( g(\alpha_j) - g(\beta_j) \right) = 0 .