Statistics 910, #7
Eigenvectors and Eigenvalues of Stationary Processes
Overview
1. Toeplitz matrices
2. Szegő's theorem
3. Circulant matrices
4. Matrix Norms
5. Approximation via circulants
Toeplitz and circulant matrices
Toeplitz matrix A square matrix $\Gamma_n$ (subscript $n$ for the $n \times n$ matrix) that is constant along each diagonal, with elements $[\Gamma_n]_{jk} = \gamma_{j-k}$,
$$
\Gamma_n = \begin{pmatrix}
\gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{n-1} \\
\gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{n-2} \\
\gamma_2 & \gamma_1 & \gamma_0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \gamma_1 \\
\gamma_{n-1} & \gamma_{n-2} & \cdots & \gamma_1 & \gamma_0
\end{pmatrix} \qquad (1)
$$
Symmetry Toeplitz matrices don't have to be symmetric or real-valued, but ours will be, since we'll set $\gamma_h = \gamma_{-h} = \mathrm{Cov}(X_{t+h}, X_t)$ for some stationary process $X_t$. From now on, $\Gamma_n$ is the covariance matrix of a stationary process.

Assumptions We will assume that the covariances are absolutely summable, $\sum_h |\gamma_h| < \infty$.
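As a concrete sketch (Python with NumPy), the matrix $\Gamma_n$ in (1) can be assembled directly from a covariance sequence. The AR(1)-style covariances $\gamma_h = \phi^{|h|}$ used here are a hypothetical example; they satisfy the absolute-summability assumption since $|\phi| < 1$.

```python
import numpy as np

# A minimal sketch: build Gamma_n from a covariance sequence.  The
# AR(1)-style covariances gamma_h = phi^|h| are a hypothetical example;
# they are absolutely summable because |phi| < 1.
n, phi = 8, 0.6
gamma = phi ** np.arange(n)                     # gamma_0, ..., gamma_{n-1}

# [Gamma_n]_{jk} = gamma_{|j-k|}: index gamma by the lag matrix |j - k|.
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Gamma = gamma[lags]

assert np.allclose(Gamma, Gamma.T)              # symmetric Toeplitz
assert np.all(np.diag(Gamma, k=1) == gamma[1])  # constant diagonals
```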
Breadth The results shown here are in the form that's most accessible, without searching for too much generality. All of them extend to other types of sequences, such as those that are square summable, and to other matrices that need not be symmetric.
Szegő's Theorem
Fourier transform Since the covariances are absolutely summable, it is (relatively) easy to show that we can define a continuous function from the $\gamma_h$,
$$
f(\omega) = \sum_{h=-\infty}^{\infty} \gamma_h e^{-i 2\pi \omega h}, \qquad -\tfrac{1}{2} < \omega \le \tfrac{1}{2}. \qquad (2)
$$
If the $\gamma_h$ are the covariances of a stationary process, then $f(\omega)$ is known as the spectral density function or spectrum of the process. (See Sections 4.1-4.3 in SS.)
Inverse transform The Fourier transform is invertible in the sense that we can recover the sequence of covariances from the spectral density function $f(\omega)$ by integration,
$$
\gamma_h = \int_{-1/2}^{1/2} f(\omega)\, e^{i 2\pi \omega h}\, d\omega. \qquad (3)
$$
Heuristically, the expression for the variance, $\gamma_0 = \int_{-1/2}^{1/2} f(\omega)\, d\omega$, suggests that the spectral density decomposes the variance of the process into a continuous mix of frequencies.
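A quick numerical sketch of the transform pair (2)-(3), using a hypothetical MA(1) process $X_t = \varepsilon_t + \theta \varepsilon_{t-1}$ with $\theta = 0.5$, for which $\gamma_0 = 1 + \theta^2$, $\gamma_{\pm 1} = \theta$, and all other covariances vanish:

```python
import numpy as np

# Sketch of the pair (2)-(3) for a hypothetical MA(1) process
# X_t = e_t + theta*e_{t-1} (theta = 0.5): gamma_0 = 1 + theta^2 = 1.25,
# gamma_1 = gamma_{-1} = theta = 0.5, and gamma_h = 0 otherwise.
theta = 0.5
g0, g1 = 1 + theta**2, theta

# Spectral density (2); real-valued because the covariances are symmetric.
def f(w):
    return g0 + 2 * g1 * np.cos(2 * np.pi * w)

# Inverse transform (3) via a midpoint Riemann sum over (-1/2, 1/2].
w = (np.arange(4096) + 0.5) / 4096 - 0.5
recovered = [(f(w) * np.exp(1j * 2 * np.pi * w * h)).mean().real
             for h in (0, 1, 2)]
# recovered is approximately [1.25, 0.5, 0.0], i.e. gamma_0, gamma_1, gamma_2
assert np.allclose(recovered, [g0, g1, 0.0], atol=1e-10)
```

The midpoint sum is exact here (up to floating point) because the integrand is a trigonometric polynomial of low degree.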
Szegő's theorem Define the eigenvalues of $\Gamma_n$ as
$$
\lambda_{n,0}, \lambda_{n,1}, \ldots, \lambda_{n,n-1}.
$$
These are all positive if $\Gamma_n$ is positive definite (as we often require or assume). Szegő's theorem shows that we can use the spectrum to approximate various functions of the eigenvalues. (An interesting question in the analysis of Toeplitz matrices in general is what happens when $\Gamma_n$ is not full rank.)

Let $G$ denote an arbitrary continuous function. Then
$$
\lim_{n \to \infty} \frac{1}{n} \sum_{h=0}^{n-1} G(\lambda_{n,h}) = \int_{-1/2}^{1/2} G(f(\omega))\, d\omega. \qquad (4)
$$
That is, we get to replace a sum by an integral. The sum resembles a Riemann approximation to an integral, as if $\lambda_{n,h} \approx f(h/n)$.
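To see (4) numerically, take $G(x) = x^2$ and hypothetical MA(1) covariances $\gamma_0 = 1.25$, $\gamma_1 = 0.5$ (all higher lags zero); by Parseval's identity the right-hand side is $\int f(\omega)^2 d\omega = \sum_h \gamma_h^2 = \gamma_0^2 + 2\gamma_1^2$:

```python
import numpy as np

# Numerical check of Szego's theorem (4) with G(x) = x^2, for hypothetical
# MA(1) covariances gamma_0 = 1.25, gamma_1 = 0.5, gamma_h = 0 for h > 1.
# By Parseval, int f(w)^2 dw = gamma_0^2 + 2*gamma_1^2.
n = 400
gamma = np.zeros(n)
gamma[0], gamma[1] = 1.25, 0.5
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Gamma = gamma[lags]

lam = np.linalg.eigvalsh(Gamma)
lhs = (lam ** 2).mean()          # (1/n) sum_h G(lambda_{n,h})
rhs = 1.25**2 + 2 * 0.5**2       # int_{-1/2}^{1/2} f(w)^2 dw = 2.0625
assert abs(lhs - rhs) < 0.01     # close already at n = 400
```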
Basic examples

Trace The trace of a square matrix is the sum of the diagonal values of the matrix, which equals the sum of its eigenvalues, so
$$
\frac{1}{n} \mathrm{trace}(\Gamma_n) = \frac{1}{n} \sum_h \lambda_{n,h} \to \int_{-1/2}^{1/2} f(\omega)\, d\omega.
$$
Though it follows directly from (3) that $\gamma_0 = \int_{-1/2}^{1/2} f(\omega)\, d\omega$, it is also a consequence of (4).
Determinant The determinant is the product of the eigenvalues of a matrix, so
$$
\log |\Gamma_n|^{1/n} = \frac{1}{n} \sum_h \log \lambda_{n,h} \to \int_{-1/2}^{1/2} \log f(\omega)\, d\omega.
$$
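A sketch of this limit, again with hypothetical MA(1) covariances ($\theta = 0.5$, unit innovation variance), for which $\int \log f(\omega)\, d\omega = \log \sigma^2 = 0$ by Jensen's formula:

```python
import numpy as np

# Check: (1/n) log|Gamma_n| -> int log f(w) dw, for a hypothetical MA(1)
# with theta = 0.5 and unit innovation variance, where the limit is
# log(sigma^2) = 0.
n = 400
gamma = np.zeros(n)
gamma[0], gamma[1] = 1.25, 0.5
Gamma = gamma[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]

# slogdet avoids overflow in the raw determinant of a large matrix.
sign, logdet = np.linalg.slogdet(Gamma)
assert sign > 0                    # Gamma_n is positive definite here
assert abs(logdet / n) < 0.01      # approaches int log f = 0
```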
Prediction application Here's a surprising application of Szegő's theorem to prove a theorem of Kolmogorov. If $X_t$ is a stationary process, how well is it possible to predict $X_{n+1}$ linearly from its $n$ predecessors? In particular, what is
$$
V_n = \min_a E\,(X_{n+1} - a_0 X_n - a_1 X_{n-1} - \cdots - a_{n-1} X_1)^2\,?
$$
In the Gaussian case, the problem is equivalent to finding the conditional variance of $X_{n+1}$ given $X_{1:n}$. Szegő's theorem provides an elegant answer:
$$
\lim_{n \to \infty} V_n = \exp\left( \int_{-1/2}^{1/2} \log f(\omega)\, d\omega \right). \qquad (5)
$$
The answer follows from applying (4) to a ratio of determinants, once you note that
$$
V_n = \frac{|\Gamma_{n+1}|}{|\Gamma_n|}. \qquad (6)
$$
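The identity (6) is easy to check numerically. A sketch, again with hypothetical MA(1) covariances, comparing the determinant ratio to the direct least-squares error variance $\gamma_0 - g' \Gamma_n^{-1} g$ with $g = (\gamma_1, \ldots, \gamma_n)'$:

```python
import numpy as np

# Sketch checking (6): the one-step prediction error variance from direct
# regression, gamma_0 - g' Gamma_n^{-1} g, equals |Gamma_{n+1}| / |Gamma_n|.
# Hypothetical MA(1) covariances again (gamma_0 = 1.25, gamma_1 = 0.5).
def toeplitz_cov(n, g0=1.25, g1=0.5):
    gamma = np.zeros(n)
    gamma[0], gamma[1] = g0, g1
    return gamma[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]

n = 20
Gn, Gn1 = toeplitz_cov(n), toeplitz_cov(n + 1)
g = Gn1[0, 1:]                                  # (gamma_1, ..., gamma_n)
v_regression = Gn1[0, 0] - g @ np.linalg.solve(Gn, g)
v_det_ratio = np.linalg.det(Gn1) / np.linalg.det(Gn)
assert np.isclose(v_regression, v_det_ratio)
# Both approach exp(int log f) = 1 as n grows (unit innovation variance).
```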
(Long aside: To see that (6) holds, covariance manipulations of the multivariate normal distribution show that the variance of the scalar r.v. $Y$ given the vector r.v. $X$ is $\mathrm{Var}(Y|X) = \sigma_{yy} - \Sigma_{yx} \Sigma_{xx}^{-1} \Sigma_{xy}$, where the variance matrix is partitioned as
$$
\Sigma = \mathrm{Var}\begin{pmatrix} Y \\ X \end{pmatrix} = \begin{pmatrix} \sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix}.
$$
To find the determinant $|\Sigma|$, postmultiply by a matrix with 1's on the diagonal, obtaining
$$
|\Sigma| = \left| \Sigma \begin{pmatrix} 1 & 0 \\ -\Sigma_{xx}^{-1}\Sigma_{xy} & I \end{pmatrix} \right| = (\sigma_{yy} - \Sigma_{yx} \Sigma_{xx}^{-1} \Sigma_{xy})\, |\Sigma_{xx}|.
$$
The expression (6) follows. Note that the regression vector is $\beta = \Sigma_{xx}^{-1} \Sigma_{xy}$.)
Now back to the prediction problem. From Szegő's theorem,
$$
\frac{1}{n+1} \sum_j \log \lambda_{n+1,j} \approx \frac{1}{n} \sum_j \log \lambda_{n,j}
$$
since both approximate the same integral. Now plug in $|\Gamma_{n+1}|^{1/(n+1)} \approx |\Gamma_n|^{1/n}$ and (5) follows.
Circulant matrices

Circulant matrix A special type of Toeplitz matrix constructed by rotating a vector $(c_0, c_1, \ldots, c_{n-1})$, say, cyclically by one position to fill successive rows of a matrix,
$$
C_n = \begin{pmatrix}
c_0 & c_1 & c_2 & \cdots & c_{n-1} \\
c_{n-1} & c_0 & c_1 & \cdots & c_{n-2} \\
c_{n-2} & c_{n-1} & c_0 & \cdots & c_{n-3} \\
\vdots & & \ddots & \ddots & \vdots \\
c_1 & c_2 & \cdots & c_{n-1} & c_0
\end{pmatrix} \qquad (7)
$$
Circulant matrices are an important type of Toeplitz matrix for our purpose (they have others!) because we can easily find their eigenvectors and eigenvalues.
Eigenvectors These are relatively easy to find from the basic definition of an eigenvector and a great guess for the answer. An eigenvector $u$ of $C_n$ satisfies $C_n u = \lambda u$, which gives a system of $n$ equations:
$$
\begin{aligned}
c_0 u_0 + c_1 u_1 + \cdots + c_{n-1} u_{n-1} &= \lambda u_0 \\
c_{n-1} u_0 + c_0 u_1 + \cdots + c_{n-2} u_{n-1} &= \lambda u_1 \\
&\;\;\vdots \\
c_1 u_0 + c_2 u_1 + \cdots + c_0 u_{n-1} &= \lambda u_{n-1}
\end{aligned}
$$
The equation in the $r$th row is, in modular form,
$$
\text{row } r: \quad \sum_{j=0}^{n-r-1} c_j u_{j+r} + \sum_{j=n-r}^{n-1} c_j u_{(j+r) \bmod n} = \lambda u_r,
$$
or in more conventional form,
$$
\text{row } r: \quad \sum_{j=0}^{n-r-1} c_j u_{j+r} + \sum_{j=n-r}^{n-1} c_j u_{j+r-n} = \lambda u_r.
$$
Guess $u_j = \rho^j$ (maybe a reasonable guess if you have been studying differential equations). Then
$$
\sum_{j=0}^{n-r-1} c_j \rho^{j+r} + \sum_{j=n-r}^{n-1} c_j \rho^{j+r-n} = \lambda \rho^r.
$$
If $\rho^n = 1$, it works: the second sum becomes $\sum c_j \rho^{j+r}$, and dividing both sides by $\rho^r$ leaves $\lambda = \sum_j c_j \rho^j$. The choices of $\rho$ are the $n$ roots of unity; any $\rho = \exp(-i 2\pi r/n)$ works for $r = 0, 1, \ldots, n-1$. The eigenvectors have the form $u = (\rho^0, \rho^1, \rho^2, \ldots, \rho^{n-1})'$:
$$
u_r = (1,\, e^{-i 2\pi r/n},\, e^{-i 2\pi 2r/n},\, \ldots,\, e^{-i 2\pi (n-1)r/n})'
$$
as seen in the discrete Fourier transform.
Eigenvalues These are the discrete Fourier transforms of the sequence that defines the circulant,
$$
\lambda_{n,k} = \sum_{j=0}^{n-1} c_j e^{-i 2\pi jk/n}.
$$
If the $c_j = \gamma_j$, then we've got the first $n$ terms of the sum that defines the spectral density (2), evaluated at $\omega = k/n$.
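A sketch verifying this claim with NumPy: the eigenvalues of a circulant (here a small, arbitrarily chosen symmetric example) are exactly the FFT of its first row.

```python
import numpy as np

# Check that the eigenvalues of a circulant C_n are the DFT of its first
# row: lambda_k = sum_j c_j exp(-i 2 pi j k / n).  The sequence below is a
# small hypothetical example, chosen symmetric so C_n is symmetric.
n = 6
c = np.array([4.0, 1.0, 0.5, 0.25, 0.5, 1.0])   # c_j = c_{n-j}

# Build the circulant (7): row r is the first row rotated right r times.
C = np.array([np.roll(c, r) for r in range(n)])

eig_direct = np.sort(np.linalg.eigvalsh(C))
eig_fft = np.sort(np.fft.fft(c).real)           # DFT of the first row
assert np.allclose(eig_direct, eig_fft)
```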
Implications for covariance matrices We can now anticipate the results. Asymptotically in $n$, the vectors that define the discrete Fourier transform are eigenvectors of every covariance matrix. The eigenvalues are then the transform coefficients of the covariances, namely values of the spectral density function. If we define a unitary matrix $U = (u_0, u_1, \ldots, u_{n-1})$ from the eigenvectors $u_r$, then we obtain
$$
U^* \Gamma_n U \approx \mathrm{diag}(f(\omega_j)), \qquad \omega_j = j/n.
$$
To confirm these guesses, we need to show that a circulant matrix provides a good approximation to the covariance matrix, good enough that we can use the results for circulants when describing covariance matrices. To describe what we mean by a good approximation, we need a way to measure the distance between matrices, a norm.
Matrix norms

Norms A norm on a vector space $V$ is any function $\|x\|$ for $x \in V$ for which ($\alpha \in \mathbb{R}$)

1. $\|x\| \ge 0$, with $\|x\| = 0$ only if $x = 0$,
2. $\|\alpha x\| = |\alpha|\, \|x\|$,
3. $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality).
Operator norm Define the operator norm of a symmetric square matrix $T$ as the maximum ratio of quadratic forms,
$$
\|T\| = \max_x \frac{x' T x}{x' x} = \max_j \lambda_j,
$$
where the $\lambda_j$ are the eigenvalues (singular values) of $T$.
Hilbert-Schmidt or weak norm For an $n \times n$ matrix $T$,
$$
|T|^2 = \frac{1}{n} \sum_{j,k} |t_{jk}|^2 = \frac{1}{n} \mathrm{trace}(T'T) = \frac{1}{n} \sum_j \lambda_j^2.
$$

Connections It is indeed a weaker norm,
$$
|T|^2 \le \|T\|^2 = \lambda_{\max}^2.
$$
The two together allow you to handle products,
$$
|TS| \le \|T\|\, |S|.
$$
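A sketch computing the weak norm three equivalent ways (entries, trace, eigenvalues) and checking that it is dominated by the squared operator norm, for a random symmetric matrix:

```python
import numpy as np

# The weak (Hilbert-Schmidt) norm squared, computed three equivalent ways
# for a random symmetric matrix (an arbitrary illustrative example).
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
T = (A + A.T) / 2                                # symmetrize

norm_entries = (T ** 2).sum() / n                # (1/n) sum |t_jk|^2
norm_trace = np.trace(T.T @ T) / n               # (1/n) trace(T'T)
norm_eigs = (np.linalg.eigvalsh(T) ** 2).mean()  # (1/n) sum lambda_j^2
assert np.allclose([norm_entries, norm_trace], norm_eigs)

# Weaker than the operator norm: |T|^2 <= max_j lambda_j^2.
assert norm_entries <= np.abs(np.linalg.eigvalsh(T)).max() ** 2 + 1e-12
```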
Approximation via circulants

An approximation Consider the following circulant that approximates $\Gamma_n$, obtained by flipping the covariances and running them both ways:
$$
G_n = \begin{pmatrix}
\gamma_0 & \gamma_1 + \gamma_{n-1} & \gamma_2 + \gamma_{n-2} & \cdots & \gamma_{n-1} + \gamma_1 \\
\gamma_1 + \gamma_{n-1} & \gamma_0 & \gamma_1 + \gamma_{n-1} & \cdots & \gamma_{n-2} + \gamma_2 \\
\gamma_2 + \gamma_{n-2} & \gamma_1 + \gamma_{n-1} & \gamma_0 & \ddots & \vdots \\
\vdots & & \ddots & \ddots & \gamma_1 + \gamma_{n-1} \\
\gamma_{n-1} + \gamma_1 & \gamma_{n-2} + \gamma_2 & \cdots & \gamma_1 + \gamma_{n-1} & \gamma_0
\end{pmatrix} \qquad (8)
$$
The norm of the difference $\Gamma_n - G_n$ is a familiar type of sum,
$$
|\Gamma_n - G_n|^2 = \frac{2}{n} \sum_{h=1}^{n-1} (n-h)\, \gamma_{n-h}^2
= \frac{2}{n} \sum_{h=1}^{n-1} h\, \gamma_h^2 \quad \text{(reverse the sum)}
= 2 \sum_{h=1}^{n-1} \frac{h}{n}\, \gamma_h^2.
$$
Since we assume that $\sum_j |\gamma_j| < \infty$, the sum converges to 0: each term $(h/n)\gamma_h^2$ is dominated by the summable $\gamma_h^2$ and tends to 0 for fixed $h$, so
$$
\lim_{n \to \infty} |\Gamma_n - G_n| = 0.
$$
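A numerical sketch of this convergence, using the hypothetical absolutely summable covariances $\gamma_h = 0.6^h$:

```python
import numpy as np

# Sketch: the weak norm |Gamma_n - G_n| shrinks as n grows, for the
# hypothetical AR(1)-style covariances gamma_h = 0.6^h.
def weak_norm(D):
    return np.sqrt((D ** 2).sum() / D.shape[0])

norms = []
for n in (10, 50, 250):
    gamma = 0.6 ** np.arange(n)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Gamma = gamma[lags]
    # First row of the circulant (8): c_0 = gamma_0, c_h = gamma_h + gamma_{n-h}.
    c = gamma.copy()
    c[1:] += gamma[1:][::-1]
    G = np.array([np.roll(c, r) for r in range(n)])
    norms.append(weak_norm(Gamma - G))

assert norms[0] > norms[1] > norms[2]   # decreasing toward 0, like 1/sqrt(n)
assert norms[2] < 0.1
```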
Eigenvalues of close matrices If two matrices are close in the weak norm, then the averages of their eigenvalues are close. In particular, given two matrices $A$ and $B$ with eigenvalues $\alpha_j$ and $\beta_j$, then
$$
\left| \frac{1}{n} \sum_j \alpha_j - \frac{1}{n} \sum_j \beta_j \right| \le |A - B|.
$$
Proof: If $D = A - B$, then
$$
\sum_j \alpha_j - \sum_j \beta_j = \mathrm{trace}(A) - \mathrm{trace}(B) = \mathrm{trace}(D).
$$
Now use Cauchy-Schwarz in the form
$$
\Big( \sum_j a_j \Big)^2 = \Big( \sum_j a_j \cdot 1 \Big)^2 \le \Big( \sum_j a_j^2 \Big)\, n \qquad (9)
$$
to show that
$$
|\mathrm{trace}(D)|^2 = \Big| \sum_j d_{jj} \Big|^2 \le n \sum_j d_{jj}^2 \le n \sum_{j,k} d_{jk}^2 = n^2 |D|^2.
$$
The loose bounds in the proof suggest you can do a lot better. In fact, you can move the absolute value inside the sum (so that the actual eigenvalues are getting close, not just close on average).
Powers of eigenvalues A messier argument produces a similar result. Given two matrices $A$ and $B$ with $|A - B| \to 0$ and eigenvalues $\alpha_j$ and $\beta_j$, then for any power $s$,
$$
\lim_{n \to \infty} \frac{1}{n} \sum_j (\alpha_j^s - \beta_j^s) = 0.
$$
Extension Now that we've found that powers of the eigenvalues converge for close matrices in the limit, it's not hard to get the result for a continuous function. The argument is a common one: polynomials can approximate any continuous function $g$. The result implies that
$$
\lim_{n \to \infty} \frac{1}{n} \sum_j \big( g(\alpha_j) - g(\beta_j) \big) = 0.
$$