G. S. Maddala, C. R. Rao and H. D. Vinod, eds., Handbook of Statistics, Vol. 11
© 1993 Elsevier Science Publishers B.V. All rights reserved.
10
Structural Time Series Models
Andrew C. Harvey and Neil Shephard
1. Introduction
A structural time series model is one which is set up in terms of components
which have a direct interpretation. Thus, for example, we may consider the
classical decomposition in which a series is seen as the sum of trend, seasonal
and irregular components. A model could be formulated as a regression with
explanatory variables consisting of a time trend and a set of seasonal dummies.
Typically, this would be inadequate. The necessary flexibility may be achieved
by letting the regression coefficients change over time. A similar treatment may
be accorded to other components such as cycles. The principal univariate
structural time series models are therefore nothing more than regression models
in which the explanatory variables are functions of time and the parameters are
time-varying. Given this interpretation, the addition of observable explanatory
variables is a natural extension as is the construction of multivariate models.
Furthermore, the use of a regression framework opens the way to a unified
model selection methodology for econometric and time series models.
The key to handling structural time series models is the state space form, with
the state of the system representing the various unobserved components such
as trends and seasonals. The estimate of the unobservable state can be updated
by means of a filtering procedure as new observations become available.
Predictions are made by extrapolating these estimated components into the
future, while smoothing algorithms give the best estimate of the state at any
point within the sample. A structural model can therefore not only provide
forecasts, but can also, through estimates of the components, present a set of
stylised facts; see the discussion in Harvey and Jaeger (1991).
A thorough discussion of the methodological and technical ideas underlying
structural time series models is contained in the monographs by Harvey (1989)
and West and Harrison (1989), the latter adopting a Bayesian perspective.
Since then there have been a number of technical developments and applications to new situations. One of the purposes of the present article is to describe
these new results.
1.1. Statistical formulation
A structural time series model for quarterly observations might consist of
trend, cycle, seasonal and irregular components. Thus
$$ y_t = \mu_t + \psi_t + \gamma_t + \varepsilon_t, \qquad t = 1, \ldots, T, \tag{1.1} $$
where $\mu_t$ is the trend, $\psi_t$ is the cycle, $\gamma_t$ is the seasonal and $\varepsilon_t$ is the irregular.
All four components are stochastic and the disturbances driving them are
mutually uncorrelated. The trend, seasonal and cycle are all derived from
deterministic functions of time, and reduce to these functions as limiting cases.
The irregular is white noise.
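The deterministic linear trend is
$$ \mu_t = \alpha + \beta t. \tag{1.2} $$
Since $\mu_t$ may be generated recursively from
$$ \mu_t = \mu_{t-1} + \beta, \tag{1.3} $$
with $\mu_0 = \alpha$, continuity may be preserved by introducing stochastic terms as
follows:
$$ \mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t, \tag{1.4a} $$
$$ \beta_t = \beta_{t-1} + \zeta_t, \tag{1.4b} $$
where $\eta_t$ and $\zeta_t$ are mutually uncorrelated white noise disturbances with zero
means and variances $\sigma_\eta^2$ and $\sigma_\zeta^2$ respectively. The effect of $\eta_t$ is to allow the
level of the trend to shift up and down, while $\zeta_t$ allows the slope to change. The
larger the variances, the greater the stochastic movements in the trend. If
$\sigma_\eta^2 = \sigma_\zeta^2 = 0$, (1.4) collapses to (1.2), showing that the deterministic trend is a
limiting case.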
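The limiting-case argument can be checked numerically. The following is a minimal sketch, assuming Gaussian disturbances and a starting level and slope equal to $\alpha$ and $\beta$; the function name `simulate_local_linear_trend` and its arguments are illustrative and do not come from the chapter.

```python
import numpy as np

def simulate_local_linear_trend(T, mu0, beta0, sigma_eta, sigma_zeta, seed=0):
    """Simulate mu_1, ..., mu_T from (1.4a)-(1.4b), starting at level mu0 and slope beta0."""
    rng = np.random.default_rng(seed)
    mu = np.empty(T)
    level, slope = mu0, beta0
    for t in range(T):
        # (1.4a): new level = old level + old slope + level disturbance
        level = level + slope + sigma_eta * rng.standard_normal()
        # (1.4b): the slope follows a random walk
        slope = slope + sigma_zeta * rng.standard_normal()
        mu[t] = level
    return mu

# With both standard deviations set to zero the recursion gives the line alpha + beta * t.
alpha, beta, T = 2.0, 0.5, 8
mu_det = simulate_local_linear_trend(T, alpha, beta, 0.0, 0.0)
line = alpha + beta * np.arange(1, T + 1)

# A positive level variance lets the trend wander away from the line.
mu_stoch = simulate_local_linear_trend(T, alpha, beta, 0.3, 0.0)
```

Raising `sigma_zeta` instead of `sigma_eta` changes the slope from period to period, which is the second source of movement described above.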
Let $\psi_t$ be a cyclical function of time with frequency $\lambda_c$, which is measured in
radians. The period of the cycle, which is the time taken to go through its
complete sequence of values, is $2\pi/\lambda_c$. A cycle can be expressed as a mixture of
sine and cosine waves, depending on two parameters, $\alpha$ and $\beta$. Thus
$$ \psi_t = \alpha \cos \lambda_c t + \beta \sin \lambda_c t, \tag{1.5} $$
where $(\alpha^2 + \beta^2)^{1/2}$ is the amplitude and $\tan^{-1}(\beta/\alpha)$ is the phase. Like the
linear trend, the cycle can be built up recursively, leading to the stochastic
model
$$ \begin{pmatrix} \psi_t \\ \psi_t^* \end{pmatrix} = \rho \begin{pmatrix} \cos\lambda_c & \sin\lambda_c \\ -\sin\lambda_c & \cos\lambda_c \end{pmatrix} \begin{pmatrix} \psi_{t-1} \\ \psi_{t-1}^* \end{pmatrix} + \begin{pmatrix} \kappa_t \\ \kappa_t^* \end{pmatrix}, \tag{1.6} $$
where $\kappa_t$ and $\kappa_t^*$ are mutually uncorrelated with a common variance, $\sigma_\kappa^2$, and $\rho$
is a damping factor, such that $0 \le \rho \le 1$. The model is stationary if $\rho$ is strictly
less than one, and if $\lambda_c$ is equal to 0 or $\pi$ it reduces to a first-order
autoregressive process.
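As an illustrative check of the recursive construction, the sketch below runs (1.6) with $\rho = 1$ and no disturbances and compares the result with the closed form (1.5). It assumes the recursion is started at $\psi_0 = \alpha$, $\psi_0^* = \beta$; the function name `simulate_cycle` is hypothetical.

```python
import numpy as np

def simulate_cycle(T, psi0, psi_star0, rho, lam, sigma_kappa, seed=0):
    """Simulate psi_1, ..., psi_T from the recursion (1.6)."""
    rng = np.random.default_rng(seed)
    # Damped rotation matrix appearing in (1.6)
    rot = rho * np.array([[np.cos(lam), np.sin(lam)],
                          [-np.sin(lam), np.cos(lam)]])
    state = np.array([psi0, psi_star0], dtype=float)
    psi = np.empty(T)
    for t in range(T):
        state = rot @ state + sigma_kappa * rng.standard_normal(2)
        psi[t] = state[0]
    return psi

# Deterministic limiting case: rho = 1 and zero disturbance variance.
alpha, beta, lam, T = 1.0, 0.5, 0.30, 12
psi_det = simulate_cycle(T, alpha, beta, 1.0, lam, 0.0)
t = np.arange(1, T + 1)
closed_form = alpha * np.cos(lam * t) + beta * np.sin(lam * t)

# Period implied by the frequency, in time units.
period = 2 * np.pi / lam
```

With `rho` below one and a positive `sigma_kappa`, the same function produces a stationary stochastic cycle.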
A model of deterministic seasonality has the seasonal effects summing to
zero over a year. The seasonal effects can be allowed to change over time by
letting their sum over the previous year be equal to a random disturbance term
$\omega_t$, with mean zero and variance $\sigma_\omega^2$. Thus, if $s$ is the number of seasons in the
year,
$$ \sum_{j=0}^{s-1} \gamma_{t-j} = \omega_t \quad \text{or} \quad \gamma_t = -\sum_{j=1}^{s-1} \gamma_{t-j} + \omega_t. \tag{1.7} $$
An alternative way of allowing seasonal dummy variables to change over
time is to suppose that each season evolves as a random walk but that, at any
particular point in time, the seasonal components, and hence the disturbances,
sum to zero. This model was introduced by Harrison and Stevens (1976, pp.
217-218).
A seasonal pattern can also be modelled by a set of trigonometric terms at
the seasonal frequencies, $\lambda_j = 2\pi j/s$, $j = 1, \ldots, [s/2]$, where $[s/2]$ is $s/2$ if $s$ is
even and $(s - 1)/2$ if $s$ is odd. The seasonal effect at time $t$ is then
$$ \gamma_t = \sum_{j=1}^{[s/2]} (\gamma_j \cos \lambda_j t + \gamma_j^* \sin \lambda_j t). \tag{1.8} $$
When $s$ is even, the sine term disappears for $j = s/2$ and so the number of
trigonometric parameters, the $\gamma_j$ and $\gamma_j^*$, is always $s - 1$, which is the same
as the number of coefficients in the seasonal dummy formulation. A seasonal
pattern based on (1.8) is the sum of $[s/2]$ cyclical components, each with $\rho = 1$,
and it may be allowed to evolve over time in exactly the same way as a cycle
was allowed to move. The model is
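$$ \gamma_t = \sum_{j=1}^{[s/2]} \gamma_{j,t}, \tag{1.9} $$
where, following (1.6),
$$ \begin{pmatrix} \gamma_{j,t} \\ \gamma_{j,t}^* \end{pmatrix} = \begin{pmatrix} \cos\lambda_j & \sin\lambda_j \\ -\sin\lambda_j & \cos\lambda_j \end{pmatrix} \begin{pmatrix} \gamma_{j,t-1} \\ \gamma_{j,t-1}^* \end{pmatrix} + \begin{pmatrix} \omega_{j,t} \\ \omega_{j,t}^* \end{pmatrix}, \tag{1.10} $$
where $\omega_{j,t}$ and $\omega_{j,t}^*$, $j = 1, \ldots, [s/2]$, are zero mean white noise processes which
are uncorrelated with each other with a common variance $\sigma_\omega^2$. As in the cycle
(1.6), $\gamma_{j,t}^*$ appears as a matter of construction, and its interpretation is not
particularly important. Note that when $s$ is even, the component at $j = s/2$
collapses to
$$ \gamma_{j,t} = \gamma_{j,t-1} \cos \lambda_j + \omega_{j,t}. \tag{1.11} $$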
If the disturbances in the model are assumed to be normally distributed, the
hyperparameters $(\sigma_\eta^2, \sigma_\zeta^2, \sigma_\omega^2, \sigma_\kappa^2, \rho, \lambda_c, \sigma_\varepsilon^2)$ may be estimated by maximum
likelihood. This may be done in the time domain using the Kalman filter as
described in Section 2, or in the frequency domain as described in Harvey
(1989, Chapter 4). Harvey and Peters (1990) present simulation evidence on
the performance of different estimators. Once the hyperparameters have been
estimated, the state space form may be used to make predictions and construct
estimators of the unobserved components.
[Table 1: the main structural models and their properties; the table body is not recoverable from the source.]
EXAMPLE. A model of the form (1.1), but without the seasonal component,
was fitted to quarterly, seasonally adjusted data on US GNP from 1947Q1 to
1988Q2. The estimated variances of $\eta_t$, $\zeta_t$, $\kappa_t$ and $\varepsilon_t$
were 0, 0.0015, 0.0664 and
0 respectively, while the estimate of $\rho$ was 0.92. The estimate of $\lambda_c$ was 0.30,
corresponding to a period of 20.82 quarters. Thus the length of business cycles
is roughly five years.
A summary of the main structural models and their properties may be found
in Table 1. Structural time series models which are linear and time invariant
all have a corresponding reduced form autoregressive integrated moving
average (ARIMA) representation which is equivalent in the sense that it will
give identical forecasts to the structural form. For example, in the local level
model,
$$ y_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu_{t-1} + \eta_t, \tag{1.12} $$
where $\varepsilon_t$ and $\eta_t$ are mutually uncorrelated white noise disturbances, taking first
differences yields
$$ \Delta y_t = \eta_t + \varepsilon_t - \varepsilon_{t-1}, \tag{1.13} $$
which in view of its autocorrelation structure is equivalent to an MA(1) process
with a nonpositive autocorrelation at lag one. Thus $y_t$ is ARIMA(0, 1, 1). By
equating autocorrelations at lag one it is possible to derive the relationship
between the moving average parameter and $q$, the ratio of the variance of $\eta_t$ to
that of $\varepsilon_t$. In more complex models, there may not be a simple correspondence
between the structural and reduced form parameters. For example in (1.1),
$\Delta\Delta_s y_t$ is ARMA(2, s + 3), where $\Delta_s$ is the seasonal difference operator. Note
that the terminology of reduced and structural form is used in a parallel fashion
to the way it is used in econometrics, except that in structural time series
models the restrictions come not from economic theory, but from a desire to
ensure that the forecasts reflect features such as cycles and seasonals which are
felt to be present in the data.
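The step of equating autocorrelations at lag one for the local level model can be made concrete. The sketch below assumes the reduced form is written as $\Delta y_t = \xi_t + \theta \xi_{t-1}$ (a sign convention chosen here for illustration). From (1.13) the lag-one autocorrelation is $-1/(2+q)$, and setting this equal to $\theta/(1+\theta^2)$ gives a quadratic in $\theta$ whose invertible root is returned by the hypothetical helper `ma_parameter`.

```python
import math

def lag_one_autocorrelation(q):
    """Lag-one autocorrelation of eta_t + eps_t - eps_{t-1} when q = var(eta) / var(eps)."""
    return -1.0 / (2.0 + q)

def ma_parameter(q):
    """Root of theta^2 + (2 + q) * theta + 1 = 0 with |theta| <= 1."""
    return (math.sqrt(q * q + 4.0 * q) - 2.0 - q) / 2.0

q = 0.5
theta = ma_parameter(q)
# Autocorrelation implied by the MA(1) form; it should match the structural one.
implied_rho1 = theta / (1.0 + theta ** 2)
```

The root lies between -1 and 0 for every $q \ge 0$, which is why the reduced form covers only part of the MA(1) parameter space; at $q = 0$ it equals -1.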
In addition to the main structural models found in Table 1, many more
structural models may be constructed. Additional components may be introduced and the components defined above may be modified. For example,
quadratic trends may replace linear ones, and the irregular component may be
formulated so as to reflect the sampling scheme used to collect the data. If
observations are collected on a daily basis, a slowly changing day of the week
effect may be incorporated in the model, while for hourly observations an
intra-day pattern may be modelled in a similar way to seasonality. A more
parsimonious way of modelling an intra-day pattern, based on time-varying
splines, is proposed in Harvey and Koopman (1993).
1.2. Model selection
The most difficult aspect of working with time series data is model selection.
The attraction of the structural framework is that it enables the researcher to
formulate, at the outset, a model which is explicitly designed to pick up the
salient characteristics of the data. Once the model has been estimated, its
suitability can be assessed, not only by carrying out diagnostic tests, but also by
checking whether the estimated components are consistent with any prior
knowledge which might be available. Thus if a cyclical component is used to
model the trade cycle, a knowledge of the economic history of the period
should enable one to judge whether the estimated parameters are reasonable.
This is in the same spirit as assessing the plausibility of a regression model by
reference to the sign and magnitude of its estimated coefficients.
Classical time series analysis is based on the theory of stationary stochastic
processes, and this is the starting point for conventional time series model
building. Nonstationarity is handled by differencing, leading to the ARIMA
class of models. The fact that the simpler structural time series models can be
made stationary by differencing provides an important link with classical time
series analysis. However, the analysis of series which are thought to be
stationary does not play a fundamental role in structural modelling methodology. Few economic and social time series are stationary and there is no
overwhelming reason to suppose that they can necessarily be made stationary
by differencing, which is the assumption underlying the ARIMA methodology
of Box and Jenkins (1976). If a univariate structural model fails to give a good
fit to a set of data, other univariate models may be considered, but there will
be an increased willingness to look at more radical alternatives. For example, a
search for outliers might be initiated or it may be necessary to concede that a
structurally stable model can only be obtained by conditioning on an observed
explanatory variable.
Introducing explanatory variables into a model requires access to a larger
information set. Some prior knowledge of which variables should potentially
enter into the model is necessary, and data on these variables are needed. In a
structural time series model the explanatory variables enter into the model side
by side with the unobserved components. In the absence of these unobserved
components the model reverts to a regression, and this perhaps makes it clear
why the model selection methodology which has been developed for
dynamic regression is appropriate in the wider context with which we are
concerned. Distributed lags can be fitted in much the same way as in
econometric modelling, and even ideas such as the error-correction mechanism
can be employed. The inclusion of the unobserved time series components
does not affect the model selection methodology to be applied to the
explanatory variables in any fundamental way. What it does is to add an extra
dimension to the interpretation and specification of certain aspects of the
dynamics. For example, it provides a key insight into the vexed question of
whether to work with the variables in levels or first differences, and solves the
problem by setting up a general framework within which the two formulations
emerge as special cases.
The fact that structural time series models are set up in terms of components
which have a direct interpretation means that it is possible to employ a model
selection methodology which is similar to that proposed in the econometrics
literature by writers such as Hendry and Richard (1983). Thus one can adopt
the following criteria for a good model: parsimony, data coherence, consistency with prior knowledge, data admissibility, structural stability and encompassing.
2. Linear state space models and the Kalman filter
The linear state space form has been demonstrated to be an extremely powerful
tool in handling all linear and many classes of nonlinear time series models; see
Harvey (1989, Chapters 3 and 4). In this section we introduce the state space
form and the associated Kalman filter. We show how the filter can be used to
deliver the likelihood. Recent work on smoothing is also discussed.
2.1. The linear state space form
Suppose a multivariate time series $y_t$ possesses $N$ elements. This series is
related to a $p \times 1$ vector $\alpha_t$, which is labelled the state, via the measurement
equation
$$ y_t = Z_t \alpha_t + X_t \beta + \varepsilon_t, \qquad t = 1, \ldots, T. \tag{2.1} $$
Here $Z_t$ and $X_t$ are nonstochastic matrices of dimensions $N \times p$ and $N \times k$
respectively, $\beta$ is a fixed $k$-dimensional vector and $\varepsilon_t$ is a zero mean, $N \times 1$
vector of white noise, with variance $H_t$.
The measurement equation is reminiscent of a classical regression model,
with the state vector representing some of the regression coefficients. However, in the state space form, the state vector is allowed to evolve over time.
This is achieved by introducing a transition equation, which is given by
$$ \alpha_t = T_t \alpha_{t-1} + W_t \beta + R_t \eta_t, \qquad t = 1, \ldots, T, \tag{2.2} $$
where $T_t$, $W_t$ and $R_t$ are fixed matrices of size $(p \times p)$, $(p \times k)$ and $(p \times g)$
respectively, and $\eta_t$ is a zero mean, $g$-dimensional vector of white noise, with
variance $Q_t$. In the literature $\eta_t$ and $\varepsilon_s$ have always been assumed to be
uncorrelated for all $s \ne t$. In this paper we will also assume that $\eta_t$ and $\varepsilon_t$ are
uncorrelated, although Anderson and Moore (1979) and more recently De
Jong (1991) and Koopman (1993) relax this assumption.
The inclusion of the $R_t$ matrix is somewhat arbitrary, for the disturbance
term can always be redefined to have a variance $R_t Q_t R_t'$. However, the
transition equation above is often regarded as being more natural. The
transition equation involves the state at time zero and so to complete the state
space form we need to tie down its behaviour. We assume that $\alpha_0$ has a mean $a_0$
and variance $P_0$. Further, $\alpha_0$ is assumed to be uncorrelated with the noise in
the transition and measurement equations. This completed state space form is
said to be time invariant if $Z_t$, $X_t$, $H_t$, $W_t$, $T_t$, $R_t$ and $Q_t$ do not change over time.
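To illustrate these general points we will put the univariate structural model
(1.1) of trends, seasonals and cycles discussed in Section 1 into time invariant
state space form by writing $\alpha_t = (\mu_t, \beta_t, \gamma_t, \gamma_{t-1}, \ldots, \gamma_{t-s+2}, \psi_t, \psi_t^*)'$, where
$$ y_t = (1 \;\; 0 \;\; 1 \;\; 0 \;\; \cdots \;\; 0 \;\; 1 \;\; 0)\, \alpha_t + \varepsilon_t, \tag{2.3a} $$
$$ \alpha_t = \begin{pmatrix}
1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & -1 & \cdots & -1 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & \rho\cos\lambda_c & \rho\sin\lambda_c \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & -\rho\sin\lambda_c & \rho\cos\lambda_c
\end{pmatrix} \alpha_{t-1} + \begin{pmatrix} \eta_t \\ \zeta_t \\ \omega_t \\ 0 \\ 0 \\ \vdots \\ 0 \\ \kappa_t \\ \kappa_t^* \end{pmatrix}. \tag{2.3b} $$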
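The block structure of (2.3a) and (2.3b) is perhaps easiest to see when the matrices are assembled programmatically. The following is a minimal sketch for a given number of seasons $s$; the function name `bsm_with_cycle` and the variable names are illustrative only.

```python
import numpy as np

def bsm_with_cycle(s, rho, lam):
    """Return (Z, T) for the state (mu, beta, gamma_t, ..., gamma_{t-s+2}, psi, psi*)."""
    p = 2 + (s - 1) + 2                 # trend (2) + seasonal (s - 1) + cycle (2)
    Z = np.zeros((1, p))
    Z[0, 0] = 1.0                       # level of the trend
    Z[0, 2] = 1.0                       # current seasonal effect
    Z[0, p - 2] = 1.0                   # cycle
    T = np.zeros((p, p))
    # Trend block: mu_t = mu_{t-1} + beta_{t-1}, beta_t = beta_{t-1}
    T[0, 0] = T[0, 1] = T[1, 1] = 1.0
    # Seasonal block: gamma_t = -(gamma_{t-1} + ... + gamma_{t-s+1}), then a shift
    T[2, 2:2 + s - 1] = -1.0
    for i in range(1, s - 1):
        T[2 + i, 2 + i - 1] = 1.0
    # Cycle block: damped rotation from (1.6)
    c, d = rho * np.cos(lam), rho * np.sin(lam)
    T[p - 2:, p - 2:] = [[c, d], [-d, c]]
    return Z, T

# Quarterly data: s = 4 gives a state vector with 7 elements.
Z, T = bsm_with_cycle(s=4, rho=0.9, lam=0.3)
```

For quarterly data the measurement vector is (1 0 1 0 0 1 0), matching the pattern in (2.3a).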
2.2. The Kalman filter
In most structural time series models the individual elements of $\alpha_t$ are
unobservable, either because of the presence of some masking irregular term $\varepsilon_t$
or because of the way $\alpha_t$ is constructed. However, the observations do carry
some information which can be harnessed to provide estimates of the unknowable $\alpha_t$. This estimation can be carried out using a variety of information sets.
We will write $Y_s$ to denote this information, which will be composed of all the
observations recorded up to time $s$ and our initial knowledge of $\alpha_0$. The two
most common forms of estimation are smoothing, where we estimate $\alpha_t$ using
$Y_T$, and filtering, where we estimate using only $Y_t$. We will focus on various
aspects of smoothing in Section 2.4, but here we look at filtering.
Filtering allows the tracking of the state using contemporaneously available
information. The optimal, that is minimum mean square error, filter is given by
the mean of the conditional density of $\alpha_t$, given $Y_t$, which is written as $\alpha_t \mid Y_t$.
The Kalman filter delivers this quantity if the observations are Gaussian. If
they are non-Gaussian the Kalman filter provides the optimal estimator
amongst the class of linear estimators. Here we develop the filter under
Gaussianity; see Duncan and Horn (1972) for an alternative derivation.
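We start at time zero with the knowledge that $\alpha_0 \sim N(a_0, P_0)$. If we combine
the transition and measurement equations with this prior and write $Y_0$ to
express the information in it, then
$$ \begin{pmatrix} \alpha_1 \\ y_1 \end{pmatrix} \Big| \, Y_0 \sim N\left( \begin{pmatrix} a_{1|0} \\ Z_1 a_{1|0} + X_1 \beta \end{pmatrix}, \begin{pmatrix} P_{1|0} & P_{1|0} Z_1' \\ Z_1 P_{1|0} & F_1 \end{pmatrix} \right), \tag{2.4} $$
where
$$ P_{1|0} = T_1 P_0 T_1' + R_1 Q_1 R_1', \qquad F_1 = Z_1 P_{1|0} Z_1' + H_1, \qquad a_{1|0} = T_1 a_0 + W_1 \beta. \tag{2.5} $$
Usually, we write $v_1 = y_1 - Z_1 a_{1|0} - X_1 \beta$ as the one-step ahead forecast error.
It is constructed so that $v_1 \mid Y_0 \sim N(0, F_1)$. Consequently, using the usual
conditioning result for multivariate normal distributions as given, for example,
in Rao (1973),
$$ \alpha_1 \mid Y_1 \sim N(a_1, P_1), \tag{2.6} $$
where
$$ a_1 = a_{1|0} + P_{1|0} Z_1' F_1^{-1} v_1, \qquad P_1 = P_{1|0} - P_{1|0} Z_1' F_1^{-1} Z_1 P_{1|0}. \tag{2.7} $$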
This result means that the filter is recursive. We will use the following
notation throughout the paper to describe the general results: $\alpha_{t-1} \mid Y_{t-1} \sim N(a_{t-1}, P_{t-1})$, $\alpha_t \mid Y_{t-1} \sim N(a_{t|t-1}, P_{t|t-1})$ and $v_t \mid Y_{t-1} \sim N(0, F_t)$. The precise
definition of these densities is given in the following three sets of equations.
First the prediction equations
$$ a_{t|t-1} = T_t a_{t-1} + W_t \beta, \qquad P_{t|t-1} = T_t P_{t-1} T_t' + R_t Q_t R_t', \tag{2.8} $$
then the one-step ahead forecast equations
$$ v_t = y_t - Z_t a_{t|t-1} - X_t \beta, \qquad F_t = Z_t P_{t|t-1} Z_t' + H_t, \tag{2.9} $$
and finally the updating equations
$$ a_t = a_{t|t-1} + P_{t|t-1} Z_t' F_t^{-1} v_t \quad \text{and} \quad P_t = P_{t|t-1} - P_{t|t-1} Z_t' F_t^{-1} Z_t P_{t|t-1}. \tag{2.10} $$
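The three sets of equations translate almost line by line into code. The sketch below is a simplified illustration, assuming a time-invariant system with $\beta = 0$ and $R_t = I$; the function name `kalman_filter` and the local level example values are hypothetical choices, not taken from the chapter.

```python
import numpy as np

def kalman_filter(y, Z, T, H, Q, a0, P0):
    """Run (2.8)-(2.10) for a time-invariant model with beta = 0 and R = I.

    y has shape (n, N); returns innovations v_t, their variances F_t and filtered states a_t.
    """
    a, P = a0.copy(), P0.copy()
    innovations, variances, filtered = [], [], []
    for y_t in y:
        # Prediction equations (2.8)
        a_pred = T @ a
        P_pred = T @ P @ T.T + Q
        # One-step ahead forecast equations (2.9)
        v = y_t - Z @ a_pred
        F = Z @ P_pred @ Z.T + H
        # Updating equations (2.10)
        gain = P_pred @ Z.T @ np.linalg.inv(F)
        a = a_pred + gain @ v
        P = P_pred - gain @ Z @ P_pred
        innovations.append(v)
        variances.append(F)
        filtered.append(a)
    return np.array(innovations), np.array(variances), np.array(filtered)

# Local level model (1.12) with var(eps) = 1, var(eta) = 0.5 and prior N(0, 1).
y = np.array([[1.0], [1.5], [0.8]])
one = np.array([[1.0]])
v, F, a = kalman_filter(y, Z=one, T=one, H=one, Q=0.5 * one,
                        a0=np.array([0.0]), P0=one)
```

In the first step the predicted variance is 1.5, so $F_1 = 2.5$ and the filtered level moves 60 per cent of the way from the prior mean to the observation. Explicit inversion of $F_t$ is used only for readability; a linear solve would be the more careful numerical choice.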
One immediate result which follows from the Kalman filter is that we can
write the conditional joint density of the observations as
$$ f(y_1, \ldots, y_T \mid Y_0) = \prod_{t=1}^{T} f(y_t \mid Y_{t-1}) = \prod_{t=1}^{T} f(v_t \mid Y_{t-1}). \tag{2.11} $$
This fracturing of the conditional joint density into the product of conditionals
is called the prediction error decomposition. If $\alpha_t$ is stationary, an unconditional joint density can be constructed since the initial conditions, $a_0$ and $P_0$, are
known. The case where we do not have stationarity has been the subject of
some interesting research in recent years.
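The prediction error decomposition (2.11) can be verified on a small example by comparing the sum of the Gaussian log densities of the innovations with the joint log density of the observations evaluated directly. The sketch below does this for the local level model with a proper prior $\alpha_0 \sim N(a_0, P_0)$; the function names and data values are illustrative, and the constant $-\tfrac{1}{2}\log 2\pi$ per observation is kept in both calculations.

```python
import numpy as np

def local_level_loglik(y, sigma2_eps, sigma2_eta, a0, P0):
    """Log-likelihood from the prediction error decomposition (2.11)."""
    a, P, loglik = a0, P0, 0.0
    for y_t in y:
        P_pred = P + sigma2_eta            # (2.8) with T_t = 1
        v = y_t - a                        # (2.9): the predicted state equals a_{t-1}
        F = P_pred + sigma2_eps
        loglik += -0.5 * (np.log(2 * np.pi) + np.log(F) + v * v / F)
        a = a + P_pred * v / F             # (2.10)
        P = P_pred - P_pred ** 2 / F
    return loglik

def direct_gaussian_loglik(y, sigma2_eps, sigma2_eta, a0, P0):
    """Joint normal log density of y, using cov(y_s, y_t) = P0 + var(eta) * min(s, t) + var(eps) * 1[s = t]."""
    n = len(y)
    idx = np.arange(1, n + 1)
    cov = P0 + sigma2_eta * np.minimum.outer(idx, idx) + sigma2_eps * np.eye(n)
    resid = y - a0
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + resid @ np.linalg.solve(cov, resid))

y = np.array([1.0, 1.5, 0.8, 1.9, 2.4])
ll_filter = local_level_loglik(y, 1.0, 0.5, a0=0.0, P0=2.0)
ll_direct = direct_gaussian_loglik(y, 1.0, 0.5, a0=0.0, P0=2.0)
```

The filter needs only scalar operations per observation, whereas the direct calculation factorises a $T \times T$ covariance matrix, which is the practical reason for using the decomposition.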
2.3. Initialization for non-stationary models¹
We will derive the likelihood for a model in state space form using the
argument in De Jong (1988a). A slightly different approach can be found in
Ansley and Kohn (1985). We present a simplified derivation based partly on
the results in Marshall (1992a). For ease of exposition $\beta$ will be assumed to be
zero and all the elements in $\alpha_0$ can be taken to be nonstationary. We start by
noting that if we write $y = (y_1', y_2', \ldots, y_T')'$, then
$$ f(y) = \frac{f(\alpha_0 = 0)\, f(y \mid \alpha_0 = 0)}{f(\alpha_0 = 0 \mid y)}. \tag{2.12} $$
The density $f(y \mid \alpha_0 = 0)$ can be evaluated by applying the Kalman filter and
the prediction error decomposition if we initialize the filter at $a_0 = 0$ and
$P_0 = 0$. We denote this filter by KF(0, 0), and the corresponding output as $a_t^*$,
$a_{t|t-1}^*$ and $v_t^*$. The density $f(\alpha_0 = 0)$ has a simple form, which leaves us with the
problem of $f(\alpha_0 = 0 \mid y)$. If we write $v^* = (v_1^{*\prime}, v_2^{*\prime}, \ldots, v_T^{*\prime})'$, then we can use
the result that $v^*$ is a linear combination of $y$ in order to write $f(\alpha_0 = 0 \mid y) =
f(\alpha_0 = 0 \mid v^*)$. To be able to evaluate $f(\alpha_0 = 0 \mid v^*)$ we will need to define $F$ as a
block diagonal matrix, with blocks being $F_t$, and $A$ as a matrix with row blocks
$Z_t G_{t-1}$, where $G_t = T_{t+1}(I - K_t Z_t) G_{t-1}$, $G_0 = T_1$ and $K_t = P_{t|t-1} Z_t' F_t^{-1}$ (the
so-called Kalman gain). In all cases the quantities are evaluated by the Kalman
filter under the startup condition $P_0 = 0$. Then as
$$ v_t(\alpha_0) = y_t - E(y_t \mid Y_{t-1}, \alpha_0) = y_t - Z_t a_{t|t-1}(\alpha_0), \tag{2.13} $$
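and
$$ a_{t+1|t}(\alpha_0) = T_{t+1} a_{t|t-1}(\alpha_0) + T_{t+1} K_t v_t(\alpha_0) = T_{t+1}(I - K_t Z_t) a_{t|t-1}(\alpha_0) + T_{t+1} K_t y_t = G_t \alpha_0 + a_{t+1|t}^*, \tag{2.14} $$
$$ l(y) = -\tfrac{1}{2} \log|P_0| - \tfrac{1}{2} a_0' P_0^{-1} a_0 - \tfrac{1}{2} \sum_{t=1}^{T} \log|F_t| - \tfrac{1}{2} \sum_{t=1}^{T} v_t^{*\prime} F_t^{-1} v_t^* \; \cdots $$
¹ The rest of Section 2 is more technical and can be omitted on first reading without loss of
continuity.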
Traditionally, nonstationary state space models have been initialised in two
ways. The first is to use a diffuse prior on $\alpha_0 \mid Y_0$; this is to allow the diagonal
elements of $P_0$ to go to infinity. We can see that in the limit the result from this
is that
$$ l(y) + \tfrac{1}{2} \log|P_0| \to -\tfrac{1}{2} \sum_{t=1}^{T} \log|F_t| - \tfrac{1}{2} \sum_{t=1}^{T} v_t^{*\prime} F_t^{-1} v_t^* - \tfrac{1}{2} \log|S_T| + \tfrac{1}{2} s_T' S_T^{-1} s_T $$
$$ = -\tfrac{1}{2} \log|S_T| - \tfrac{1}{2} \log|F| - \tfrac{1}{2} v^{*\prime} (F^{-1} - F^{-1} A (A' F^{-1} A)^{-1} A' F^{-1}) v^*. \tag{2.21} $$
An approximation to (2.21) can be obtained for many models by running
KF($a_0$, $P_0$) with the diagonal elements of $P_0$ set equal to large, but finite,
values. The likelihood is then constructed from the prediction errors once
enough observations have been processed to give a finite variance. However,
the likelihood obtained from (2.21) is preferable as it is exact and numerically
stable.
The other main way nonstationary models are initialised is by taking $\alpha_0$ to be
an unknown constant; see Rosenberg (1973). Thus $\alpha_0$ becomes a nuisance
parameter and $P_0$ is set to zero. In this case, in the limit, the likelihood
becomes
(2.22)
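$$ = -\tfrac{1}{2} \log|F| - \tfrac{1}{2} (v^* - A\alpha_0)' F^{-1} (v^* - A\alpha_0), \tag{2.23} $$
the term $\alpha_0' S_T \alpha_0$ in (2.22) appearing when $(P_0^{-1} + S_T)^{-1}$ is expanded out. We
can concentrate $\alpha_0$ out at its maximum likelihood value $\hat{\alpha}_0 = (A' F^{-1} A)^{-1} A' F^{-1} v^*$,
to deliver the profile or concentrated likelihood function
$$ c(y) = -\tfrac{1}{2} \log|F| - \tfrac{1}{2} v^{*\prime} (F^{-1} - F^{-1} A (A' F^{-1} A)^{-1} A' F^{-1}) v^*. \tag{2.24} $$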
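The concentration step can be checked numerically: evaluating (2.23) at $\hat{\alpha}_0$ should reproduce (2.24), and any other value of $\alpha_0$ should give a smaller value. The sketch below uses randomly generated stand-ins for $A$, $F$ and $v^*$ rather than output from an actual run of KF(0, 0), so it illustrates only the algebra; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2
A = rng.standard_normal((n, p))                 # stand-in for the matrix with row blocks Z_t G_{t-1}
F = np.diag(rng.uniform(0.5, 2.0, size=n))      # stand-in for the block diagonal matrix of F_t
v_star = rng.standard_normal(n)                 # stand-in for the stacked innovations from KF(0, 0)

def loglik_given_alpha0(alpha0):
    """Right-hand side of (2.23) for a fixed value of alpha_0."""
    r = v_star - A @ alpha0
    _, logdet = np.linalg.slogdet(F)
    return -0.5 * logdet - 0.5 * r @ np.linalg.solve(F, r)

# Generalised least squares estimate of alpha_0
F_inv = np.linalg.inv(F)
S = A.T @ F_inv @ A
alpha0_hat = np.linalg.solve(S, A.T @ F_inv @ v_star)

# Profile likelihood (2.24)
_, logdet_F = np.linalg.slogdet(F)
M = F_inv - F_inv @ A @ np.linalg.solve(S, A.T @ F_inv)
profile = -0.5 * logdet_F - 0.5 * v_star @ M @ v_star
```

The matrix `S` computed here is $A' F^{-1} A$, whose log determinant is the only quantity that separates (2.24) from (2.21).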
The difference between the profile likelihood and the likelihood given in (2.21)
is simply the $\log|S_T|$ term. The latter is called a marginal or restricted
likelihood in the statistics literature; cf. McCullagh and Nelder (1989, Chapter
7). It is based on a linear transformation of $y$ making the data invariant to $\alpha_0$.
The term $\log|S_T|$ can have a significant effect on small sample properties of
maximum likelihood (ML) estimators in certain circumstances. This can be
seen by looking at some results from the paper by Shephard and Harvey (1990)
which analyses the sampling behaviour of the ML estimator of $q$, the ratio of
the variances of $\eta_t$ and $\varepsilon_t$ in the local level model (1.12). When $q$ is zero the
reduced form of the local level model is strictly noninvertible. Evaluating the
probability that $q$ is estimated to be exactly zero for various true values of $q$
Table 2
Probability that ML estimator of signal-noise ratio q is exactly equal to ...
Marginal likelihood
T-l
q=O
q=O.Ol
q=O.l
q=1,~10
0.12
0.01
0.00
Profile likelihood
T- 1
q = 0
q = 0.01
q = 0.1
q = 1
q = 10
10
30
50
0.95
0.87
0.72
0.88
0.49
0.28
0.60
0.20
0.08
0.44
0.13
0.05
0.%
0.96
0.96
and sample sizes, gives the results summarised in Table 2. It can be seen that
using a profile likelihood instead of a marginal results in a much higher
probability of estimating q to be zero. Unless q is actually zero, this is
undesirable from a forecasting point of view since there is no discounting of
past observations. This provides a practical justification for the use of diffuse
initial conditions and marginal likelihoods.
2.4. Smoothing
Estimating $\alpha_t$ using the full set of observations $Y_T$ is called smoothing. The
minimum mean square estimator of $\alpha_t$ using $Y_T$ is $E(\alpha_t \mid Y_T)$. An extensive
review of smoothing is given in Anderson and Moore (1979, Chapter 7).
Recently there have been some important developments in the way $E(\alpha_t \mid Y_T)$
is obtained; see, for example, De Jong (1988b, 1989), Kohn and Ansley (1989)
and Koopman (1993). These breakthroughs have dramatically improved the
speed of the smoothers. The new results will be introduced by using the
framework of Whittle (1991). For ease of exposition, $R_t$ will be assumed to be
an identity matrix and $\beta$ will be assumed to be zero.
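Under Gaussianity, $E(\alpha_t \mid Y_T)$ is also the mode of the density of $\alpha_t \mid Y_T$. Thus
we can use the general result that under weak regularity, if $f(\cdot)$ is a generic
density function and $m$ denotes the mode, then
$$ \left. \frac{\partial f(x \mid z)}{\partial x} \right|_{x=m} = 0 \quad \text{if and only if} \quad \left. \frac{\partial f(x, z)}{\partial x} \right|_{x=m} = 0. \tag{2.25} $$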
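The smoother can therefore be found by searching for turning points in the
joint density of $\alpha_0', \alpha_1', \ldots, \alpha_T', y_1', \ldots, y_T'$, the logarithm of which is
$$ D = \text{constant} - \tfrac{1}{2} (\alpha_0 - a_0)' P_0^{-1} (\alpha_0 - a_0) - \tfrac{1}{2} \sum_{t=1}^{T} (y_t - Z_t \alpha_t)' H_t^{-1} (y_t - Z_t \alpha_t) - \tfrac{1}{2} \sum_{t=1}^{T} (\alpha_t - T_t \alpha_{t-1})' Q_t^{-1} (\alpha_t - T_t \alpha_{t-1}). \tag{2.26} $$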
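Thus
$$ \frac{\partial D}{\partial \alpha_t} = Z_t' H_t^{-1} \varepsilon_t - Q_t^{-1} \eta_t + T_{t+1}' Q_{t+1}^{-1} \eta_{t+1} \quad \text{for } t = 1, \ldots, T. \tag{2.27} $$
Equating to zero, writing the solutions as $\hat{\alpha}_t$ and $\hat{\varepsilon}_t = y_t - Z_t \hat{\alpha}_t$ and $\hat{\eta}_t = \hat{\alpha}_t -
T_t \hat{\alpha}_{t-1}$ results in the backward recursion
$$ \hat{\alpha}_{t-1} = T_t^{-1} \left( \hat{\alpha}_t - Q_t (Z_t' H_t^{-1} \hat{\varepsilon}_t + T_{t+1}' Q_{t+1}^{-1} \hat{\eta}_{t+1}) \right) = T_t^{-1} (\hat{\alpha}_t - \hat{\eta}_t), \qquad t = 1, \ldots, T, \tag{2.28} $$
as
$$ Z_t' H_t^{-1} \hat{\varepsilon}_t - Q_t^{-1} \hat{\eta}_t + T_{t+1}' Q_{t+1}^{-1} \hat{\eta}_{t+1} = 0. \tag{2.29} $$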
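The characterisation of the smoother as the mode of (2.26) can be illustrated by brute force in the local level model, where $D$ is a quadratic function of $(\alpha_0, \ldots, \alpha_T)$ and its maximiser solves a single linear system. The sketch below is not one of the efficient recursions discussed in this section; it only checks that the maximiser satisfies the first-order conditions (2.29). It assumes $Z_t = T_t = 1$, a proper prior on $\alpha_0$, and that the term in $\hat{\eta}_{T+1}$ is absent at $t = T$; all names are illustrative.

```python
import numpy as np

def smoothed_state_by_mode(y, sigma2_eps, sigma2_eta, a0, P0):
    """Maximise (2.26) for the local level model by solving the normal equations."""
    n = len(y)
    precision = np.zeros((n + 1, n + 1))   # negative Hessian of D in (alpha_0, ..., alpha_n)
    rhs = np.zeros(n + 1)
    precision[0, 0] += 1.0 / P0            # prior term
    rhs[0] += a0 / P0
    for t in range(1, n + 1):
        precision[t, t] += 1.0 / sigma2_eps          # measurement term
        rhs[t] += y[t - 1] / sigma2_eps
        d = np.zeros(n + 1)
        d[t], d[t - 1] = 1.0, -1.0                   # alpha_t - alpha_{t-1}
        precision += np.outer(d, d) / sigma2_eta     # transition term
    return np.linalg.solve(precision, rhs)

y = np.array([1.0, 1.4, 0.9, 1.7, 2.1])
sigma2_eps, sigma2_eta = 1.0, 0.5
alpha_hat = smoothed_state_by_mode(y, sigma2_eps, sigma2_eta, a0=0.0, P0=10.0)

eps_hat = y - alpha_hat[1:]                # smoothed irregular, t = 1, ..., n
eta_hat = np.diff(alpha_hat)               # smoothed level disturbance, t = 1, ..., n
eta_next = np.append(eta_hat[1:], 0.0)     # eta_{t+1}, taken as zero beyond the sample
first_order = eps_hat / sigma2_eps - eta_hat / sigma2_eta + eta_next / sigma2_eta
```

Solving the full system costs far more than a recursive smoother for long series, so this is useful only as a correctness check on small examples.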
The starting point $\hat{\alpha}_T = a_T$ is given by the Kalman filter. Unfortunately, using
$$ \hat{\eta}_t = Q_t (T_{t+1}' Q_{t+1}^{-1} \hat{\eta}_{t+1} + Z_t' H_t^{-1} \hat{\varepsilon}_t), \tag{2.30} $$
will lead to a numerically unstable filter even though mathematically this result
holds exactly. Koopman (1993) shows that it can be stabilised by computing
$\hat{\varepsilon}_t$ not by $y_t - Z_t \hat{\alpha}_t$, but by
$$ \hat{\varepsilon}_t = H_t (F_t^{-1} v_t - K_t' T_{t+1}' Q_{t+1}^{-1} \hat{\eta}_{t+1}), \tag{2.31} $$
where $F_t$ and $K_t$ are computed using KF(0, 0) and $v_t = v_t^* - Z_t G_{t-1} S_T^{-1} s_T$. Thus
the efficient smoother uses (2.28), (2.30) and (2.31).
Recently, Harvey and Koopman (1992) have proposed using the smoothed
estimates of $\varepsilon_t$ and $\eta_t$ to check for outliers and structural breaks, while
Koopman (1993) uses them to implement a rapid EM algorithm and Koopman
and Shephard (1992) show how to construct the exact score by smoothing.
3. Explanatory variables
Stochastic trend components are introduced into dynamic regression models
when the underlying level of a nonstationary dependent variable cannot be
completely explained by observable explanatory variables. The presence of a
stochastic trend can often be rationalised by the fact that a variable has been
excluded from the equation because it is difficult, or even impossible, to
measure. Thus in Harvey et al. (1986) and Slade (1989), a stochastic trend is
used as a proxy for technical progress, while in the demand equation for UK
spirits estimated by Ansley and Kohn (1989) the stochastic trend can be
thought of as picking up changes in tastes. Such rationalisation not only lends
support to the specification of the model, but it also means that the estimated
stochastic trend can be analysed and interpreted.
If stochastic trends are appropriate, but are not explicitly modelled, their
effects will be picked up indirectly by time trends and lags on the variables.
This can lead to a proliferation of lags which have no economic meaning, and
which are subject to common factors and problems of inference associated with
unit roots. An illustration of the type of problems which can arise with such an
approach in a single equation context can be found in Harvey et al. (1986),
where a stochastic trend is used to model productivity effects in an employment
output equation and is compared with a more traditional autoregressive
distributed lag (ADL) regression model with a time trend. Such problems may
become even more acute in multivariate systems, such as vector autoregressions and simultaneous equation models; see Section 5.
Other stochastic components, such as time-varying seasonals or cycles, can
also be included in a model with explanatory variables. Since this raises no new
issues of principle, we will concentrate on stochastic trends.
3.1. Formulation and estimation
A regression model with a stochastic trend component may be written
$$ y_t = \mu_t + x_t' \delta + \varepsilon_t, \qquad t = 1, \ldots, T, \tag{3.1} $$
where $\mu_t$ is a stochastic trend (1.4), $x_t$ is a $k \times 1$ vector of exogenous
explanatory variables, $\delta$ is a corresponding vector of unknown parameters, and $\varepsilon_t$ is
a normally distributed, white noise disturbance term with mean zero and
variance $\sigma_\varepsilon^2$. A standard regression model with a deterministic time trend
emerges as a special case, as does a model which could be estimated efficiently
by OLS regression in first differences; in the latter case $\sigma_\varepsilon^2 = \sigma_\zeta^2 = 0$.
In the reduced form of (3.1), the stochastic part, $\mu_t + \varepsilon_t$, is replaced by an
ARIMA(0, 2, 2) process. If the slope is deterministic, that is $\sigma_\zeta^2 = 0$ in (1.4), it
is ARIMA(0, 1, 1). Box and Jenkins (1976, pp. 409-412) report a distributed
lag model fitted to first differences with an MA(1) disturbance term. This
model can perhaps be interpreted more usefully as a relationship in levels with
a stochastic trend component of the form
$$ \mu_t = \mu_{t-1} + \beta + \eta_t. \tag{3.2} $$
Maximum likelihood estimators of the parameters in (3.1) can be constructed in the time domain via the prediction error decomposition. This is
done by putting the model in state space form and applying the Kalman filter.
The parameters $\delta$ and $\beta$ can be removed from the likelihood function either by
concentrating them out in the form of a profile likelihood function as in Kohn and
Ansley (1985) or by forming a marginal likelihood function; see the discussion
in Section 2.3. The marginal likelihood can be computed by extending the state
so as to include $\beta$ and $\delta$, even though they are time-invariant, and then
initializing with a diffuse prior.
The difference between the profile and marginal likelihood is in the
determinantal term of the likelihood. There are a number of arguments which
favour the use of marginal likelihoods for inference in small samples or when
the process is close to nonstationarity or noninvertibility; see Tunnicliffe-Wilson (1989). In the present context, the difference in behaviour shows up
most noticeably in the tendency of the trend to be estimated as being
deterministic. To be more specific, suppose the trend is as in (3.2). The
signal-noise ratio is $q = \sigma_\eta^2 / \sigma_\varepsilon^2$ and if this is zero the trend is deterministic. The
probability that $q$ is estimated to be zero has been computed by Shephard
(1993a). Using a profile likelihood by concentrating out $\beta$ leads to this
probability being relatively high when $q$ is small but nonzero. The properties of
the estimator obtained from the marginal likelihood are much better in this
respect.
3.2. Intervention analysis
Intervention analysis is concerned with making inferences about the effects of
known events. These effects are measured by including intervention, or
dummy, variables in a dynamic regression model. In pure intervention analysis
no other explanatory variables are present.
Model (3.1) may be generalized to yield the intervention model
$$y_t = \mu_t + x_t'\delta + \lambda w_t + \varepsilon_t, \qquad t = 1, \ldots, T, \tag{3.3}$$
where $w_t$ is the intervention variable and $\lambda$ is its coefficient. The definition of
$w_t$ depends on the form which the intervention effect is assumed to take. If the
intervention is transitory and has an effect only at time $\tau$, $w_t$ is a pulse variable
which takes the value one at the time of the intervention, $t = \tau$, and is zero
otherwise. More generally the intervention may have a transitory effect which
dies away gradually; for example, we may have $w_t = \phi^{t-\tau}$, with $|\phi| < 1$, for
$t \ge \tau$. A permanent shift in the level of the series can be captured by a step
variable which is zero up to the time of the intervention and unity thereafter.
An effect of this kind can also be interpreted as a transitory shock to the level
equation in the trend, in which case it appears as a pulse variable in (1.4a).
Other types of intervention variable may be included, for example variables
giving rise to changes in the slope of the trend or the seasonal pattern. The
advantage of the structural time series model framework over the ARIMA
framework proposed by Box and Tiao (1975) is that it is much easier to
formulate intervention variables having the desired effect on the series.
Estimation of a model of the form (3.3) can be carried out in both the time
and frequency domains by treating the intervention variable just like any other
explanatory variable. In the time domain, various tests can be constructed to
check on the specification of the intervention; see the study by Harvey and
Durbin (1986) on the effect of the UK seat belt law.
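The three intervention variables described above can be written down directly. The sketch below is illustrative only: the function names are hypothetical, time runs over $t = 1, \ldots, T$, and the step variable is assumed to equal one from the intervention date $\tau$ onwards.

```python
import numpy as np

def pulse(T, tau):
    """Pulse variable: one at t = tau, zero otherwise."""
    t = np.arange(1, T + 1)
    return (t == tau).astype(float)

def decay(T, tau, phi):
    """Transitory effect dying away gradually: phi**(t - tau) for t >= tau."""
    if not abs(phi) < 1:
        raise ValueError("phi must satisfy |phi| < 1")
    t = np.arange(1, T + 1)
    return np.where(t >= tau, phi ** np.maximum(t - tau, 0), 0.0)

def step(T, tau):
    """Step variable: zero before tau, one from tau onwards (assumed convention)."""
    t = np.arange(1, T + 1)
    return (t >= tau).astype(float)
```

Cumulating the pulse gives the step, which mirrors the remark that a permanent level shift is a pulse in the level equation of the trend.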
4. Multivariate time series models
4.1. Seemingly unrelated time series equations (SUTSE)
The structural time series models introduced in Section 1 have straightforward
multivariate generalisations. For instance, the local level with drift becomes,
for an N-dimensional series $y_t = (y_{1t}, \ldots, y_{Nt})'$,
$$\begin{aligned} y_t &= \mu_t + \varepsilon_t, & \varepsilon_t &\sim \mathrm{NID}(0, \Sigma_\varepsilon), \\ \mu_t &= \mu_{t-1} + \beta + \eta_t, & \eta_t &\sim \mathrm{NID}(0, \Sigma_\eta), \end{aligned} \tag{4.1}$$
where $\Sigma_\varepsilon$ and $\Sigma_\eta$ are nonnegative definite $N \times N$ matrices. Such models are
called seemingly unrelated time series equations (SUTSE), reflecting the fact
that the individual series are connected only via the correlated disturbances in
the measurement and transition equations. Estimation is discussed in
Fernandez (1990).
The maximisation of the likelihood function for this model can be computationally demanding if $N$ is large. The evaluation of the likelihood function
requires $O(N^3)$ floating point operations and, although $\beta$ can be concentrated out,
there are still $N \times (N + 1)$ parameters to be estimated by numerical optimisation. However, for many applications there are specific structures on $\Sigma_\varepsilon$ and
$\Sigma_\eta$ that can be exploited to make the computations easier. One example is
where $\Sigma_\varepsilon$ and $\Sigma_\eta$ are proportional, that is $\Sigma_\eta = q\Sigma_\varepsilon$. Such a system is said to be
homogeneous. This structure allows each of the series in $y_t$ to be handled by
the same Kalman filter and so the likelihood can be evaluated in $O(N)$
operations. Further, $\Sigma_\varepsilon$ can be concentrated out of the likelihood, leaving a
single parameter $q$ to be found by numerical maximisation. The validity of the
homogeneity assumption can be assessed by using the Lagrange multiplier test
of Fernandez and Harvey (1990).
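A rough sketch of why homogeneity helps is given below for the local level case. It is not the authors' code: the function name is hypothetical, the drift $\beta$ is assumed to be zero (or already removed), and the diffuse prior is handled by starting at the first observation. The state MSE and innovation variance are scalar multiples of $\Sigma_\varepsilon$, so a single scalar gain serves all $N$ series and $\Sigma_\varepsilon$ has a closed form given $q$.

```python
import numpy as np

def homogeneous_local_level(y, q):
    """Filter a (T, N) array under Sigma_eta = q * Sigma_eps.

    Returns the innovations, the scalar factors f_t (innovation variance is
    f_t * Sigma_eps), the concentrated estimate of Sigma_eps and the
    concentrated log-likelihood as a function of q alone."""
    y = np.asarray(y, dtype=float)
    T, N = y.shape
    a = y[0].copy()                # level estimate, started at the first observation
    p = 1.0                        # MSE of the level in units of Sigma_eps
    v = np.empty((T - 1, N))
    f = np.empty(T - 1)
    for t in range(1, T):
        p_pred = p + q
        f[t - 1] = p_pred + 1.0
        v[t - 1] = y[t] - a
        k = p_pred / f[t - 1]      # one scalar gain shared by every series
        a = a + k * v[t - 1]
        p = p_pred * (1.0 - k)
    # Closed-form maximiser of the likelihood over Sigma_eps for given q.
    sigma_eps = (v / f[:, None]).T @ v / (T - 1)
    _, logdet = np.linalg.slogdet(sigma_eps)
    loglik = -0.5 * ((T - 1) * N * (np.log(2.0 * np.pi) + 1.0)
                     + N * np.log(f).sum() + (T - 1) * logdet)
    return v, f, sigma_eps, loglik
```

Only the last value returned needs to be maximised numerically, and it depends on the single parameter `q`.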
4.2. Error components models
Consider the classical error components model
$$y_{it} = \mu + \lambda_i + v_t + w_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T, \tag{4.2}$$
where $\mu$ represents the overall mean and $\lambda_i$, $v_t$ and $w_{it}$ are unit specific,
time specific and time-unit specific effects respectively, assumed to be serially and mutually independent, Gaussian and with expected values equal to zero. The dynamic
versions of this model studied in the literature usually include lagged dependent variables and autoregressive processes for the components $v_t$ and $w_{it}$; see
Anderson and Hsiao (1982).
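A more natural approach to the specification of dynamic error components
models can be based on the ideas of structural time series models. This is
suggested by Marshall (1992b), who allowed both time specific and time-unit
specific effects to evolve over time according to random walk plus noise
processes. The error components model becomes
$$\begin{aligned} y_{it} &= \mu_{it} + \varepsilon_t + \varepsilon_{it}^*, \\ \mu_{it} &= \mu_{i,t-1} + \eta_t + \eta_{it}^*, \end{aligned} \tag{4.3}$$
where $\mu_{it}$ is the mean for unit $i$ at time $t$ and $\varepsilon_t$, $\varepsilon_{it}^*$, $\eta_t$ and $\eta_{it}^*$ are assumed to be
independent, zero mean, Gaussian random variables, with variances $\sigma_\varepsilon^2$, $\sigma_{\varepsilon^*}^2$,
$\sigma_\eta^2$ and $\sigma_{\eta^*}^2$ respectively. This model is a multivariate local level model, with
the irregular and level random shocks decomposed as common effects, $\varepsilon_t$ and
$\eta_t$, and specific effects, $\varepsilon_{it}^*$ and $\eta_{it}^*$. This means that
$$\Sigma_\varepsilon = \sigma_{\varepsilon^*}^2 I + \sigma_\varepsilon^2 \iota\iota' \quad \text{and} \quad \Sigma_\eta = \sigma_{\eta^*}^2 I + \sigma_\eta^2 \iota\iota', \tag{4.4}$$
where $\iota$ is the N-dimensional unit vector and $I$ the identity matrix.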
If $\sigma_\eta^2$ and $\sigma_{\eta^*}^2$ are equal to zero, the model reduces to the static error
components model discussed in (4.2). On the other hand if $\sigma_\eta^2$ is greater than
zero, but $\sigma_{\eta^*}^2$ is equal to zero, the $N$ time series have, apart from a time
invariant effect, the same time-dependent mean. In this situation, the time
series are cointegrated in the sense of Engle and Granger (1987).
Optimal estimates of the components $\mu_{it}$ can be obtained by means of the
Kalman filter. That requires the manipulation of $N \times N$ matrices and so it
becomes cumbersome if $N$ is large. However, the idea of homogeneity can be
used to reduce these calculations dramatically. Take for each time $t$ the
average of the observations across units and the first $N - 1$ deviations from this
average. Thus, in an obvious notation, (4.3) becomes
$$\bar y_t = \bar\mu_t + \varepsilon_t + \bar\varepsilon_t^*, \tag{4.5a}$$
$$\bar\mu_t = \bar\mu_{t-1} + \eta_t + \bar\eta_t^*, \tag{4.5b}$$
$$(y_{it} - \bar y_t) = (\mu_{it} - \bar\mu_t) + (\varepsilon_{it}^* - \bar\varepsilon_t^*), \tag{4.6a}$$
$$(\mu_{it} - \bar\mu_t) = (\mu_{i,t-1} - \bar\mu_{t-1}) + (\eta_{it}^* - \bar\eta_t^*), \qquad i = 1, \ldots, N-1, \quad t = 1, \ldots, T, \tag{4.6b}$$
with the equations in (4.5) and (4.6) being statistically independent of one
another. As the transformation to this model is nonsingular, the estimation of
the trends $\mu_{it}$ can be obtained from this model instead of from the original
error components model. The estimation of the average level can be carried
out by running a univariate Kalman filter over the average values of the
observations $\bar y_t$. The remaining $N - 1$ equations can be dealt with straightforwardly as they are a homogeneous system, with variances proportional to
$(I - \iota\iota'/N)$, where $I$ and $\iota$ are now $(N-1)$-dimensional unit matrices and
vectors.
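The Kalman filter which provides the estimator of $\bar\mu_t$ using the information
available up to time $t$ is
$$\bar m_t = \bar m_{t-1} + \frac{\bar p_{t-1} + \sigma_\eta^2 + \sigma_{\eta^*}^2/N}{\bar p_{t-1} + \sigma_\eta^2 + \sigma_{\eta^*}^2/N + \sigma_\varepsilon^2 + \sigma_{\varepsilon^*}^2/N}\,(\bar y_t - \bar m_{t-1}), \tag{4.7}$$
where $\bar p_t$ is the MSE of $\bar m_t$, given by
$$\bar p_t = (\bar p_{t-1} + \sigma_\eta^2 + \sigma_{\eta^*}^2/N) - \frac{(\bar p_{t-1} + \sigma_\eta^2 + \sigma_{\eta^*}^2/N)^2}{\bar p_{t-1} + \sigma_\eta^2 + \sigma_{\eta^*}^2/N + \sigma_\varepsilon^2 + \sigma_{\varepsilon^*}^2/N}. \tag{4.8}$$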
These recursions are run from $t = 2$ with initial values $\bar m_1 = \bar y_1$ and
$\bar p_1 = (\sigma_\varepsilon^2 + \sigma_{\varepsilon^*}^2/N)$. The estimators of the
components $(\mu_{it} - \bar\mu_t)$ using the information up to time $t$, $m_{it}^*$, and their MSEs,
$p_{it}^*$, are given by formulae of exactly the same form as (4.7) and (4.8) but with $(\sigma_\eta^2 + \sigma_{\eta^*}^2/N)$
and $(\sigma_\varepsilon^2 + \sigma_{\varepsilon^*}^2/N)$ replaced by $(N-1)\sigma_{\eta^*}^2/N$ and $(N-1)\sigma_{\varepsilon^*}^2/N$ respectively
and with initial values $m_{i1}^* = (y_{i1} - \bar y_1)$ for $i = 1, \ldots, N-1$. The estimators
of each $\mu_{it}$, $m_{it}$, and its MSE, $p_{it}$, are given by
$$m_{it} = \bar m_t + m_{it}^*, \qquad p_{it} = \bar p_t + p_{it}^*, \qquad i = 1, \ldots, N-1, \quad t = 1, \ldots, T, \tag{4.9}$$
while $m_{Nt}$ is obtained by differencing.
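The two-stage calculation, one filter for the cross-sectional average and one for the deviations, can be sketched as follows. The function names are hypothetical, the data are held in a $T \times N$ array, and the initial MSE of the deviation filter is an assumption chosen by analogy with the initial MSE of the average filter, since it is not stated above. For brevity all $N$ deviations are filtered at once; because they share one scalar gain and sum to zero, this gives the same $m_{Nt}$ as differencing.

```python
import numpy as np

def local_level_filter(y, var_eps, var_eta, m1, p1):
    """Recursions of the form (4.7)-(4.8), run from the second time point."""
    y = np.asarray(y, dtype=float)
    m = np.empty_like(y)
    p = np.empty(y.shape[0])
    m[0], p[0] = m1, p1
    for t in range(1, y.shape[0]):
        p_pred = p[t - 1] + var_eta
        gain = p_pred / (p_pred + var_eps)
        m[t] = m[t - 1] + gain * (y[t] - m[t - 1])
        p[t] = p_pred - p_pred ** 2 / (p_pred + var_eps)
    return m, p

def error_components_filter(Y, s2_eps, s2_eps_star, s2_eta, s2_eta_star):
    """Filtered estimates m_it of mu_it and their MSEs p_it, as in (4.9)."""
    Y = np.asarray(Y, dtype=float)          # shape (T, N)
    T, N = Y.shape
    ybar = Y.mean(axis=1)
    # Filter for the average level, equations (4.5a)-(4.5b).
    m_bar, p_bar = local_level_filter(
        ybar, s2_eps + s2_eps_star / N, s2_eta + s2_eta_star / N,
        ybar[0], s2_eps + s2_eps_star / N)
    # Filter for the deviations from the average, equations (4.6a)-(4.6b).
    dev = Y - ybar[:, None]
    c = (N - 1) / N
    m_star, p_star = local_level_filter(
        dev, c * s2_eps_star, c * s2_eta_star,
        dev[0], c * s2_eps_star)            # initial MSE: an assumption
    return m_bar[:, None] + m_star, p_bar + p_star
```

The cost is two scalar recursions whatever the value of $N$, instead of a filter that manipulates $N \times N$ matrices.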
EXAMPLE. In Marshall (1992b), an error components model of the form given
above, but with a fixed slope as in (3.2), is estimated for the logarithm of the
quarterly labour costs time series in Austria, Belgium, Luxembourg and The
Netherlands. The sample period considered is 1970 to 1987 and so $N = 4$ and
$T = 72$. The maximum likelihood estimates of the parameters were
$$\sigma_\varepsilon^2 = 0, \qquad \sigma_{\varepsilon^*}^2 = 0.115 \times 10^{-3}, \qquad \sigma_\eta^2 = 0.249 \times 10^{-3}, \qquad \sigma_{\eta^*}^2 = 0.159 \times 10^{-3}. \tag{4.10}$$
4.3. Explanatory variables in SUTSE models
The introduction of explanatory variables into the SUTSE model opens up the
possibility of incorporating ideas from economic theory. This is well illustrated
in the paper by Harvey and Marshall (1991) on the demand for energy in the
UK. The assumption of a translog cost function leads to the static share
equation system
$$s_i = \alpha_i + \sum_j \alpha_{ij} \log(p_j/\tau_j), \qquad i = 1, \ldots, N, \tag{4.11}$$
where the $\alpha_i$, $i = 1, \ldots, N$, and $\alpha_{ij}$, $i, j = 1, \ldots, N$, are parameters, $s_i$ is the
share of the i-th input, $p_j$ is the (exogenous) price of the j-th input and $\tau_j$ is an
index of relative technical progress for the input $j$ which takes the factor
augmenting form; see Jorgenson (1986).
The model can be made dynamic by allowing the $\log \tau_{jt}$, relative technical
progress at time $t$ for input $j$, to follow a random walk plus drift
$$\log \tau_{jt} = \log \tau_{j,t-1} + \beta_j + \eta_{jt}, \qquad j = 1, \ldots, N. \tag{4.12}$$
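If a random disturbance term $\varepsilon_{it}$ is added to each share equation, this leads
to a system of share equations which can be written in matrix form as
$$\begin{aligned} y_t &= \mu_t + A x_t + \varepsilon_t, & \varepsilon_t &\sim \mathrm{NID}(0, \Sigma_\varepsilon), \\ \mu_t &= \mu_{t-1} + \beta + \eta_t, & \eta_t &\sim \mathrm{NID}(0, \Sigma_\eta), \end{aligned} \tag{4.13}$$
where $y_t$ is an $N \times 1$ vector of shares $(s_1, \ldots, s_N)'$. Here $\mu_t$ is an $N \times 1$ vector
depending on the $\alpha_i$, $\alpha_{ij}$ and $\log \tau_{jt}$, so that the i-th element of $\mu_t$ is
$\alpha_i - \sum_j \alpha_{ij} \log \tau_{jt}$,
while $A$ is an $N \times N$ matrix of $\alpha_{ij}$ and $x_t$ is the $N \times 1$ vector of the
$\log p_{jt}$'s.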
Harvey and Marshall (1991) estimated (4.13) under the assumption of
statistical homogeneity, that is $\Sigma_\eta = q\Sigma_\varepsilon$, and found this to be a reasonable
assumption using the LM test referred to in Section 4.1. One equation was
dropped to ensure that the shares summed to one. Finally restrictions from
economic theory, concerning cost exhaustion, homogeneity and symmetry,
were incorporated into the A matrix.
4.4. Common trends
Many economic variables seem to move together, indicating common underlying dynamics. This feature of data has been crystallised in the econometric
literature as the concept of cointegration; see, for example, Engle and Granger
(1987) and Johansen (1988). Within a structural time series framework this
feature can be imposed by modifying (4.1) so as to construct a common trends
model
$$\begin{aligned} y_t &= \Theta\mu_t^* + \varepsilon_t, & \varepsilon_t &\sim \mathrm{NID}(0, \Sigma_\varepsilon), \\ \mu_t^* &= \mu_{t-1}^* + \beta^* + \eta_t^*, & \eta_t^* &\sim \mathrm{NID}(0, \Sigma_{\eta^*}), \end{aligned} \tag{4.14}$$
where $\Theta$ is an $N \times K$ fixed matrix of factor loadings. The $K \times K$ matrix $\Sigma_{\eta^*}$ is
constrained to be a diagonal matrix and $\theta_{ij} = 0$ for $j > i$, while $\theta_{ii} = 1$, in order
to achieve identifiability; see Harvey (1989, pp. 450-451). As $\Sigma_{\eta^*}$ is diagonal,
the common trends, the elements of $\mu_t^*$, are independent.
The common trends model has $K \le N$, but if $K = N$, it is equivalent to the
SUTSE model, (4.1), with $\beta = \Theta\beta^*$ and $\Sigma_\eta = \Theta\Sigma_{\eta^*}\Theta'$, where $\Theta$ and $\Sigma_{\eta^*}$ are the
Cholesky decomposition of $\Sigma_\eta$. This suggests first estimating a SUTSE model
and carrying out a principal components analysis on the estimated $\Sigma_\eta$ to see
what value of $K$ accounts for a suitably large proportion of the total variation.
A formal test can be carried out along the lines suggested by Stock and Watson
(1988), but its small sample properties have yet to be investigated in this
context. Once $K$ has been determined, the common trends model can be
formulated and estimated.
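For the case $K = N$, the factorisation of $\Sigma_\eta$ into a unit lower triangular loading matrix and a diagonal covariance matrix can be sketched as below. The function name is hypothetical and a positive definite $\Sigma_\eta$ is assumed, so that the ordinary Cholesky factor exists; the columns of that factor are simply rescaled to give ones on the diagonal.

```python
import numpy as np

def trend_loadings(sigma_eta):
    """Return (theta, sigma_star) with sigma_eta = theta @ sigma_star @ theta.T,
    theta unit lower triangular and sigma_star diagonal."""
    c = np.linalg.cholesky(np.asarray(sigma_eta, dtype=float))  # sigma_eta = c c'
    d = np.diag(c)
    theta = c / d                  # divide column j by c[j, j]: unit diagonal
    sigma_star = np.diag(d ** 2)   # variances of the independent common trends
    return theta, sigma_star
```

For example, `trend_loadings([[4.0, 2.0], [2.0, 3.0]])` gives loadings `[[1, 0], [0.5, 1]]` and trend variances 4 and 2.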
EXAMPLE. Tiao and Tsay (1989) fitted various multivariate models to the
logarithms of indices of monthly flour prices in three cities, Buffalo, Minneapolis and Kansas City, over the period from August 1972 to November
1980. In their comment on this paper, Harvey and Marshall fit (4.1) and
conduct a principal components analysis on the estimated $\Sigma_\eta$. The results,
given in Table 3, indicate that the first principal component dominates the
variation in the transition equation and represents the basic underlying price in
the three cities. Setting $K$ equal to one or two might be appropriate.

Table 3
Principal components analysis of estimated covariance matrix of trend disturbances

Eigenvalues    Cumulative proportion    Eigenvectors
7.739          0.965                    -0.55   -0.59   -0.59
0.262          0.998                     0.35    0.48   -0.81
0.015          1.00                      0.76   -0.65   -0.06
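The cumulative proportions used to judge $K$ follow directly from the eigenvalues. The sketch below uses hypothetical function names; since the estimated $\Sigma_\eta$ itself is not reproduced here, the usage example feeds in a diagonal stand-in matrix carrying the three eigenvalues reported in Table 3.

```python
import numpy as np

def principal_components(sigma_eta):
    """Eigenvalues in decreasing order, cumulative proportions and eigenvectors
    (one eigenvector per column) of a symmetric covariance matrix."""
    eigval, eigvec = np.linalg.eigh(np.asarray(sigma_eta, dtype=float))
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    cum = np.cumsum(eigval) / eigval.sum()
    return eigval, cum, eigvec

def smallest_K(cum, share):
    """Smallest number of components whose cumulative proportion reaches share."""
    return int(min(np.searchsorted(cum, share) + 1, len(cum)))

# Stand-in matrix with the eigenvalues of Table 3 (not the estimated matrix itself).
eigval, cum, _ = principal_components(np.diag([7.739, 0.262, 0.015]))
```

With these eigenvalues a 95 per cent threshold selects one common trend and a 99 per cent threshold selects two, in line with the suggestion that $K$ equal to one or two might be appropriate.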
Models with common components have also been used in the construction of
leading indicators; see Stock and Watson (1990).
4.5. Modelling and estimation for repeated surveys
Many economic variables are measured by using sample survey techniques.
Examples include the labour force surveys which are conducted in each
member state of the European Community. It is now quite common practice to
analyse the results from repeated surveys using time series methods.
If sample surveys are nonoverlapping, then the survey errors are independent and a simple model for the vector of characteristics at time $t$, $\theta_t$, might be
$$y_t = \theta_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, H_t), \qquad t = 1, \ldots, T, \tag{4.15}$$
where the sampling errors $\varepsilon_t$ are independent over time and are independent of
$\theta_t$. A simple estimator of $\theta_t$ would then be $y_t$. However, by imposing a model
on the evolution of $\theta_t$, an improvement in the precision of the estimate is
possible. This improvement will be very marked if $\theta_t$ moves very slowly and $H_t$
is large.
Scott and Smith (1974) suggested fitting ARIMA models to $\theta_t$; see also Smith
(1978) and Jones (1980). A more natural approach is to use structural models.
The analysis of repeated, nonoverlapping surveys is based on the same
principles as standard time series model building except that constraints are
imposed on the measurement error covariance matrix through sampling theory.
EXAMPLE. Consider the repeated sample survey of a set of proportions $\theta_{1t},
\theta_{2t}, \ldots, \theta_{pt}$, using simple random sampling with sample size $n_t$ for $t = 1, \ldots, T$.
If $p = 2$, and $y_t$ denotes the sample proportion in group one, the simple model
$$\begin{aligned} y_t &= \theta_t + \varepsilon_t, & \varepsilon_t &\sim N\!\left(0, \frac{\theta_t(1 - \theta_t)}{n_t}\right), \\ \theta_t &= \frac{1}{1 + \exp(-\alpha_t)}, \\ \alpha_t &= \alpha_{t-1} + \eta_t, & \eta_t &\sim \mathrm{NID}(0, \sigma_\eta^2), \end{aligned} \tag{4.16}$$
will allow $\theta_{1t} = \theta_t$ and $\theta_{2t} = 1 - \theta_t$ to evolve over time in the range zero to one. If
$p$ is greater than two or the state $\alpha_t$ evolves in a more complicated way,
perhaps with seasonals, the model can be modified appropriately. However,
the modelling principle is unchanged: sampling theory dictates the measurement error and time series considerations the transition equation. A discussion
of the way in which such models can be estimated may be found in Section 6.2.
When the repeated surveys are overlapping the model for the measurement
equation can become very involved. A clear discussion of the principles
involved is given in Scott and Smith (1974). More recent work in this area
includes Hausman and Watson (1985), Binder and Dick (1989), Tam (1987),
Pfeffermann and Burck (1990) and Pfeffermann (1991).
The work of Pfeffermann (1991) fits well within the framework of this
discussion. He identifies three features of overlapping samples which may
affect the way the measurement error is modelled. The first is the way the
sample is rotated. For example, in a survey consisting of four panels which are
interviewed quarterly, three of the panels will have been included in past
surveys while one is wholly new. Thus each panel will remain in the survey for
four quarters. This rotation will interact with the second feature of overlapping
surveys, the correlation between individual observations. Pfeffermann, in
common with most researchers in this area, relies on Henderson's behavioural
model for the i-th individual of the survey made at time $t$. The model is
$$y_{it} - \theta_t = \rho(y_{i,t-1} - \theta_{t-1}) + \omega_{it}, \qquad \omega_{it} \sim \mathrm{NID}(0, \sigma_\omega^2), \qquad |\rho| < 1. \tag{4.17}$$
The Pfeffermann model is completed by the third feature, which is that the
design of the survey is ignorable, although this assumption can be relaxed at
the loss of algebraic simplicity.
With these assumptions it is possible to derive the behaviour of the
measurement error in a model. If we use $y_{it}^{t-j}$ to denote the i-th individual at time
$t$, from a panel established at time $t - j$, then we can write
$$\bar y_t^{t-j} = \frac{1}{M} \sum_{i=1}^{M} y_{it}^{t-j}, \qquad j = 0, 1, 2, 3, \tag{4.18}$$
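as the aggregate survey estimate of $\theta_t$ from the panel established at time $t - j$,
then
$$\begin{pmatrix} \bar y_t^{t} \\ \bar y_t^{t-1} \\ \bar y_t^{t-2} \\ \bar y_t^{t-3} \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \theta_t + \begin{pmatrix} \bar\varepsilon_t^{t} \\ \bar\varepsilon_t^{t-1} \\ \bar\varepsilon_t^{t-2} \\ \bar\varepsilon_t^{t-3} \end{pmatrix}, \tag{4.19}$$
where
$$\bar\varepsilon_t = \begin{pmatrix} \bar\varepsilon_t^{t} \\ \bar\varepsilon_t^{t-1} \\ \bar\varepsilon_t^{t-2} \\ \bar\varepsilon_t^{t-3} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ \rho & 0 & 0 & 0 \\ 0 & \rho & 0 & 0 \\ 0 & 0 & \rho & 0 \end{pmatrix} \bar\varepsilon_{t-1} + \eta_t = T\bar\varepsilon_{t-1} + \eta_t. \tag{4.20}$$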
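The covariance of $\eta_t$ will be
$$\frac{\sigma_\omega^2}{M} \begin{pmatrix} 1/(1 - \rho^2) & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{4.21}$$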
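The structure of the panel error process can be checked numerically. The sketch below is built on assumptions that should be read as such: the transition matrix has $\rho$ on its first subdiagonal, a newly recruited panel starts from the stationary distribution implied by (4.17), so that its mean error has variance $\sigma_\omega^2/\{M(1 - \rho^2)\}$, and each continuing panel receives a fresh shock of variance $\sigma_\omega^2/M$. The function names are hypothetical.

```python
import numpy as np

def panel_error_system(rho, var_omega, M):
    """Transition matrix and disturbance covariance for the four panel errors
    (assumed forms: new panel in position one, oldest panel in position four)."""
    T = np.diag([rho, rho, rho], k=-1)           # rho on the first subdiagonal
    Q = (var_omega / M) * np.diag([1.0 / (1.0 - rho ** 2), 1.0, 1.0, 1.0])
    return T, Q

def stationary_covariance(T, Q):
    """Covariance of the error vector: sum of T^k Q T'^k, which terminates
    because T is nilpotent (its fourth power is zero)."""
    P = np.zeros_like(Q)
    term = Q.copy()
    for _ in range(T.shape[0]):
        P += term
        term = T @ term @ T.T
    return P
```

Under these assumptions every panel error has the same variance, $\sigma_\omega^2/\{M(1 - \rho^2)\}$, and errors of different panels at the same date are uncorrelated, which is what the AR(1) behaviour of individuals in distinct panels would suggest.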
The model can be routinely handled by using the Kalman filter to estimate $\theta_t$,
as well as the hyperparameters $\rho$ and $\sigma_\omega^2$. In some cases the individual panel
results will not be available, but instead only the aggregate will be recorded.
Then the measurement equation becomes
$$\bar y_t = \tfrac{1}{4}\left(\bar y_t^{t} + \bar y_t^{t-1} + \bar y_t^{t-2} + \bar y_t^{t-3}\right) = \theta_t + \tfrac{1}{4}\left(\bar\varepsilon_t^{t} + \bar\varepsilon_t^{t-1} + \bar\varepsilon_t^{t-2} + \bar\varepsilon_t^{t-3}\right). \tag{4.23}$$
5. Simultaneous equation system
This section considers how simultaneous equation models can be estimated
when stochastic trend components of the kind described in Section 4 are
specified in some or all of the structural equations. We draw on the paper by
Streibel and Harvey (1993), which develops and compares a number of
methods for the estimation of single equations using instrumental variable (IV)
procedures or limited information maximum likelihood (LIML). The question
of identifiability is dealt with in Harvey and Streibel (1991).
5.1. Model formulation
Consider a dynamic simultaneous model in which some or all of the structural
equations contain stochastic trend components, which, to simplify matters, will
be assumed to follow a multivariate random walk. Thus
$$\begin{aligned} \Gamma y_t &= \Phi_1 y_{t-1} + \cdots + \Phi_r y_{t-r} + B_0 x_t + \cdots + B_s x_{t-s} + S\mu_t + \varepsilon_t, \\ \mu_t &= \mu_{t-1} + \eta_t, \end{aligned} \tag{5.1}$$
where $\Gamma$ is an $N \times N$ matrix of unknown parameters, $\Phi_1, \ldots, \Phi_r$ are $N \times N$
matrices of autoregressive parameters, $B_0, \ldots, B_s$ are $N \times K$ matrices of
parameters associated with the $K \times 1$ vector of exogenous variables $x_t$ and its
lagged values, $\mu_t$ is an $n \times 1$ vector of stochastic trends, $S$ is an $N \times n$ selection
matrix of ones and zeros, such that each of the stochastic trends appears in a
particular equation, and $\eta_t$ and $\varepsilon_t$ are mutually independent, normally distributed white noise disturbance vectors with positive definite covariance matrices $\Sigma_\eta$
and $\Sigma_\varepsilon$ respectively. Equations which do not contain a stochastic trend will
usually have a constant term and if the exogenous variables are stochastic, it
will be assumed that they are generated independently of $\mu_t$ and $\varepsilon_t$.
The model is subject to restrictions which usually take the form of certain
variables being excluded from certain equations on the basis of prior economic
knowledge. In a similar way, it will normally be the case that there is some
rationale for the appearance of stochastic trend components in particular
equations. Indeed many econometric models contain a time trend. For
example the wage equation in the textbook Klein model has a time trend which
is explained in terms of union pressure. Time trends also appear because of
technical progress just as in single equations. The argument here is that such
effects are more appropriately modelled by stochastic trends.
Pre-multiplying (5.1) by $\Gamma^{-1}$ gives the econometric reduced form. Dropping
the lags, this may be written as
$$y_t = \Theta\mu_t + \Pi x_t + \varepsilon_t^*, \tag{5.2}$$
where $\Pi = \Gamma^{-1}B$, $\varepsilon_t^* = \Gamma^{-1}\varepsilon_t$ and $\Theta = \Gamma^{-1}S$. If stochastic trends only appear in
some of the equations, that is $1 \le n < N$, then (5.2) contains common trends;
see Section 4.4.
The presence of stochastic trend components in an econometric model has
interesting implications for conventional dynamic simultaneous equation
models, for the corresponding reduced form models, and for the associated
vector autoregression (VAR) for $(y_t', x_t')'$. Some of the points can be illustrated
with a simple demand and supply system. Let $y_{1t}$ denote quantity, $y_{2t}$ price and
$x_t$ an exogenous variable which is stationary after first differencing, that is
integrated of order one, and write
$$\begin{aligned} \text{D:} \quad y_{1t} &= \gamma_1 y_{2t} + \mu_t + \varepsilon_{1t}, \\ \text{S:} \quad y_{1t} &= \gamma_2 y_{2t} + \beta x_t + \varepsilon_{2t}. \end{aligned} \tag{5.3}$$
The stochastic trend component $\mu_t$ may be a proxy for changes in tastes. The
first equation could be approximated using lags of $y_1$ and $y_2$, but long lags may
be needed and, unless $\mu_t$ is constant, a unit root is present; compare the
employment-output equation of Harvey et al. (1986). The econometric
reduced form is
$$\begin{aligned} y_{1t} &= \theta_1\mu_t + \pi_1 x_t + \varepsilon_{1t}^*, \\ y_{2t} &= \theta_2\mu_t + \pi_2 x_t + \varepsilon_{2t}^*, \end{aligned} \tag{5.4}$$
where $\theta_1 = \gamma_2/(\gamma_2 - \gamma_1)$, $\theta_2 = 1/(\gamma_2 - \gamma_1)$, and so on. Thus there is a common
trend. This can be regarded as a reflection of the fact that there is just a single
co-integrating relationship, namely the supply curve; compare a similar, but
simpler, example in Engle and Granger (1987, p. 263). Attempting to estimate
a reduced form with lagged variables but without the stochastic trends runs into
complications; if first differences are taken the stochastic part of the model is
strictly noninvertible, so the approximation is not valid, while in levels any
inference must take account of the unit root; see Sims, Stock and Watson
(1990). The VAR representation of $(y_t', x_t')'$ is also subject to constraints
because of the common trend, and although estimation can be carried out
using the method of Johansen (1988), the point remains that long lags may be
required for a satisfactory approximation and so the number of parameters
may be very large for moderate size $N$ and $K$.
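The trend loadings in the reduced form of the demand and supply system can be obtained numerically from $\Theta = \Gamma^{-1}S$. In the sketch below the function name is hypothetical, and $\Gamma$ is written down by moving the price terms of (5.3) to the left-hand side, with the selection vector placing the single trend in the demand equation.

```python
import numpy as np

def reduced_form_loadings(gamma1, gamma2):
    """Loadings (theta_1, theta_2) of the common trend on quantity and price."""
    Gamma = np.array([[1.0, -gamma1],    # demand: y1 - gamma1 * y2 = mu + eps1
                      [1.0, -gamma2]])   # supply: y1 - gamma2 * y2 = beta * x + eps2
    S = np.array([1.0, 0.0])             # the trend enters the demand equation only
    return np.linalg.solve(Gamma, S)

theta = reduced_form_loadings(-1.0, 0.5)
```

For these illustrative slopes the loadings are 1/3 and 2/3, and the combination $\theta_1 - \gamma_2\theta_2$ is zero: the supply curve removes the common trend, which is the single co-integrating relationship referred to above.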
In summary, models which approximate stochastic trends by lags may be
highly unparsimonious and uninformative about dynamic relationships. If
economic theory does suggest the presence of stochastic trend components,
therefore, there are likely to be considerable gains from estimating the implied
structural relationships directly. If the complete system of equations can be
specified, a full information maximum likelihood (FIML) procedure may be
employed. If only a subsystem is specified, but all the predetermined variables
are named, a limited information maximum likelihood (LIML) procedure is
appropriate. When the rest of the system has not been specified at all, ML
methods cannot be applied, but a valid instrumental variable (IV) estimator
can be obtained.
5.2. Instrumental variable estimation
Suppose the equation of interest is written in matrix notation as
$$y = Z\delta + u, \tag{5.5}$$
where $Z$ is a $T \times m$ matrix with observations on explanatory variables and $u$ is
a $T \times 1$ vector of disturbances with mean zero and covariance matrix $\sigma^2 V$. The
explanatory variables may include variables which are not exogenous. However, the $K$ exogenous variables in the system provide a set of instrumental
variables contained in a $T \times K$ matrix, $X$.
Multiplying (5.5) through by a $T \times T$ matrix $L$ with the property that
$L'L = V^{-1}$ yields
$$Ly = LZ\delta + Lu, \tag{5.6}$$
where $\mathrm{Var}(Lu) = \sigma^2 I$. If the same transformation is applied to $X$, the matrix of
optimal instruments is formed from a multivariate regression of $LZ$ on $LX$.
The resulting IV estimator is then
$$\tilde\delta = (Z'L'P_v LZ)^{-1} Z'L'P_v Ly, \tag{5.7}$$
where $P_v$ is the idempotent projection matrix $P_v = LX(X'V^{-1}X)^{-1}X'L'$.
It is
known as generalized two stage least squares (G2SLS). Under standard
regularity conditions, as in Bowden and Turkington (1984, p. 26), $T^{1/2}(\tilde\delta - \delta)$ has a
limiting normal distribution. If $V$ is unknown, but depends on a finite number
of parameters which can be estimated consistently, the asymptotic distribution
is unaffected. When there are no lagged endogenous variables in (5.5) it can be
shown that the G2SLS estimator is at least as efficient as 2SLS in the sense that
the determinant of its asymptotic covariance matrix cannot exceed the
determinant of the corresponding expression for 2SLS. In a similar way, it can
be shown that G2SLS is more efficient than an IV estimator in which
instruments are formed from $X$ without first transforming by $L$.
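The estimator (5.7) is straightforward to compute once $V$ is available. The following is a minimal sketch with a hypothetical function name, assuming a known positive definite $V$ and taking $L$ to be the inverse of its lower Cholesky factor, which satisfies $L'L = V^{-1}$.

```python
import numpy as np

def g2sls(y, Z, X, V):
    """Generalized two stage least squares for y = Z delta + u, Var(u) = sigma^2 V,
    with instruments X."""
    C = np.linalg.cholesky(V)            # V = C C', so L = C^{-1} gives L'L = V^{-1}
    Ly = np.linalg.solve(C, y)
    LZ = np.linalg.solve(C, Z)
    LX = np.linalg.solve(C, X)
    # P_v LZ: fitted values from the multivariate regression of LZ on LX.
    fitted = LX @ np.linalg.lstsq(LX, LZ, rcond=None)[0]
    # (Z'L' P_v LZ)^{-1} Z'L' P_v Ly, using the symmetry of P_v.
    return np.linalg.solve(fitted.T @ LZ, fitted.T @ Ly)
```

Two special cases give a useful check: with $V = I$ the function reduces to ordinary 2SLS, and when every explanatory variable is exogenous, so that $Z = X$, it reduces to GLS.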
We now consider the estimation of a model which contains a random walk
component as well as explanatory variables, that is
$$y_t = \mu_t + z_t'\delta + \varepsilon_t, \qquad t = 1, \ldots, T. \tag{5.8}$$
If $z_t$ were exogenous, the GLS estimator of $\delta$ could be computed by applying
the Kalman filter appropriate for the stochastic part of the model, $\mu_t + \varepsilon_t$, to
both $y_t$ and $z_t$ and regressing the innovations from $y_t$ on those from $z_t$; see
Kohn and Ansley (1985). The same approach may be used with IV estimation.
In the notation of (5.5) the Kalman filter makes the transformations $Ly$, $LZ$
and $LX$. However, the $L$ matrix is now $(T - 1) \times T$ because the diffuse prior
for $\mu_t$ means that only $T - 1$ innovations can be formed. The variables in (5.8)
may be differenced so as to give a stationary disturbance term. Thus
$$\Delta y_t = \Delta z_t'\delta + u_t, \qquad t = 2, \ldots, T, \tag{5.9}$$
where $u_t = \eta_t + \Delta\varepsilon_t$. This equation corresponds more directly to (5.5) than does
(5.8) since a covariance matrix may be constructed for the disturbance vector
and the associated $L$ matrix is square. However, postmultiplying this matrix by
the $(T - 1) \times 1$ vector of differenced $y_t$'s gives exactly the same result as
postmultiplying the $L$ matrix for (5.8) by the $T \times 1$ vector of $y_t$'s.
A number of estimation procedures for (5.8) are considered in Streibel and
Harvey (1993). In the preferred method, a consistent estimator of $\delta$ is first
obtained by applying a suitable IV estimator to (5.9); if there are no lagged
dependent variables, 2SLS will suffice. Consistent estimators of the hyperparameters are then obtained from the residuals, and these estimators are used
to construct a feasible IV estimator of the form (5.7). There are a number of
ways of estimating the hyperparameters. In simple cases, closed form expressions based on the residual autocorrelations are available but, even with $\delta$
known, such estimators are not efficient. However, what would be the ML
estimator if $\delta$ were known can always be computed by an iterative optimisation
procedure. Given values of the hyperparameters, an IV estimate is constructed
for $\delta$. The hyperparameters are then estimated by ML applied to the residuals.
This procedure is then iterated to convergence. Although iterating will not
change the asymptotic properties of the estimators of $\delta$ or the hyperparameters
when there are no lagged dependent variables, it may yield estimators with
better small sample properties. When this stepwise estimation procedure is
used to estimate an equation in a simultaneous equation system it may be
referred to as G2SLS/ML. All the above procedures can be implemented in
the frequency domain as well as the time domain.
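One simple closed form of the kind mentioned above can be sketched for the disturbance of (5.9). Since $u_t = \eta_t + \Delta\varepsilon_t$, its variance is $\sigma_\eta^2 + 2\sigma_\varepsilon^2$ and its first autocovariance is $-\sigma_\varepsilon^2$, so the two hyperparameters can be solved from the first two sample moments of the residuals. The function names, the use of autocovariances rather than autocorrelations, and the truncation of negative estimates at zero are all assumptions of this sketch rather than the procedure of Streibel and Harvey (1993).

```python
import numpy as np

def sample_autocovariances(u):
    """Lag-0 and lag-1 sample autocovariances of zero-mean residuals u."""
    u = np.asarray(u, dtype=float)
    n = u.size
    c0 = np.sum(u * u) / n
    c1 = np.sum(u[1:] * u[:-1]) / n
    return c0, c1

def hyperparameters_from_moments(c0, c1):
    """Solve c0 = var_eta + 2 * var_eps and c1 = -var_eps, truncating at zero."""
    var_eps = max(-c1, 0.0)
    var_eta = max(c0 + 2.0 * c1, 0.0)
    return var_eps, var_eta
```

Such estimates are consistent but, as noted above, not efficient; their role is to supply starting values for the feasible IV estimator or for ML applied to the residuals.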
5.3. Maximum likelihood estimation
It is relatively easy to construct the likelihood function for a model of the form
(5.1). Maximising this function then gives the FIML estimators. Of course this
may not be straightforward computationally, and the estimators obtained for
any one particular equation may be very sensitive to misspecification in other
parts of the system.
If interest centres on a single equation, say the first, and there is not enough
information to specify the remaining equations, a limited information estimation procedure is appropriate. In a model of the form (5.1) where $u_t$ is
NID$(0, \Omega)$, the LIML estimator of the parameters in the first equation can be
obtained by applying FIML to a system consisting of the first (structural)
equation and the reduced form for the endogenous variables appearing in that
equation. Since the Jacobian of this system is unity, the estimator can be
computed by iterating a feasible SURE estimator to convergence; see Pagan
(1979).
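Now consider the application of LIML in a Gaussian system with stochastic
trends generated by a multivariate random walk. It will also be assumed that
the system contains no lags, although the presence of lags in either the
endogenous or exogenous variables does not alter the form of the estimator.
Thus
$$\Gamma y_t = \mu_t + Bx_t + \varepsilon_t, \qquad \mathrm{Var}(\varepsilon_t) = \Sigma_\varepsilon, \tag{5.10}$$
with $\Gamma$ being positive definite and $\mu_t$ given by (5.1). Hence the reduced form is
$$y_t = \mu_t^* + \Pi x_t + \varepsilon_t^*, \qquad \mathrm{Var}(\varepsilon_t^*) = \Sigma_\varepsilon^* = \Gamma^{-1}\Sigma_\varepsilon(\Gamma^{-1})', \tag{5.11a}$$
$$\mu_t^* = \mu_{t-1}^* + \eta_t^*, \qquad \mathrm{Var}(\eta_t^*) = \Sigma_\eta^* = \Gamma^{-1}\Sigma_\eta(\Gamma^{-1})', \tag{5.11b}$$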
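with the first structural equation, the equation of interest, written as
$$y_{1t} = \mu_{1t} + \gamma' y_{2t} + \beta' x_{1t} + \varepsilon_{1t}, \tag{5.12a}$$
$$\mu_{1t} = \mu_{1,t-1} + \eta_{1t}, \tag{5.12b}$$
where $y_{2t}$ is $g \times 1$, $x_{1t}$ is $k \times 1$, and both $\varepsilon_{1t}$ and $\eta_{1t}$ may be correlated with the
corresponding disturbances in the other structural equations. Prior knowledge
suggests the presence of a stochastic trend in this equation. There is no information
on whether or not stochastic trends are present in the other structural
equations in the system, and so they are included for generality. The reduced
form for the endogenous variables included in the equation of interest may be written as
$$y_{2t} = \mu_{2t}^* + \Pi_2 x_t + \varepsilon_{2t}^*, \tag{5.13a}$$
$$\mu_{2t}^* = \mu_{2,t-1}^* + \eta_{2t}^*. \tag{5.13b}$$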
The LIML estimator is obtained by treating (5.12) and (5.13) as though they
were the structural form of a system and applying FIML. The Jacobian is unity
and estimation proceeds by making use of the multivariate version of the GLS
algorithm described in Harvey (1989, p. 133).
Streibel and Harvey (1993) derive the asymptotic distribution of the LIML
estimator and compare the asymptotic covariance matrix of the estimators of $\beta$
and $\gamma$ with the corresponding matrix from the G2SLS/ML estimation procedure for a model without lagged endogenous variables. If $\Sigma_\eta = q\Sigma_\varepsilon$ in (5.10),
where $q$ is a scalar, the multivariate part of the model is homogeneous; see
Section 4. In this case G2SLS/ML is as efficient as LIML. Indeed efficiency is
achieved with G2SLS without iterating, provided an initial consistent estimator
of $q$ is used.
Although G2SLS/ML is not, in general, asymptotically efficient as compared
with LIML, the Monte Carlo experiments reported in Streibel and Harvey
suggest that in small samples its performance is usually better than that of
LIML. Since it is much simpler than LIML, it is the recommended estimator.
6. Nonlinear and non-Gaussian models
Relaxing the requirement that time series models be linear and Gaussian opens
up a vast range of possibilities. This section introduces the work in this field
which exploits the structural time series framework. It starts with a discussion
of conditionally Gaussian nonlinear state space models and then progresses to
derive a filter for dynamic generalised linear models. Some recent work on
exact filters for nonlinear, non-Gaussian state space models is outlined. Finally,
some structural approaches to modelling changing variance are discussed.
6.1. Conditionally Gaussian state space models
The state space form and the Kalman filter provide such a strong foundation
for the manipulation of linear models that it is very natural to try to extend
their use to deal with nonlinear time series. Some progress can be made by
defining a conditionally Gaussian state space model
$$\begin{aligned} y_t &= Z_t(Y_{t-1})\alpha_t + X_t\beta + \varepsilon_t, & \varepsilon_t &\sim N(0, H_t(Y_{t-1})), \\ \alpha_t &= T_t(Y_{t-1})\alpha_{t-1} + W_t\beta + \eta_t, & \eta_t &\sim N(0, Q_t(Y_{t-1})). \end{aligned} \tag{6.1}$$
Here $\varepsilon_t$ and $\eta_s$ are assumed to be independent for all values of $t$ and $s$. In this
model the matrices in the state space model are allowed to depend on $Y_{t-1}$, the
available information at time $t - 1$. The Kalman filter still goes through in this
case and so the likelihood for the model can be built up from the prediction
error decomposition.
The theory behind this type of modelling framework has been studied at
considerable length in Liptser and Shirayev (1978). The following examples
illustrate its flexibility.
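EXAMPLE. The coefficient of a first-order autoregression can be allowed to
follow a random walk, as $y_{t-1}$ is in $Y_{t-1}$. Thus
$$\begin{aligned} y_t &= y_{t-1}\alpha_{t-1} + \varepsilon_t, & \varepsilon_t &\sim \mathrm{NID}(0, \sigma_\varepsilon^2), \\ \alpha_t &= \alpha_{t-1} + \eta_t, & \eta_t &\sim \mathrm{NID}(0, \sigma_\eta^2). \end{aligned} \tag{6.2}$$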
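Because the lagged observation is known at time $t - 1$, the ordinary Kalman recursions apply to this example with $y_{t-1}$ playing the role of the measurement coefficient. The sketch below uses a hypothetical function name and approximates a diffuse prior on the coefficient by a large initial variance, which is an assumption of the illustration.

```python
import math

def tvp_ar1_filter(y, var_eps, var_eta, a0=0.0, p0=1e8):
    """Filtered estimates of a random walk AR(1) coefficient and the
    log-likelihood built from the prediction errors."""
    a, p = a0, p0                     # large p0 stands in for a diffuse prior
    path, loglik = [], 0.0
    for prev, cur in zip(y[:-1], y[1:]):
        v = cur - prev * a            # prediction error given information at t-1
        f = prev * prev * p + var_eps # its conditional variance
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        a = a + (p * prev / f) * v    # update the coefficient
        p = p * var_eps / f + var_eta # updated MSE, then random walk prediction
        path.append(a)
    return path, loglik
```

With `var_eta` set to zero the recursion collapses to recursive least squares for a fixed autoregressive coefficient, which provides a simple check on the code.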
EXAMPLE. Some macro-economic time series appear to exhibit cycles in which
the downswing is shorter than the upswing. A simple way of capturing such a
feature is to specify a cyclical component which switches from one frequency to
another as it moves from downswing into upswing and vice versa. This could be
achieved by setting
$$\lambda_c = \begin{cases} \lambda_1, & \text{if } \hat\psi_{t|t-1} - \hat\psi_{t-1} > 0, \\ \lambda_2, & \text{if } \hat\psi_{t|t-1} - \hat\psi_{t-1} \le 0, \end{cases} \qquad \lambda_1 \le \lambda_2, \tag{6.3}$$
where $\hat\psi_{t|t-1}$ and $\hat\psi_{t-1}$ are estimates of the state of the cycle at times $t$ and $t - 1$
respectively, made at time $t - 1$. This model, which belongs within the class of
threshold models described in Tong (1990), in effect fits two separate linear
cycle models to the data, the division taking place as $\hat\psi_{t|t-1} - \hat\psi_{t-1}$ switches
sign.
6.2. Extended Kalman filter
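For ease of exposition suppose $y_t$ and $\alpha_t$ are univariate and
$$
\begin{aligned}
y_t &= z_t(\alpha_t) + \varepsilon_t, & \varepsilon_t &\sim \mathrm{NID}(0, \sigma_\varepsilon^2(\alpha_t)), \\
\alpha_t &= T_t(\alpha_{t-1}) + \eta_t, & \eta_t &\sim \mathrm{NID}(0, \sigma_\eta^2(\alpha_{t-1})).
\end{aligned}
\tag{6.4}
$$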
This model cannot be handled exactly by using the Kalman filter. However, for
some functions it is possible to expand $z_t(\alpha_t)$ and $T_t(\alpha_{t-1})$ using a Taylor series
to give
(6.5)
If, in addition, the dependence of the variances on the states is dealt with by
replacing them by estimates, made at time $t-1$, then the new approximate
model becomes
$$
\varepsilon_t \sim N(0, \sigma_\varepsilon^2(a_{t|t-1})), \qquad \eta_t \sim N(0, \sigma_\eta^2(a_{t-1})).
\tag{6.6}
$$
This model is then in the conditionally Gaussian framework and so the Kalman
filter can be used to estimate the state. Since the model itself is an
approximation, we call the conditionally Gaussian Kalman filter an extended Kalman
filter for the original model (6.4); see Anderson and Moore (1979, Chapter 8).
EXAMPLE. Suppose the logistic transformation is being used to keep $z_t(\alpha_t)$
between zero and one as in (4.16). Then
(6.7)
Then the expanded model becomes
(6.8)
This idea can be used to construct a model of opinion polls. Suppose there are
just two parties. If the level of support for one party is modelled as a logistic
transformation of a Gaussian random walk and the measurement error
originates from using a simple random sample of size $n_t$, then
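$$
y_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_t^2), \qquad \sigma_t^2 = \frac{\mu_t(1 - \mu_t)}{n_t},
\tag{6.9a}
$$
$$
\mu_t = \frac{1}{1 + \exp(-\alpha_t)},
\tag{6.9b}
$$
$$
\alpha_t = \alpha_{t-1} + \eta_t, \qquad \eta_t \sim \mathrm{NID}(0, \sigma_\eta^2).
\tag{6.9c}
$$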
As $\mu_t$ is unknown, this model cannot be analysed by using the Kalman filter.
Instead, an estimate of $\alpha_t$ can be made at time $t-1$, written $a_{t|t-1}$, and it can
be used, through the logistic transformation, to replace $\mu_t$ in the variance. One
of the problems with this approach is that this model does not constrain the
observations to lie between zero and one, as $\varepsilon_t$ is assumed Gaussian. Although
this could be a problem if $\mu_t$ were to be close to zero or one, this is unlikely
to pose a difficulty for moderate sample sizes.
The Kalman filter can be applied in the standard way once the logistic
transformation has been Taylor expanded. The resulting model is
$$
y_t = m_{t-1} + \exp(-a_{t-1})\, m_{t-1}^2 (\alpha_t - a_{t-1}) + \varepsilon_t, \qquad
\varepsilon_t \sim N\!\left(0, \frac{m_{t-1}(1 - m_{t-1})}{n_t}\right),
$$
allow for irregular observations, was followed by Shephard and Harvey (1989)
in their analysis of opinion poll data from the British general election
campaigns of October 1974, 1979, 1983 and 1987.
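One step of the extended Kalman filter for the opinion poll model (6.9) can be sketched as follows. The sketch assumes that $m_{t-1}$ is the logistic transform of $a_{t-1}$, so that $\exp(-a_{t-1})m_{t-1}^2$ is the derivative of the logistic function at the expansion point, and that polls arrive at every time point. The names `logistic` and `ekf_poll_step` are hypothetical.

```python
import math

def logistic(a):
    """Logistic transformation mapping the real line into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def ekf_poll_step(a_prev, p_prev, y, n, sigma2_eta):
    """One extended Kalman filter step for a poll share y from a sample of size n.

    a_prev, p_prev: filtered mean and variance of the random walk state at t-1.
    Returns the filtered mean and variance at time t.
    """
    a_pred = a_prev                   # random walk: prediction equals last estimate
    p_pred = p_prev + sigma2_eta
    m = logistic(a_pred)              # predicted level of support
    d = math.exp(-a_pred) * m * m     # derivative of the logistic at a_pred
    h = m * (1.0 - m) / n             # sampling variance with mu_t replaced by its estimate
    v = y - m                         # prediction error
    f = d * d * p_pred + h            # its approximate variance
    k = p_pred * d / f                # gain
    return a_pred + k * v, p_pred - k * d * p_pred
```

Because both the slope `d` and the variance `h` are evaluated at the prediction, the output is only an approximation to the conditional mean and variance of the state.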
6.3. Non-Gaussian state space models
Although the Gaussian state space form provides the basis for the analysis of
many time series, it is sometimes not possible to model the data, or
a transformation of it, adequately in this way. Some series, such as count data, are
intrinsically non-Gaussian and so using a Gaussian model could harm
forecasting precision. In this section we outline the methods for directly modelling
non-Gaussian series.
The key to modelling non-Gaussian time series is the non-Gaussian state
space form. It will be built out of two assumptions. Firstly, the measurement
equation is such that we can write
(6.11)
This assumes that given the state $\alpha_t$, the observation $y_t$ is independent of all the
other states and observations. Thus $\alpha_t$ is sufficient for $y_t$. The second
assumption is that the transition equation is such that
$$
f(\alpha_1, \ldots, \alpha_T \mid Y_0) = f(\alpha_1 \mid Y_0) \prod_{t=2}^{T} f(\alpha_t \mid \alpha_{t-1}),
\tag{6.12}
$$
that is, the state follows a Markov process.
Filtering can be derived for a continuous state by the integrals
$$
f(\alpha_t \mid Y_{t-1}) = \int f(\alpha_t \mid \alpha_{t-1})\, f(\alpha_{t-1} \mid Y_{t-1})\, d\alpha_{t-1},
\tag{6.13a}
$$
(6.13b)
Thus it is technically possible to carry out filtering, and indeed smoothing, for
any state space model if the integrals can be computed. Kitagawa (1987) and
Pole and West (1990) have suggested using particular sets of numerical
integration rules to evaluate these densities. The main drawback with this
general approach is the computational requirement, especially if parameter
estimation is required. This is considerable if a reasonable degree of accuracy is
to be achieved and the dimension of the state is large; the dimension of the
integral will equal the dimension of the state and so will be 13 for a basic
structural model for monthly data. It is well known from the numerical analysis
literature that the use of numerical integration rules to evaluate
high-dimensional integrals is fraught with difficulty.
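For a scalar state the integrals can be approximated on a fixed grid. The sketch below uses a simple rectangle rule, which is only one possible choice and not the specific rules proposed by Kitagawa (1987) or Pole and West (1990). The update step is written as Bayes' theorem applied to the predictive density, which is assumed here to be the content of (6.13b). All function names are hypothetical.

```python
import math

def norm_pdf(x, mean, var):
    """Gaussian density, used below only as an example transition and measurement density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def grid_filter_step(grid, filt_prev, y, trans_pdf, meas_pdf):
    """One prediction and update step on an equally spaced grid.

    filt_prev[i] approximates f(alpha_{t-1} = grid[i] | Y_{t-1}).
    trans_pdf(a, b) is f(alpha_t = a | alpha_{t-1} = b); meas_pdf(y, a) is f(y_t | alpha_t = a).
    Returns the filtered density on the grid and the likelihood term f(y_t | Y_{t-1}).
    """
    h = grid[1] - grid[0]
    # prediction integral (6.13a), rectangle rule
    pred = [h * sum(trans_pdf(a, b) * w for b, w in zip(grid, filt_prev)) for a in grid]
    # update: posterior proportional to measurement density times predictive density
    post = [meas_pdf(y, a) * w for a, w in zip(grid, pred)]
    lik = h * sum(post)
    return [w / lik for w in post], lik
```

The cost of each step grows with the square of the number of grid points, and with a $k$-dimensional state the grid itself grows exponentially in $k$, which is the computational explosion discussed in the text.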
The computational explosion associated with the use of these numerical
integration rules has prompted research into alternative methods for dealing
with non-Gaussian state space models. Recent work by West and Harrison
(1989) and West, Harrison and Migon (1985) has attempted to extend the use
of the Kalman filter to cover cases where the measurement density is a member
of the exponential family, which includes the binomial, Poisson and gamma
densities, while maintaining the Gaussian transition density. As such this tries
to extend the generalised linear model, described in McCullagh and Nelder
(1989), to allow for dynamic behaviour.
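For ease of exposition we will only look at the extension of the local level
model to cover the exponential family measurement density. More specifically,
we will assume that
$$
\begin{aligned}
f(y_t \mid \mu_t) &= b(y_t, \sigma_{\varepsilon t}^2) \exp\!\left( \frac{y_t\mu_t - a(\mu_t)}{\sigma_{\varepsilon t}^2} \right), \\
f(\mu_t \mid \mu_{t-1}) &= \frac{1}{\sqrt{2\pi\sigma_\eta^2}} \exp\!\left( -\frac{(\mu_t - \mu_{t-1})^2}{2\sigma_\eta^2} \right),
\end{aligned}
\tag{6.14}
$$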
and follow the development given in West and Harrison (1989). Here $\sigma_{\varepsilon t}^2$ will
be assumed to be known at time $t$. By selecting $a(\cdot)$ and $b(\cdot)$ appropriately, a
large number of distributions can result. A simple example of this is the
binomial distribution
$$
f(y_t \mid \pi_t) = \frac{n_t!}{y_t!\,(n_t - y_t)!}\, \pi_t^{y_t} (1 - \pi_t)^{n_t - y_t},
\tag{6.15}
$$
which is obtained by writing
$$
\mu_t = \log\frac{\pi_t}{1 - \pi_t}, \qquad
a(\mu_t) = \log(1 + \exp(\mu_t)), \qquad
b(y_t, \sigma_{\varepsilon t}^2) = \frac{n_t!}{y_t!\,(n_t - y_t)!}.
\tag{6.16}
$$
Although it is relatively straightforward to place densities into their
exponential form, the difficulty comes from filtering the unobservable component $\mu_t$ as
it progresses through time. Suppose we have a distribution for $\mu_{t-1} \mid Y_{t-1}$. The
first two moments of this prior will be written as $m_{t-1}$ and $p_{t-1}$. The random
walk transition means that the first two moments of $\mu_t \mid Y_{t-1}$ will be
$$
m_{t|t-1} = m_{t-1}, \qquad p_{t|t-1} = p_{t-1} + \sigma_\eta^2.
\tag{6.17}
$$
As the measurement density is in the exponential family, it is always possible
to find a conjugate density. Generically it takes the form
$$
f(\mu_t \mid Y_{t-1}) = c(r_{t|t-1}, s_{t|t-1}) \exp(\mu_t r_{t|t-1} - s_{t|t-1} a(\mu_t)).
\tag{6.18}
$$
For a particular form of this density it is typically possible to select $r_{t|t-1}$ and
$s_{t|t-1}$ so that the first two moments of this density match $m_{t|t-1}$ and $p_{t|t-1}$. Thus
the actual prior density of $\mu_t \mid Y_{t-1}$ will be approximated by a density which has
$$
c(r_{t|t-1} + y_t/\sigma_{\varepsilon t}^2,\; s_{t|t-1} + 1/\sigma_{\varepsilon t}^2).
$$
Further
$$
f(\mu_t \mid Y_t) = c(r_t, s_t) \exp(r_t\mu_t - s_t a(\mu_t)),
\tag{6.20}
$$
where
(6.21)
By finding the first two moments of this density, implied values for $m_t$ and $p_t$
can be deduced, so starting the cycle off again. As the approximate density of
$y_t \mid Y_{t-1}$ is known analytically, a maximum quasi-likelihood procedure can be
used to estimate the unknown parameters of this model by using a predictive
distribution decomposition of the joint density of the observations
(6.22)
(6.23)
so
(6.24)
(6.25)
$$
p_t = \frac{\sigma_\varepsilon^2\, p_{t|t-1}}{p_{t|t-1} + \sigma_\varepsilon^2},
\tag{6.26}
$$
this is the usual Kalman filter.
EXAMPLE. If the measurement density is binomial then the conjugate prior is
beta,
$$
f(\pi_t \mid Y_{t-1}) = \frac{\Gamma(r_{t|t-1} + s_{t|t-1})}{\Gamma(r_{t|t-1})\,\Gamma(s_{t|t-1})}\,
\pi_t^{r_{t|t-1} - 1} (1 - \pi_t)^{s_{t|t-1} - 1}.
\tag{6.27}
$$
But as $\mu_t = \log\{\pi_t/(1 - \pi_t)\}$ it follows that using our prior knowledge of $\mu_t$,
$$
\begin{aligned}
m_{t|t-1} &= E(\mu_t \mid Y_{t-1}) = \gamma(r_{t|t-1}) - \gamma(s_{t|t-1}), \\
p_{t|t-1} &= \operatorname{Var}(\mu_t \mid Y_{t-1}) = \dot\gamma(r_{t|t-1}) + \dot\gamma(s_{t|t-1}),
\end{aligned}
\tag{6.28}
$$
where $\gamma(\cdot)$ is the digamma function and $\dot\gamma(\cdot)$ is its derivative, we can allow $r_{t|t-1}$
and $s_{t|t-1}$ to be selected numerically. When $r_{t|t-1}$ and $s_{t|t-1}$ are updated to give
$r_t$ and $s_t$, the corresponding $m_t$ and $p_t$ can be deduced from
$$
m_t = \gamma(r_t) - \gamma(s_t), \qquad p_t = \dot\gamma(r_t) + \dot\gamma(s_t).
\tag{6.29}
$$
This completes the cycle, since $m_{t+1|t} = m_t$ and $p_{t+1|t} = p_t + \sigma_\eta^2$.
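The numerical selection of $r_{t|t-1}$ and $s_{t|t-1}$ in (6.28) can be sketched with a two-dimensional Newton iteration. The code below is a minimal illustration under several assumptions: the digamma and trigamma functions are evaluated by a recurrence plus an asymptotic series rather than by a library routine, the Jacobian is obtained by central differences in $(\log r, \log s)$, and the update uses the textbook beta-binomial rule (add successes to $r$ and failures to $s$ in the parametrisation of (6.27)), which may differ in form from the general update in the source. All function names are hypothetical.

```python
import math

def digamma(x):
    """Digamma function for x > 0: shift x above 6, then use the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    i2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - i2 * (1.0 / 12 - i2 * (1.0 / 120 - i2 / 252))

def trigamma(x):
    """Derivative of the digamma function for x > 0, by the same device."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    i1 = 1.0 / x
    i2 = i1 * i1
    return acc + i1 + 0.5 * i2 + i1 * i2 * (1.0 / 6 - i2 * (1.0 / 30 - i2 / 42))

def logit_moments(r, s):
    """Mean and variance of log(pi / (1 - pi)) when pi is Beta(r, s), as in (6.28) and (6.29)."""
    return digamma(r) - digamma(s), trigamma(r) + trigamma(s)

def match_beta(m, p, tol=1e-10, max_iter=100, eps=1e-6):
    """Find (r, s) whose logit moments equal (m, p), by Newton steps in (log r, log s)."""
    # starting values from the large-argument approximations digamma ~ log, trigamma ~ 1/x
    u = math.log((1.0 + math.exp(m)) / p)
    v = math.log((1.0 + math.exp(-m)) / p)
    for _ in range(max_iter):
        g1, g2 = logit_moments(math.exp(u), math.exp(v))
        e1, e2 = g1 - m, g2 - p
        if max(abs(e1), abs(e2)) < tol:
            break
        a1, a2 = logit_moments(math.exp(u + eps), math.exp(v))
        b1, b2 = logit_moments(math.exp(u - eps), math.exp(v))
        c1, c2 = logit_moments(math.exp(u), math.exp(v + eps))
        d1, d2 = logit_moments(math.exp(u), math.exp(v - eps))
        j11, j21 = (a1 - b1) / (2 * eps), (a2 - b2) / (2 * eps)
        j12, j22 = (c1 - d1) / (2 * eps), (c2 - d2) / (2 * eps)
        det = j11 * j22 - j12 * j21
        u -= (j22 * e1 - j12 * e2) / det
        v -= (-j21 * e1 + j11 * e2) / det
    return math.exp(u), math.exp(v)

def dglm_binomial_step(m_prev, p_prev, y, n, sigma2_eta):
    """One filtering cycle: predict (6.17), match moments (6.28), update, recover moments (6.29)."""
    r, s = match_beta(m_prev, p_prev + sigma2_eta)
    return logit_moments(r + y, s + (n - y))   # textbook beta-binomial update (assumption)
```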
The work on the dynamic generalised linear model and the extended Kalman
filter share some important characteristics. The most important of these is that
both are approximations, where the degree of approximation is difficult to
determine. In neither case does the filtered estimate of the state possess the
important property that it is the minimum mean square error estimate.
An alternative approach is to design transition equations which are
conjugate to the measurement density so that there exists an exact analytic filter. In
the last five years there has been some important work carried out on these
exact non-Gaussian filters. Most of this work has been based on a gamma-beta
transition equation; see the discussion in Lewis, McKenzie and Hugus (1989).
A simple example is
(6.32)
Thus the expectation of the level remains the same, but its variance increases
just as it does in a Gaussian local level model.
Gamma-beta transition equations have been used by Smith and Miller
(1986) in their analysis of extreme value time series to enable them to forecast
athletic world records. Harvey and Fernandes (1989a) exploited them to study
the goals scored by the England football team against Scotland in their
matches at Hampden Park. A more interesting example from an economic
viewpoint is the paper by Harvey and Fernandes (1989b) on insurance claims.
Both papers use a Poisson measurement equation
$$
f(y_t \mid \alpha_t) = \frac{e^{-\alpha_t} \alpha_t^{y_t}}{y_t!}.
\tag{6.33}
$$
As a gamma is the conjugate prior to a Poisson distribution, this model is
closed by using a gamma-beta transition equation, for the use of Bayes'
theorem shows that
(6.34)
This means that if $a_0 = b_0 = 0$, the filtered estimate of $\alpha_t$ is
$$
E(\alpha_t \mid Y_t) = \frac{a_t}{b_t} = \frac{\sum_{i=0}^{t-1} \omega^i y_{t-i}}{\sum_{i=0}^{t-1} \omega^i},
\tag{6.35}
$$
which is an exponentially weighted moving average of the observations. The
one-step ahead predictive distribution is
$$
f(y_t \mid Y_{t-1}) = \int f(y_t \mid \alpha_t)\, f(\alpha_t \mid Y_{t-1})\, d\alpha_t,
\tag{6.36}
$$
which is negative binomial and so the likelihood for this model can be
computed using the predictive distribution decomposition, as in (6.22).
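A minimal sketch of this filter is given below. It assumes the recursions take the form $a_{t|t-1} = \omega a_{t-1}$, $b_{t|t-1} = \omega b_{t-1}$ for the prediction step and $a_t = a_{t|t-1} + y_t$, $b_t = b_{t|t-1} + 1$ for the update, which reproduces the exponentially weighted moving average in (6.35) when $a_0 = b_0 = 0$. The function name and the decision to skip likelihood terms until the predictive gamma density is proper are illustrative choices.

```python
import math

def poisson_gamma_filter(y, omega, a0=0.0, b0=0.0):
    """Exact filter for Poisson counts with a gamma-beta transition (assumed recursions).

    Returns the filtered means a_t / b_t and the log-likelihood built from
    the negative binomial one-step ahead predictive densities.
    """
    a, b = a0, b0
    means, loglik = [], 0.0
    for yt in y:
        a_pred, b_pred = omega * a, omega * b     # same mean a/b, larger variance
        if a_pred > 0.0:
            # negative binomial: integral of Poisson(y | alpha) against Gamma(a_pred, b_pred)
            loglik += (math.lgamma(a_pred + yt) - math.lgamma(yt + 1.0)
                       - math.lgamma(a_pred) + a_pred * math.log(b_pred)
                       - (a_pred + yt) * math.log(1.0 + b_pred))
        a, b = a_pred + yt, b_pred + 1.0          # conjugate gamma update
        means.append(a / b)
    return means, loglik
```

Maximising the returned log-likelihood over `omega` in (0, 1] would give an estimate of the single unknown parameter.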
6.4. Stochastic variance models
One of the most important modelling techniques to emerge in the 1980s was
autoregressive conditional heteroskedasticity (ARCH); see Engle (1982) and
Bollerslev (1986). These authors suggested modelling the variability of a series
by using weights of the squares of the past observations. The important
(6.37)
that is, the one-step ahead predictive distribution depends on the variable $h_t$.
Thus the conditional variance of the process is modelled directly, just as the
conditional mean is modelled directly in ARMA models.
Although the development of these models has had a strong influence in the
econometric literature, a rather different modelling approach has been
suggested in the finance literature; see, for example, Hull and White (1987),
Chesney and Scott (1989) and Melino and Turnbull (1990). These papers have
been motivated by the desire to allow time varying volatility in option pricing
models, so producing a more dynamic Black-Scholes type pricing equation.
This requires that the volatility models be written down in terms of continuous
time Brownian motion. In general ARCH models do not tie in with such a
formulation, although as Nelson (1991) shows there are links with EGARCH.
The finance models, which are usually called stochastic volatility models,
although we prefer to call them stochastic variance models, have some very
appealing properties. They directly model the variability of the series, rather
than the conditional variability. Thus they are analogous to the structural
models discussed in the rest of this paper which are all direct models for the
mean of the series at a particular point in time. A simple example is
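$$
\begin{aligned}
y_t &= \varepsilon_t \exp(h_t/2), & \varepsilon_t &\sim \mathrm{NID}(0, 1), \\
h_t &= \alpha_0 + \alpha_1 h_{t-1} + \eta_t, & \eta_t &\sim \mathrm{NID}(0, \sigma_\eta^2),
\end{aligned}
\tag{6.38}
$$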
where, for simplicity, $\varepsilon_t$ and $\eta_s$ are assumed to be independent for all $t$ and $s$.
Here the logarithm of the standard deviation of the series follows an AR(1)
process, which has an obvious continuous time generalisation. It is not
observable, but it can be estimated from the linear state space form
$$
\begin{aligned}
\log y_t^2 &= h_t + \log\varepsilon_t^2 = h_t + \varepsilon_t^*, \\
h_t &= \alpha_0 + \alpha_1 h_{t-1} + \eta_t,
\end{aligned}
\tag{6.39}
$$
where $\varepsilon_t^*$ is independent and identically distributed, but not Gaussian. In fact
$E\varepsilon_t^* = -1.27$ and $\operatorname{Var}\varepsilon_t^* = 4.93$; see Abramowitz and Stegun (1970, p. 943).
The Kalman filter provides the minimum mean square linear estimator of $h_t$
from the $\log y_t^2$ series. Further, the corresponding smoother inherits the same
property of being the best linear estimator given the whole data set.
As $y_t$ is the product of two strictly stationary processes, it must also be
strictly stationary. Thus for any stochastic variance model, the restrictions
needed to ensure the stationarity of $y_t$ are just the standard restrictions to
ensure the stationarity of the process generating $h_t$; compare the simplicity of
this to the GARCH(1, 1) model, as analysed by Nelson (1990). The properties
of this particular autoregressive stochastic variance model can be worked out if
$|\alpha_1| < 1$, for then $h_t$ must be strictly stationary, with
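$$
\gamma_h = E h_t = \frac{\alpha_0}{1 - \alpha_1}, \qquad
\sigma_h^2 = \operatorname{Var} h_t = \frac{\sigma_\eta^2}{1 - \alpha_1^2}.
\tag{6.40}
$$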
The fact that $y_t$ is white noise follows almost immediately given the
independence of $\varepsilon_t$ and $\eta_t$. The mean is clearly zero, while
$$
E y_t y_{t-\tau} = E\varepsilon_t\varepsilon_{t-\tau}\, E\exp\!\left(\frac{h_t + h_{t-\tau}}{2}\right) = 0
\tag{6.41}
$$
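as $E\varepsilon_t\varepsilon_{t-\tau} = 0$. The odd moments of $y_t$ are all zero because $\varepsilon_t$ is symmetric. The
even moments can be obtained by making use of a standard result for the
lognormal distribution, which in the present context tells us that since $\exp(h_t)$
is lognormal, its $j$-th moment about the origin is $\exp(j\gamma_h + j^2\sigma_h^2/2)$, with $\sigma_h^2$
the variance of $h_t$. Therefore
$$
\operatorname{Var} y_t = E\varepsilon_t^2\, E\exp(h_t) = \exp(\gamma_h + \sigma_h^2/2).
\tag{6.42}
$$
The fourth moment is
$$
E y_t^4 = E\varepsilon_t^4\, E\exp(2h_t) = 3\exp(2\gamma_h + 2\sigma_h^2)
\tag{6.43}
$$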
and so the kurtosis is $3\exp(\sigma_h^2)$, which is greater than 3 when $\sigma_h^2$ is positive.
Thus the model exhibits excess kurtosis compared with the normal distribution.
The dynamic properties of the model appear in $\log y_t^2$ rather than $y_t^2$. In (6.39)
$h_t$ is an AR(1) process and $\varepsilon_t^*$ is white noise so $\log y_t^2$ is an ARMA(1,1)
process and its autocorrelation function is easy to derive.
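The moment results can be checked numerically with a few lines of code. The sketch below assumes that $\sigma_h^2$ is the stationary variance of the AR(1) process, $\sigma_\eta^2/(1 - \alpha_1^2)$, and evaluates the variance and kurtosis of $y_t$ from the lognormal moment formula. The function name `sv_moments` is hypothetical.

```python
import math

def sv_moments(alpha0, alpha1, sigma2_eta):
    """Variance and kurtosis of y_t in the stochastic variance model (6.38)."""
    if abs(alpha1) >= 1.0:
        raise ValueError("h_t is stationary only if |alpha1| < 1")
    gamma_h = alpha0 / (1.0 - alpha1)                 # mean of h_t
    sigma2_h = sigma2_eta / (1.0 - alpha1 ** 2)       # variance of h_t
    var_y = math.exp(gamma_h + sigma2_h / 2.0)        # j = 1 lognormal moment
    fourth = 3.0 * math.exp(2.0 * gamma_h + 2.0 * sigma2_h)   # E eps^4 = 3, j = 2 moment
    return var_y, fourth / var_y ** 2
```

For instance, with $\alpha_1 = 0.9$ and $\sigma_\eta^2 = 0.19$ the variance of $h_t$ is one and the kurtosis is $3e$, whatever the value of $\alpha_0$.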
The parameter estimation of stochastic variance models is also reasonably
simple. Although the linear state space representation of $\log y_t^2$ allows the
computation of the innovations and their associated variances, the innovations
are not actually Gaussian. If this fact is ignored for a moment and the
'Gaussian' likelihood is constructed, then this objective function is called a
quasi-likelihood. A valid asymptotic theory is available for the estimator which
results from maximising this function; see Dunsmuir (1979, p. 502).
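A sketch of the quasi-likelihood computation follows. It treats $\varepsilon_t^*$ as if it were Gaussian with the mean and variance quoted above, starts the filter from the stationary distribution of $h_t$, and assumes that no observation is exactly zero so that $\log y_t^2$ is defined. The simulation routine, the function names and the parameter values in the usage note are illustrative only.

```python
import math
import random

MEAN_LOG_EPS2 = -1.27   # E log(eps_t^2), as quoted in the text
VAR_LOG_EPS2 = 4.93     # Var log(eps_t^2), as quoted in the text

def simulate_sv(n, alpha0, alpha1, sigma2_eta, seed=0):
    """Simulate n observations from (6.38), starting h at its stationary mean."""
    rng = random.Random(seed)
    h = alpha0 / (1.0 - alpha1)
    y = []
    for _ in range(n):
        h = alpha0 + alpha1 * h + rng.gauss(0.0, math.sqrt(sigma2_eta))
        y.append(rng.gauss(0.0, 1.0) * math.exp(h / 2.0))
    return y

def sv_quasi_loglik(y, alpha0, alpha1, sigma2_eta):
    """Gaussian quasi-log-likelihood of log(y_t^2) from the Kalman filter applied to (6.39)."""
    h = alpha0 / (1.0 - alpha1)                  # stationary mean of h_t
    p = sigma2_eta / (1.0 - alpha1 ** 2)         # stationary variance of h_t
    loglik = 0.0
    for yt in y:
        x = math.log(yt * yt)                    # requires yt != 0
        v = x - MEAN_LOG_EPS2 - h                # innovation
        f = p + VAR_LOG_EPS2                     # innovation variance
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        k = p / f
        h_filt, p_filt = h + k * v, p * (1.0 - k)
        h = alpha0 + alpha1 * h_filt             # predict next state
        p = alpha1 ** 2 * p_filt + sigma2_eta
    return loglik
```

In use, one might simulate a series with `simulate_sv(500, 0.0, 0.95, 0.05)` and pass `sv_quasi_loglik` to a numerical optimiser over the three parameters.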
The model can be generalised so that $h_t$ follows any stationary ARMA
process, in which case $y_t$ is also stationary and its properties can be deduced
from the properties of $h_t$. Other components could also be brought into the
model. For example, the variance could be related to a changing daily or
intra-daily pattern.
Multivariate generalisations of the stochastic volatility models have been
suggested by Harvey, Ruiz and Shephard (1991). These models overcome
many of the difficulties associated with multivariate ARCH based models; see
Bollerslev, Chou and Kroner (1992) for a survey of these models. The basic
idea is to let the $i$th element of the $N$-dimensional vector $y_t$ be
$$
\begin{aligned}
y_{it} &= \varepsilon_{it} \exp(h_{it}/2), \\
h_{it} &= \alpha_{0i} + \alpha_{1i} h_{it-1} + \eta_{it},
\end{aligned}
\tag{6.44}
$$
where $\varepsilon_t$ and $\eta_t$ are $N$-dimensional multivariate Gaussian white noise processes
with covariances $\Sigma_\varepsilon$ and $\Sigma_\eta$. The matrix $\Sigma_\varepsilon$ will be constrained to have ones
down its leading diagonal and so can be thought of as being a correlation
matrix.
The model can be put into state space form, as in (6.39), by writing
$$
\log y_{it}^2 = h_{it} + \varepsilon_{it}^*, \qquad i = 1, \ldots, N.
\tag{6.45}
$$
The covariance of $\varepsilon_t^* = (\varepsilon_{1t}^*, \ldots, \varepsilon_{Nt}^*)'$ can be analytically related to $\Sigma_\varepsilon$, so
allowing straightforward estimation of $\Sigma_\varepsilon$ and $\Sigma_\eta$ by using a quasi-likelihood,
although the signs of the elements of $\Sigma_\varepsilon$ cannot be identified using this
procedure. However, these signs can be estimated directly from the data, for
$y_{it}y_{jt} > 0$ if and only if $\varepsilon_{it}\varepsilon_{jt} > 0$, implying the sign of the $i,j$-th element of $\Sigma_\varepsilon$
should be estimated to be positive if the number of occurrences of $y_{it}y_{jt} > 0$ is
greater than $T/2$.
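The counting rule for the signs is simple enough to state in code. In the sketch below the function name `estimate_sign` is hypothetical, and returning a negative sign whenever the count does not exceed $T/2$ is an assumption, since the text only states the condition for a positive sign.

```python
def estimate_sign(y_i, y_j):
    """Estimate the sign of the (i, j) element of the correlation matrix of eps_t.

    y_i, y_j: the two observed series, of equal length T.
    Returns +1 if y_it * y_jt > 0 in more than T/2 periods, otherwise -1.
    """
    T = len(y_i)
    agreements = sum(1 for a, b in zip(y_i, y_j) if a * b > 0)
    return 1 if agreements > T / 2 else -1
```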
Harvey, Ruiz and Shephard (1991) analyse four daily exchange rates for the
US dollar using (6.38) and find that $\alpha_1$ is approximately equal to unity for all
the rates, suggesting that a random walk is appropriate for $h_t$. This model has
very similar properties to IGARCH in which $\alpha_1 + \alpha_2 = 1$ in (6.37). The
multivariate generalisation is straightforward and the transformed
observations, as in (6.45), are a SUTSE model of the form (4.1). Further investigation
of the model indicates that it can be made even more parsimonious by
specifying just two common trends, thereby implying co-integration in
volatility; compare (4.14). The first common trend affects all four exchange rates,
while the second is associated primarily with the Yen.
Although stochastic variance models can be made to fit within the linear
state space framework and so can be handled by using the Kalman filter, this filter
does not deliver the optimal (minimum mean square error) estimate. It is not
possible to derive the optimal filter analytically and so it is tempting to change
the transition equation in an attempt to allow the derivation of exact results for
this problem. This approach has been followed by Shephard (1993b) using the
techniques discussed in the previous subsection. He proposed a local scale
model
$$
y_t \mid \alpha_t \sim N(0, \alpha_t^{-1}),
\tag{6.46}
$$
where $\alpha_t$, the precision of the series at time $t$, satisfies the gamma-beta
transition equation of (6.30). Although $\alpha_t$ is unknown, it can be estimated
because
$$
\alpha_t \mid Y_t \sim G(a_t, b_t), \qquad a_t = a_{t|t-1} + \tfrac{1}{2}, \qquad b_t = b_{t|t-1} + \tfrac{1}{2} y_t^2,
\tag{6.47}
$$
and also
(6.48)
this being the inverse of the EWMA of the squares of the observations.
When the focus shifts to the one-step ahead forecast density, then
Ytl Yt-
(6.49)
that is, $y_t \mid Y_{t-1}$ is a scaled Student's $t$ variable, with scale which is an exact
EWMA of the squares of the past observations. If $t$ is large then the degrees of
freedom in the predictive density will approximately equal $\omega/(1 - \omega)$. As
$\omega \to 1$, the degrees of freedom increase and so the one-step ahead density
becomes like a normal. The parameter $\omega$ has to be larger than 0.8 for the
fourth moment to exist. Setting $\omega$ to 0.5 means that the density is that of a Cauchy
random variable.
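The local scale filter can be sketched in a few lines. The code assumes that the gamma-beta transition discounts both gamma parameters by $\omega$, that is $a_{t|t-1} = \omega a_{t-1}$ and $b_{t|t-1} = \omega b_{t-1}$, which together with (6.47) gives a limiting value of $2a_{t|t-1}$ equal to $\omega/(1 - \omega)$, in agreement with the degrees of freedom stated above. The function names are hypothetical.

```python
import math

def local_scale_filter(y, omega, a0=0.0, b0=0.0):
    """Filtered estimates a_t / b_t of the precision alpha_t in the local scale model.

    With a0 = b0 = 0 each estimate is the inverse of an EWMA of the squared observations.
    """
    a, b = a0, b0
    precisions = []
    for yt in y:
        a_pred, b_pred = omega * a, omega * b     # assumed gamma-beta discounting
        a = a_pred + 0.5                          # update (6.47)
        b = b_pred + 0.5 * yt * yt
        precisions.append(a / b if b > 0.0 else math.inf)
    return precisions

def limiting_degrees_of_freedom(omega):
    """Approximate degrees of freedom of the one-step ahead Student t density for large t."""
    return omega / (1.0 - omega)
```

Setting `omega` to 0.5 gives one degree of freedom (a Cauchy density), and 0.8 gives four, the boundary for the existence of the fourth moment.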
Many extensions of this model are possible, allowing, amongst other things,
an exponential power measurement density instead of normal, irregularly
spaced observations and multistep ahead forecasts. The difficulty with the
model is that it is hard to significantly depart from the gamma-beta transition
equation. As this is constrained to be a nonstationary process and is technically
awkward to generalise to the multivariate case, it is of less practical use than
the stochastic variance models. However, for dealing with this very special case
it does provide a rather interesting alternative.
References
Abramowitz, M. and I. A. Stegun (1970). Handbook of Mathematical Functions. Dover, New
York.
Anderson, B. D. O. and J. B. Moore (1979). Optimal Filtering. Prentice-Hall, Englewood Cliffs,
NJ.
Anderson, T. W. and C. Hsiao (1982). Formulation and estimation of dynamic models using panel
data. J. Econometrics 18, 47-82.
Ansley, C. F. and R. Kohn (1985). Estimation, filtering and smoothing in state space models with
incompletely specified initial conditions. Ann. Statist. 13, 1286-1316.
Ansley, C. F. and R. Kohn (1989). Prediction mean square error for state space models with
estimated parameters. Biometrika 73, 467-474.
Binder, D. A. and J. P. Dick (1989). Modelling and estimation for repeated surveys. Survey
Method. 15, 29-45.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. J. Econometrics
31, 307-327.
Bollerslev, T., R. Y. Chou and K. F. Kroner (1992). ARCH models in finance: A review of the
theory and empirical evidence. J. Econometrics 52, 5-59.
Bowden, R. J. and D. A. Turkington (1984). Instrumental Variables. Cambridge Univ. Press,
Cambridge.
Box, G. E. P. and G. M. Jenkins (1976). Time Series Analysis: Forecasting and Control. Revised
edition, Holden-Day, San Francisco, CA.
Box, G. E. P. and G. C. Tiao (1975). Intervention analysis with applications to economic and
environmental problems. J. Amer. Statist. Assoc. 70, 70-79.
Chesney, M. and L. O. Scott (1989). Pricing European currency options: A comparison of the
modified Black-Scholes model and a random variance model. J. Financ. Quant. Anal. 24,
267-284.
De Jong, P. (1988a). The likelihood for a state space model. Biometrika 75, 165-169.
De Jong, P. (1988b). A cross-validation filter for time series models. Biometrika 75, 594-600.
De Jong, P. (1989). Smoothing and interpolation with the state space model. J. Amer. Statist.
Assoc. 84, 1085-1088.
De Jong, P. (1991). The diffuse Kalman filter. Ann. Statist. 19, 1073-1083.
Duncan, D. B. and S. D. Horn (1972). Linear dynamic regression from the viewpoint of regression
analysis. J. Amer. Statist. Assoc. 67, 815-821.
Dunsmuir, W. (1979). A central limit theorem for parameter estimation in stationary vector time
series and its applications to models for a signal observed with noise. Ann. Statist. 7, 490-506.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance
of UK inflation. Econometrica 50, 987-1007.
Engle, R. F. and C. W. J. Granger (1987). Co-integration and error correction: Representation,
estimation and testing. Econometrica 55, 251-276.
Fernandez, F. J. (1990). Estimation and testing of a multivariate exponential smoothing model. J.
Time Ser. Anal. 11, 89-105.
Fernandez, F. J. and A. C. Harvey (1990). Seemingly unrelated time series equations and a test
for homogeneity. J. Business Econ. Statist. 8, 71-82.
Harrison, P. J. and C. F. Stevens (1976). Bayesian forecasting. J. Roy. Statist. Soc. Ser. B 38,
205-247.
Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter.
Cambridge Univ. Press, Cambridge.
Harvey, A. C. and A. Jaeger (1991). Detrending, stylized facts and the business cycle. Mimeo,
Department of Statistics, London School of Economics.
Harvey, A. C. and C. Fernandes (1989a). Time series models for count or qualitative observations
(with discussion). J. Business Econ. Statist. 7, 407-422.
Harvey, A. C. and C. Fernandes (1989b). Time series models for insurance claims. J. Inst. Actuar.
116, 513-528.
Harvey, A. C. and S. J. Koopman (1992). Diagnostic checking of unobserved components time
series models. J. Business Econ. Statist. 10, 377-389.
Harvey, A. C. and S. J. Koopman (1993). Short term forecasting of periodic time series using
time-varying splines. J. Amer. Statist. Assoc., to appear.
Harvey, A. C. and P. Marshall (1991). Inter-fuel substitution, technical change and the demand for
energy in the UK economy. Appl. Econ. 23, 1077-1086.
Harvey, A. C. and S. Peters (1990). Estimation procedures for structural time series models. J.
Forecast. 9, 89-108.
Harvey, A. C., E. Ruiz and N. G. Shephard (1991). Multivariate stochastic variance models.
Mimeo, Department of Statistics, London School of Economics.
Harvey, A. C. and M. Streibel (1991). Stochastic trends in simultaneous equation systems. In: P.
Hackl and A. Westlund, eds., Economic Structural Change, Springer, Berlin, 169-178.
Harvey, A. C., B. Henry, S. Peters and S. Wren-Lewis (1986). Stochastic trends in dynamic
regression models: An application to the output-employment equation. Econ. J. 96, 975-985.
Harvey, A. C. and J. Durbin (1986). The effects of seat belt legislation on British road casualties:
A case study in structural time series modelling. J. Roy. Statist. Soc. Ser. A 149, 187-227.
Hausman, J. A. and M. W. Watson (1985). Errors in variables and seasonal adjustment
procedures. J. Amer. Statist. Assoc. 80, 541-552.
Hendry, D. F. and J.-F. Richard (1983). The econometric analysis of economic time series.
Internat. Statist. Rev. 51, 111-164.
Hull, J. and A. White (1987). Hedging the risks from writing foreign currency options. J. Internat.
Money Finance 6, 131-152.
Johansen, S. (1988). Statistical analysis of cointegration vectors. J. Econ. Dynamics Control 12,
231-254.
Jones, R. H. (1980). Best linear unbiased estimators for repeated survey. J. Roy. Statist. Soc. Ser.
B 42, 221-226.
Jorgenson, D. W. (1986). Econometric methods for modelling producer behaviour. In: Z. Griliches
and M. D. Intriligator, eds., Handbook of Econometrics, Vol. 3, North-Holland, Amsterdam,
1841-1915.
Kitagawa, G. (1987). Non-Gaussian state-space modeling of nonstationary time series. J. Amer.
Statist. Assoc. 82, 1032-1041.
Kohn, R. and C. F. Ansley (1985). Efficient estimation and prediction in time series regression
models. Biometrika 72, 694-697.
Kohn, R. and C. F. Ansley (1989). A fast algorithm for signal extraction, influence and
cross-validation in state space models. Biometrika 76, 65-79.
Koopman, S. J. (1993). Disturbance smoother for state space models. Biometrika 80, to appear.
Koopman, S. J. and N. Shephard (1992). Exact score for time series models in state space form.
Biometrika 79, 823-826.
Lewis, P. A. W., E. McKenzie and D. K. Hugus (1989). Gamma processes. Comm. Statist.
Stochast. Models 5, 1-30.
Liptser, R. S. and A. N. Shiryayev (1978). Statistics of Random Processes II: Applications.
Springer, New York.
Marshall, P. (1992a). State space models with diffuse initial conditions. J. Time Ser. Anal. 13,
411-414.
Marshall, P. (1992b). Estimating time-dependent means in dynamic models for cross-sections of
time series. Empirical Econ. 17, 25-33.
McCullagh, P. and J. A. Nelder (1989). Generalized Linear Models. 2nd ed., Chapman and Hall,
London.
Melino, A. and S. M. Turnbull (1990). Pricing options with stochastic volatility. J. Econometrics
45, 239-265.
Nelson, D. B. (1990). Stationarity and persistence in the GARCH(1,1) model. Econometric
Theory 6, 318-334.
Nelson, D. B. (1991). ARCH models as diffusion approximations. J. Econometrics 45, 7-38.
Pagan, A. R. (1979). Some consequences of viewing LIML as an iterated Aitken estimator. Econ.
Lett. 3, 369-372.
Pfeffermann, D. (1991). Estimation and seasonal adjustment of population means using data from
repeated surveys (with discussion). J. Business Econ. Statist. 9, 163-177.
Pfeffermann, D. and L. Burck (1990). Robust small area estimation combining time series and
cross-sectional data. Survey Method. 16, 217-338.
Pole, A. and M. West (1990). Efficient Bayesian learning in non-linear dynamic models. J.
Forecasting 9, 119-136.
Rao, C. R. (1973). Linear Statistical Inference. 2nd ed., Wiley, New York.
Rosenberg, B. (1973). Random coefficient models: The analysis of a cross section of time series by
stochastically convergent parameter regression. Ann. Econ. Social Measurement 2, 399-428.
Scott, A. J. and T. M. F. Smith (1974). Analysis of repeated surveys using time series models. J.
Amer. Statist. Assoc. 69, 674-678.
Shephard, N. (1993a). Maximum likelihood estimation of regression models with stochastic trend
components. J. Amer. Statist. Assoc. 88, 590-595.
Shephard, N. (1993b). Local scale model: State space alternatives to integrated GARCH
processes. J. Econometrics, to appear.
Shephard, N. and A. C. Harvey (1989). Tracking to the level of party support during general
election campaigns. Mimeo, Department of Statistics, London School of Economics.
Shephard, N. and A. C. Harvey (1990). On the probability of estimating a deterministic
component in the local level model. J. Time Ser. Anal. 11, 339-347.
Sims, C. A., J. H. Stock and M. W. Watson (1990). Inference in linear time series models with
some unit roots. Econometrica 58, 113-144.
Slade, M. E. (1989). Modelling stochastic and cyclical components of structural change: An
application of the Kalman filter. J. Econometrics 41, 363-383.
Smith, R. L. and J. E. Miller (1986). A non-Gaussian state space model and application to
prediction of records. J. Roy. Statist. Soc. Ser. B 48, 79-88.
Smith, T. M. F. (1978). Principles and problems in the analysis of repeated surveys. In: N. K.
Namboodiri, ed., Survey Sampling and Measurement, Academic Press, New York, 201-216.
Stock, J. H. and M. Watson (1988). Testing for common trends. J. Amer. Statist. Assoc. 83,
1097-1107.
Stock, J. H. and M. Watson (1990). A probability model of the coincident economic indicators. In:
K. Lahiri and G. H. Moore, eds., Leading Economic Indicators, Cambridge Univ. Press,
Cambridge, 63-89.
Streibel, M. and A. C. Harvey (1993). Estimation of simultaneous equation models with stochastic
trend components. J. Econ. Dynamics Control 17, 263-288.
Tam, S. M. (1987). Analysis of repeated surveys using a dynamic linear model. Internat. Statist.
Rev. 55, 63-73.
Tiao, G. C. and R. S. Tsay (1989). Model specification in multivariate time series (with
discussion). J. Roy. Statist. Soc. Ser. B 51, 157-214.
Tong, H. (1990). Non-Linear Time Series: A Dynamic System Approach. Clarendon Press,
Oxford.
Tunnicliffe-Wilson, G. (1989). On the use of marginal likelihood in time series estimation. J. Roy.
Statist. Soc. Ser. B 51, 15-27.
West, M., P. J. Harrison and H. S. Migon (1985). Dynamic generalized linear models and
Bayesian forecasting (with discussion). J. Amer. Statist. Assoc. 80, 73-97.
West, M. and P. J. Harrison (1989). Bayesian Forecasting and Dynamic Models. Springer, New
York.
Whittle, P. (1991). Likelihood and cost as path integrals. J. Roy. Statist. Soc. Ser. B 53, 505-538.