
School of Computing, Engineering and Mathematics

Can GDP Be Forecasted Using Statistical Models

Natalie Fuller

May 2014


Declaration

I declare that no part of the work in this report has been submitted in support of an application

for another degree or qualification at this or any other institute of learning.

Natalie Fuller


Acknowledgements

I would like to acknowledge my supervisor Alison Bruce for her help and encouragement with this project.


Abstract

This project compares alternative types of statistical modelling techniques to model and forecast

the rate of change of GDP in the UK. Statistical modelling has been carried out using publicly

available data measured quarterly from quarter 1 - 1970 through to quarter 3 - 2013. Within this

project the Box-Jenkins ARIMA, ARCH and GARCH modelling techniques are compared, and

the optimal models for each technique are decided. The Box-Jenkins ARIMA model is used to

forecast for GDP itself, whereas ARCH and GARCH models are employed to forecast the variance

in the series. Inflation is added as an explanatory variable to the modelling technique with the

best fit to the data thus creating a bivariate model. Forecasts are calculated for each modelling

technique, and the forecasting technique with the highest predictive performance is found to be

the bivariate AR(1)/GARCH(1,1) model.

Supervisor: Alison Bruce


Contents

1 Introduction
  1.1 General objective
  1.2 Specific objectives
  1.3 Notation and abbreviations

2 Literature Review

3 Data Overview and Analysis
  3.1 GDP - The expenditure approach
  3.2 Descriptive statistics
  3.3 Timeplot analysis

4 Univariate Box-Jenkins Modelling
  4.1 ARIMA (p,d,q) modelling
  4.2 The ARIMA modelling process
    4.2.1 Check for non-stationarity
    4.2.2 Model identification and selection
    4.2.3 Parameter testing
    4.2.4 Residual analysis
    4.2.5 Parameter estimation
  4.3 Forecasting GDP by an ARMA(1,1) model

5 Univariate GARCH Modelling
  5.1 GARCH(p,q) modelling
    5.1.1 Check for heteroscedasticity
    5.1.2 Parameter testing and model identification
    5.1.3 Residual analysis
    5.1.4 Parameter estimation
  5.2 AR(P)/GARCH(p,q) modelling
    5.2.1 Choosing the order of the autoregressive term
    5.2.2 Parameter testing and model identification
    5.2.3 Residual analysis
    5.2.4 Parameter estimation
  5.3 Forecasting GDP by a univariate GARCH model

6 Multivariate GARCH Modelling
  6.1 Inflation analysis
  6.2 Bivariate GARCH modelling process
    6.2.1 Parameter testing and model identification
    6.2.2 Residual analysis
    6.2.3 Parameter estimation
  6.3 Forecasting GDP by a bivariate GARCH model

7 Results
  7.1 Optimal model
  7.2 Bivariate AR(1)/GARCH(1,1) model analysis

8 Conclusion
  8.1 Suggestions to improve upon this investigation

Bibliography

Appendix
  A.1 UK Data
  A.2 MSE Calculations
  A.3 Parameter P values
  A.4 SAS code

Chapter 1

Introduction

Over the past century the UK economy has been through difficult times, experiencing World Wars, political upheavals, and banking crises. During the 1970s the economy suffered due to political malice[1], but also thrived as the Bank of England reduced the regulations on mortgages[2]. In the early 21st century the state of the UK economy is improving: a banking crisis hit the UK in 2008[3], and the stability of the economy has been improving since then.

The overall status of the economy in the UK is measured by a quarterly figure called gross domestic

product (GDP). GDP figures are published approximately a month after the banking quarter end

causing a one month lag. This lag in data retrieval means that there is uncertainty about how

the economy is performing at this present moment, or where it could potentially be in the future.

Forecasts are in great demand because, when making important economic decisions, it is helpful to know the current state of the economy.

NIESR (National Institute of Economic and Social Research) provide economic forecasts by using

“expertise in both quantitative and qualitative methods” [4]. Forecasting methods used by such

companies are undisclosed, therefore a review of the subject will be carried out to examine the

methods of other statisticians when forecasting GDP globally.

This project will focus on forecasting GDP using statistical models. A time series is a collection

of data points that have been measured sequentially throughout time. GDP can be described as


time series data as it has been measured throughout history at quarterly time points. A large

number of time series statistical models are available, and these can be used to calculate forecasts.

In this investigation a collection of univariate (a modelling process involving one variable) and bivariate (a modelling process involving two variables) Box-Jenkins and heteroscedastic modelling techniques are employed. These are techniques that have been widely used by statisticians.

The Box-Jenkins approach carried out is the autoregressive integrated moving average (ARIMA)

modelling process. The ARIMA model combines autoregressive (AR) and moving average (MA)

models to forecast the value of GDP in the future. The ARIMA model assumes that the variance is constant over time, although it is suggested that for this data set the variance may change over time. Consequently, the autoregressive conditional heteroscedasticity (ARCH) model and its extension, the generalised autoregressive conditional heteroscedasticity (GARCH) model, are used to accommodate the changes in variance. The variance is modelled in order to calculate a forecast of the percentage change in GDP at the next time point.

Inflation is introduced into the modelling process as an explanatory variable to create a bivariate

model. The aim of bivariate modelling is to improve upon the univariate forecasting methods.

1.1 General objective

The general objective of this investigation is to determine whether the percentage change in GDP

from quarter to quarter can be forecasted using statistical models.

1.2 Specific objectives

• To model GDP using a variety of statistical modelling techniques, and find a final model

that fits the data for each technique.

• Calculate a forecast value for each model for the percentage change in GDP from the

previous quarter in the UK for Q4-2013.

• Determine the optimal model/modelling technique to forecast GDP in the UK.

• Determine the level of accuracy of the model.


1.3 Notation and abbreviations

Notation

• Greek letters are used to denote parameters.

• Greek letters emphasised with a hat denote the estimate of a parameter. For example, the estimated value of $\alpha$ is $\hat{\alpha}$.

• Parameter tests are carried out with a general hypothesis of:

H0: Parameter = 0.

H1: Parameter ≠ 0.

• A 95% confidence level (a 5% significance level) is assumed for each hypothesis test.

• All modelling and forecasting is carried out using SAS 9.3 statistical software.

Abbreviations

ACF Autocorrelation Function

ADF Augmented Dickey Fuller

AIC Akaike information criterion

AR Autoregressive

ARCH Autoregressive Conditional Heteroscedasticity

ARIMA Autoregressive Integrated Moving Average

GARCH Generalized Autoregressive Conditional Heteroscedasticity

GDP Gross Domestic Product

MA Moving Average

MSE Mean Squared Error

NIESR National Institute of Economic and Social Research

PACF Partial Autocorrelation Function

RMSE Root Mean Squared Error

SAS SAS 9.3 statistical software

SBC Schwarz Bayesian Criterion


Chapter 2

Literature Review

Many attempts have been made to forecast GDP around the world using statistical models. The

Bank of England has published a working paper that aims to forecast UK GDP growth, inflation and interest rates under structural change using multivariate statistical models [5]. The

paper evaluated the performance of a variety of models with time varying parameters. The authors

have compared the use of different types of vector autoregressive (VAR) and factor-augmented

vector autoregressive (FAVAR) models in order to forecast for the rate of change of GDP. The

Bank of England used the RMSE (root mean squared error) to evaluate the performance of the

forecasting method. The model that stood out as the most successful model was the time-varying

parameter factor-augmented vector autoregressive (TVP-FAVAR) model.

Rajaguru and Abeysinghe forecast GDP in China based upon the Chow-Lin disaggregation[6]. The authors regressed annual GDP on an annual related series to obtain a predictive equation, and then used the quarterly figures of the related series to generate predictions for the quarterly figures for GDP. The authors found this method to be challenging, and the disaggregated series behaved too smoothly compared to the actual series. For this reason the ARIMA and VAR models were considered. These models were evaluated using the RMSE values, and the VAR modelling technique was found to be optimal.

Andrei and Bugudui create forecasts for GDP in the United States[7]. These authors developed models using the Box-Jenkins ARIMA methodology. The optimal ARIMA model was found based on the residual analysis, the Akaike information criterion (AIC) and the Schwarz Bayesian criterion (SBC). Forecasts were calculated, and the authors found that the total $R^2$ value was very low, suggesting that forecasts cannot be made to a high level of accuracy using the previous value.

Fang has published a working paper forecasting real GDP growth in Japan [8]. In the paper Fang models the volatility of GDP growth using different types of generalised autoregressive conditional heteroscedasticity (GARCH) model, including the exponential (EGARCH) and integrated (IGARCH) models. Residuals for the models were used as a test for lack of fit. Fang found that reducing the outliers in the data improved the residual output. As this is a working paper no optimal model was found; however, it is expressed that the GARCH method works well in modelling the data with no outliers.

The four literature examples mentioned carry out different statistical modelling techniques to forecast GDP globally. Univariate and multivariate ARIMA, GARCH and VAR models were examined. The authors reach different conclusions, which suggests that both univariate and multivariate models should be considered. Univariate ARIMA and GARCH models proved to be successful in the United States and Japan respectively; for this reason these models will be considered as a univariate solution. Multivariate VAR and GARCH models have been considered, with the VAR proving to be more popular. The multivariate modelling technique to be considered will depend on the results of the univariate process. The four authors use different techniques to find the optimal model: the AIC, SBC, residual output, RMSE, and $R^2$ values were all considered. All of these methods will be considered as tools to find the optimal model, and to evaluate its success in forecasting the percentage change in GDP in the UK.


Chapter 3

Data Overview and Analysis

GDP economic statistics have been retrieved from the OECD (the Organisation for Economic Co-operation and Development)[9]. This organisation has been chosen as it provides a variety of monetary statistics to the public. Please note that GDP statistics on the OECD are updated frequently as new estimates are produced regularly; the data is exact as of the date accessed.

Gross domestic product (GDP) refers to the value of all goods and services (in local currency)

produced within a country during a quarter, and is often used as a reflective value for the national

economic conditions. GDP is commonly used to comment upon how well a particular country’s

economy is developing and performing.

The variable that has been obtained from OECD is the percentage change in the raw GDP

monetary value from the previous quarter. This variable was obtained because when commenting

on GDP the value that is often published and referred to is the percentage change from quarter

to quarter. The raw monetary value was calculated using the expenditure approach. Data has

been retrieved from the first quarter of 1970 (Q1-1970) until the third quarter of 2013 (Q3-

2013). There are three ways of calculating GDP: the production approach, the income approach and the expenditure approach. The expenditure approach has been chosen as this is the most frequently used method.
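As a minimal sketch of how the modelled variable relates to the raw monetary series, the quarter-on-quarter percentage change can be computed as follows. The function name `pct_change` and the GDP levels are hypothetical illustration values, not OECD figures, and Python is used here purely for illustration (the analysis in this report was carried out in SAS).

```python
def pct_change(levels):
    """Return the percentage change of each value from the previous one."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(levels, levels[1:])]


# Hypothetical raw quarterly GDP levels (illustrative only, not OECD data).
levels = [400.0, 402.0, 399.0, 403.5]
for change in pct_change(levels):
    print(round(change, 3))
```

A series of n raw values yields n - 1 percentage changes, since the first quarter has no previous quarter to compare against.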


3.1 GDP - The expenditure approach

The raw monetary value of GDP has been calculated using the expenditure approach; this is the

value of the total expenditure of all individuals in the country in a year. To calculate the raw

GDP figure the formula shown in equation 3.1 is used.

\[ \mathrm{GDP} = C + G + I + NX \tag{3.1} \]

[10]

Where:

C = Consumption

G = Government Spending

I = Investment

NX = Net Exports

3.2 Descriptive statistics

Descriptive statistics have been calculated for the percentage change in GDP from the previous quarter; these are shown in table 3.1.

| Number of Observations | Minimum | Maximum | Mean | Standard Deviation | Variance |
|---|---|---|---|---|---|
| 175 | −2.468 | 5.278 | 0.573 | 0.989 | 0.979 |

Table 3.1: Descriptive statistics of the percentage change in GDP from the previous quarter, from Q1-1970 to Q3-2013

The data was recorded from Q1-1970 to Q3-2013. There are 175 observations, meaning this time series is long enough to model using statistical models. The percentage change in GDP was at its peak in Q1-1973 at a value of 5.278, which is in line with the fact that the mortgage regulations were changed at this time [2]. It was at its lowest in Q1-2009, at a value of −2.468, which is also in line with the recession in 2008[3]. The mean of the data is 0.573, suggesting that GDP is growing more than it is falling; this is to be expected as inflation causes the net worth of a country to increase.

3.3 Timeplot analysis

Before any statistical analysis is carried out, it is vital to have an understanding of the way GDP has changed throughout time. As GDP is a quarterly figure, there could be a quarterly seasonal pattern in the data; however, the timeplot shown in figure 3.1 shows this is not the case. The timeplot shows that GDP between 1970 and 1985 fluctuated significantly and only started to settle down around 1995. The economy remained stable until the most recent drop in 2008.

Figure 3.1: A timeplot to show the percentage change in GDP against date.

The timeplot shows the percentage change in GDP to be fluctuating about a constant mean. Apart from the extreme fluctuation in the 1970s, this time series looks to be quite stationary (a term that is defined more clearly in chapter 4) with no overall upward or downward trend.


Chapter 4

Univariate Box-Jenkins Modelling

Box-Jenkins models are a group of models that are used to forecast seasonal or non-seasonal time

series. The timeplot for GDP shows the data to be non-seasonal and therefore only non-seasonal

Box-Jenkins models will be considered. To model a time series by a Box-Jenkins model the data

must be stationary. An informal definition of a stationary process is a process “whose statistical properties do not change over time”[11]. A process is strictly stationary if its joint distribution is unchanged by a shift in time, e.g. $Y_{-1}, Y_0, Y_1, Y_2, \ldots, Y_n$ has the same joint distribution as $Y_{-1+k}, Y_{0+k}, Y_{1+k}, Y_{2+k}, \ldots, Y_{n+k}$. In this case $Y_t$ denotes the value of the percentage change in GDP at time $t$. This definition is so strict that it rarely applies to real-life time series data, and therefore a weaker version of stationarity called covariance stationarity can be applied. Under covariance stationarity a time series is said to be stationary if:

1. $E[Y_t] = E[Y_{t-k}]$, where $E[Y_t]$ = the mean function at time $t$.

2. $\mathrm{Cov}[Y_t, Y_{t-k}] = \mathrm{Cov}[Y_0, Y_k]$, where $\mathrm{Cov}[Y_t, Y_s]$ = the covariance between the two time points $t$ and $s$.

3. When $k = 0$, $\mathrm{Cov}[Y_t, Y_t] = \mathrm{Cov}[Y_0, Y_0] \Rightarrow V[Y_t] = V[Y_0]$, where $V[Y_t]$ = the variance at time $t$.

Thus for a time series to be classed as stationary, the mean function is constant over time, the

covariance of the process between two time points only depends on the time difference and the

variance is constant over time. [12]
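The first and third conditions can be explored informally by comparing the mean and variance over different sub-periods of a series. The sketch below does this for two simulated series; the helper name `half_moments` and the simulated data are assumptions made for illustration, and this comparison is not a formal test of stationarity.

```python
import random
from statistics import mean, pvariance


def half_moments(series):
    """Return (mean, variance) for the first and the second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


rng = random.Random(0)
# Simulated white noise: stationary by construction (illustrative data only).
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Random walk: the running sum of the noise, a standard non-stationary series.
walk, total = [], 0.0
for e in noise:
    total += e
    walk.append(total)

print("white noise:", half_moments(noise))
print("random walk:", half_moments(walk))
```

For the white noise series the two halves should give similar means and variances, whereas for the random walk they will generally differ, which is the behaviour the covariance stationary definition rules out.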


To find the optimal Box-Jenkins model to fit to the data, the following steps are carried out:

1. Check for non-stationarity.

2. Model identification and selection - The autocorrelation function (ACF) and partial autocorrelation function (PACF) are used to suggest an appropriate model to fit the data.

3. Parameter testing - Hypothesis tests are carried out to check that all parameters chosen are needed in the model.

4. Residual analysis - Once the model has been fitted, residual plots are analysed to test the model's fit to the data.

5. Parameter estimation - Fitting the model and estimating any parameters in the model.

4.1 ARIMA (p,d,q) modelling

A non-seasonal Box-Jenkins model is the autoregressive integrated moving average (ARIMA)

(p,d,q) model. The ARIMA model is a combination of the autoregressive (AR) and moving

average (MA) models. The AR model uses the previous values of the dependent variable to

calculate a forecast. The general equation of the AR model is shown in equation 4.1 .

\[ Y_t = \delta + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \ldots + \phi_n Y_{t-n} + \varepsilon_t \tag{4.1} \]

[13]

Where:

$Y_t$ = The value of the dependent variable at time $t$.

$\phi_1, \ldots, \phi_n$ = The autoregressive parameters.

$\delta$ = A constant.

$\varepsilon_t$ = The error term (residual) at time $t$.

$\varepsilon_t \sim NID(0, \sigma^2)$: errors are assumed to be normally and independently distributed.
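To illustrate how an AR model uses previous values, the sketch below simulates the simplest case of equation 4.1 (one autoregressive term) and computes a one-step-ahead forecast. The function names and the parameter values are hypothetical and chosen only for illustration; they are not estimates from the GDP data.

```python
import random


def simulate_ar1(delta, phi, sigma, n, seed=0):
    """Simulate Y_t = delta + phi * Y_{t-1} + e_t with e_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    y = [delta / (1.0 - phi)]  # start at the process mean (requires |phi| < 1)
    for _ in range(n - 1):
        y.append(delta + phi * y[-1] + rng.gauss(0.0, sigma))
    return y


def forecast_ar1(delta, phi, last_value):
    """One-step-ahead forecast: the unknown error is replaced by its mean, 0."""
    return delta + phi * last_value


# Hypothetical parameter values (illustrative only).
series = simulate_ar1(delta=0.3, phi=0.5, sigma=1.0, n=175)
print(round(forecast_ar1(0.3, 0.5, series[-1]), 3))
```

The forecast replaces the future error term by its expected value of zero, which is why only the constant and the last observed value appear in it.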


In a similar way the MA model uses the error terms from previous time points to calculate the current value. The general equation for an MA model is shown in equation 4.2.

\[ Y_t = \delta - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \ldots - \theta_n \varepsilon_{t-n} + \varepsilon_t \tag{4.2} \]

[14]

Where:

$Y_t$ = The value of the dependent variable at time $t$.

$\theta_1, \ldots, \theta_n$ = The moving average parameters.

$\delta$ = A constant.

$\varepsilon_t$ = The error term (residual) at time $t$.

$\varepsilon_t \sim NID(0, \sigma^2)$: errors are assumed to be normally and independently distributed.

The integrated section of the model corresponds to a step in the process called the differencing step, whereby the series may be differenced to make it stationary. This procedure is often used when the original data is non-stationary. An ARIMA(p,d,q) model has parameters p, d and q, where p is the order/number of autoregressive terms, d is the number of times the time series has been differenced, and q is the order/number of moving average terms in the model. The general model for ARIMA(p,d,q) can be expressed by equation 4.3.

\[ Z_t = \delta + \phi_1 Z_{t-1} + \ldots + \phi_p Z_{t-p} - \theta_1 \varepsilon_{t-1} - \ldots - \theta_q \varepsilon_{t-q} + \varepsilon_t \tag{4.3} \]

Where:

$Z_t$ = The dependent variable $Y_t$ after differencing $d$ times (for $d = 0$, $Z_t = Y_t$; for $d = 1$, $Z_t = Y_t - Y_{t-1}$; for $d = 2$, $Z_t = Y_t - 2Y_{t-1} + Y_{t-2}$).

$\varepsilon_t \sim NID(0, \sigma^2)$: errors are assumed to be normally and independently distributed, with a mean of 0 and a variance of $\sigma^2$.
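The recursion in equation 4.3 can be sketched for the case p = 1, d = 0, q = 1, where $Z_t = Y_t$. The function name, the starting value and the numbers used are hypothetical; note the minus sign in front of the moving average term, which follows the sign convention of equation 4.3.

```python
def arma11_path(delta, phi, theta, errors, y0):
    """Build Y_t = delta + phi*Y_{t-1} - theta*e_{t-1} + e_t from given errors.

    errors[0] is the error attached to the starting value y0.
    """
    y = [y0]
    for t in range(1, len(errors)):
        y.append(delta + phi * y[-1] - theta * errors[t - 1] + errors[t])
    return y


# Hypothetical parameters and error sequence (illustrative only).
print(arma11_path(delta=1.0, phi=0.5, theta=0.2, errors=[0.0, 1.0, 0.0], y0=2.0))
```

A single error of 1 at the second time point enters the series directly at that point and then again, with weight $-\theta$, at the following point.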

The equation for an ARIMA(p,d,q) model can also be formulated using backshift notation. The backward shift operator $B^n$ shifts the data back by $n$ time periods. The general rule for backshift notation is shown in equation 4.4.

\[ BY_t = Y_{t-1}, \quad B^2 Y_t = Y_{t-2}, \quad \ldots, \quad B^n Y_t = Y_{t-n} \tag{4.4} \]
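In backshift notation, differencing a series once corresponds to applying $(1 - B)$, since $(1 - B)Y_t = Y_t - Y_{t-1}$, and differencing d times corresponds to applying $(1 - B)$ d times. A minimal sketch, with a hypothetical function name and toy data:

```python
def difference(series, d=1):
    """Apply (1 - B) to the series d times; each pass shortens it by one value."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


toy = [1, 4, 9, 16]          # toy data, not GDP figures
print(difference(toy, d=1))  # first differences  Y_t - Y_{t-1}
print(difference(toy, d=2))  # second differences Y_t - 2*Y_{t-1} + Y_{t-2}
```

With d = 0 the series is returned unchanged, which is the case that applies when the original series is already stationary.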

The general formula for the non-seasonal ARIMA(p,d,q) model with backshift notation is shown in equation 4.5. Backshift notation is useful for finding the final equation of the model, as this can prove tricky when the order of the parameters is greater than 1.


\[ (1 - B)^d Y_t = \mu + \frac{\theta(B)}{\phi(B)} \varepsilon_t \tag{4.5} \]

Where:

$B$ = The backshift operator

$\mu$ = A constant

$\phi(B) = 1 - \phi_1 B - \ldots - \phi_p B^p$

$\theta(B) = 1 - \theta_1 B - \ldots - \theta_q B^q$

[15]
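As a worked example of how backshift notation leads to the final equation of a model, equation 4.5 can be expanded for the case p = 1, d = 0, q = 1 (an ARMA(1,1) model). This is a sketch of the algebra only; the link between the constant $\delta$ of equation 4.3 and the constant $\mu$ of equation 4.5 follows from the expansion and is not stated elsewhere in this chapter.

```latex
\begin{align*}
Y_t &= \mu + \frac{1 - \theta_1 B}{1 - \phi_1 B}\,\varepsilon_t \\
(1 - \phi_1 B)(Y_t - \mu) &= (1 - \theta_1 B)\,\varepsilon_t \\
Y_t - \phi_1 Y_{t-1} - \mu(1 - \phi_1) &= \varepsilon_t - \theta_1 \varepsilon_{t-1} \\
Y_t &= \mu(1 - \phi_1) + \phi_1 Y_{t-1} - \theta_1 \varepsilon_{t-1} + \varepsilon_t
\end{align*}
```

The last line has the form of equation 4.3 with $Z_t = Y_t$ and $\delta = \mu(1 - \phi_1)$, using the fact that $B\mu = \mu$ for a constant.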

4.2 The ARIMA modelling process

ARIMA modelling has been carried out using the maximum likelihood method. The first action in

the modelling process is to produce a 4-in-1 plot (Figure 4.1). This particular 4-in-1 plot consists

of a timeplot of the data, an ACF, a PACF, and an inverse auto correlation plot (IACF). For the

purpose of this project the IACF plot is not used.

4.2.1 Check for non-stationarity

For a time series to be modelled by a Box-Jenkins model the data must be stationary. The timeplot in figure 4.1 gives a good indication as to whether the time series is stationary.


Figure 4.1: A 4-in-1 plot showing the timeplot of the data, ACF, PACF, and IACF plots for

the percentage change in GDP from previous quarter. The x-axis of the ACF/PACF/IACF plot

indicates the lag at which the autocorrelation is computed. The y-axis indicates the value of the

correlation calculated at that specific lag.

As previously mentioned in chapter 3 section 3.3, the timeplot shows the data to be fluctuating about a constant mean. However, it is not clear whether the other conditions for a stationary series listed at the start of this chapter are satisfied. For this time series the covariance stationary definition has been used. A way of testing for non-stationarity within a time series is the augmented Dickey Fuller (ADF) test. The ADF test checks for a unit root in a time series. If the series has a unit root then the series is said to be non-stationary.

| Lag | F statistic | P value | Decision |
|---|---|---|---|
| 1 | 25.640 | 0.001 | Reject H0 |
| 2 | 14.230 | 0.001 | Reject H0 |
| 3 | 14.520 | 0.001 | Reject H0 |

Table 4.1: Single mean augmented Dickey Fuller test statistics. The F statistic is the test statistic from which the P value is determined using the F statistical tables. The P value determines the hypothesis decision. The lag value is the number of lags over which the test is carried out.


The timeplot in figure 3.1 shows the data to have a constant mean throughout time, and therefore it can be said to have a single mean. For this reason a ‘single-mean’ ADF test is carried out. The P values shown in table 4.1 are used to test the following hypotheses.

H0: The series has a unit root.

H1: The series does not have a unit root. [16]

All P values listed in table 4.1 are less than 0.05, therefore there is sufficient evidence to reject the null hypothesis (H0) at the 5% significance level, and it is concluded that the time series does not have a unit root. As the data is stationary it does not need to undergo the differencing step or any transformations. It can now be deduced that the value of d for the ARIMA model is 0.
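The idea behind a unit root test can be sketched with the simple (non-augmented) single-mean Dickey-Fuller regression, in which the change $Y_t - Y_{t-1}$ is regressed on a constant and $Y_{t-1}$. This is not the procedure used above: it omits the lagged difference terms of the ADF test and uses the tau statistic rather than the F statistic reported by SAS. The function name, the simulated series and the approximate large-sample 5% critical value of −2.86 are assumptions made for illustration.

```python
import math
import random


def dickey_fuller_tau(y):
    """Tau statistic from regressing (Y_t - Y_{t-1}) on a constant and Y_{t-1}."""
    x = y[:-1]                                    # Y_{t-1}
    dy = [b - a for a, b in zip(y, y[1:])]        # Y_t - Y_{t-1}
    n = len(x)
    x_bar, dy_bar = sum(x) / n, sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    slope = sxy / sxx
    intercept = dy_bar - slope * x_bar
    rss = sum((di - intercept - slope * xi) ** 2 for xi, di in zip(x, dy))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope / se_slope


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(175)]  # stationary, simulated
walk = [sum(noise[: i + 1]) for i in range(175)]   # unit root, simulated

# Approximate 5% critical value for the single-mean case in large samples.
CRITICAL_5PCT = -2.86
for name, series in [("white noise", noise), ("random walk", walk)]:
    tau = dickey_fuller_tau(series)
    decision = "reject H0" if tau < CRITICAL_5PCT else "do not reject H0"
    print(name, round(tau, 2), decision)
```

A strongly negative tau statistic is evidence against a unit root, so a stationary series such as the simulated white noise should lead to H0 being rejected, as was the case for the GDP series.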

4.2.2 Model identification and selection

The next step in the ARIMA modelling process is to identify the order of the AR and MA parameters. To find these values the ACF and PACF plots are used. An ACF plot shows the autocorrelation of the series with itself at each time lag. A point in the ACF is said to be significant if it lies outside the shaded area. The shaded area on the ACF plot marks the region within which, at the 95% confidence level, there is no evidence of autocorrelation. For example, a spike outside the shaded area at lag 1 indicates a significant correlation between each value of the series and the previous time point[17]. A spike outside the shaded area at lag 2 shows that there is a significant correlation between each value and the value occurring two time points before. The pattern continues as the lags increase. The correlation value is a measure of the strength of association: the closer the correlation is to 1 in absolute value, the stronger the association. The PACF shows a similar statistic, except with the conditional correlation (the correlation at a given lag after accounting for the shorter lags) instead of the autocorrelation.
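The quantity plotted in an ACF can be sketched as below. The function names are hypothetical, and the constant band of $\pm 1.96/\sqrt{n}$ is the usual white noise approximation, which is an assumption here and may differ from the shaded area that SAS draws.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def significant_lags(series, max_lag):
    """Lags whose autocorrelation lies outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(series))
    acf = sample_acf(series, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


toy = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data, not GDP figures
print([round(r, 3) for r in sample_acf(toy, 2)])
```

For a series of 175 observations this approximate band is about ±0.148, so small spikes just outside it, like those described for the GDP series, indicate weak but detectable autocorrelation.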

For both the ACF and PACF plot, there are three points that are slightly outside of the shaded

area. Therefore it can be said that both the ACF and PACF plots are dying away. If an ACF

plot is dying away it suggests that an AR term should be included in the final model. The same

applies for the PACF plot and the MA term. In this case the PACF and ACF plots are implying

that both an AR and MA term should be in the model. This implies that the ARIMA model to fit

the data is an ARIMA(p,0,q) often referred to as an ARMA(p,q). The ACF and PACF plots do

not give a clear solution as to the order of the AR and MA parameters. Therefore ARMA models

with differing orders will be considered. ARMA models ranging from ARMA(1,1) to ARMA(3,3)


will be considered as there are three significant points outside of the shaded areas on the ACF

and PACF plots.

4.2.3 Parameter testing

The models in table 4.2 have been fit to the data using SAS. Once a model has been fit, the

parameters of the model must be tested to see if they are significant in the model. Each param-

eter has a different formal hypothesis test, an example of the formal parameter test for an MA

parameter is shown below:

H0 : θ = 0.

H1 : θ ≠ 0.

As there will be multiple parameter tests carried out throughout this investigation, a general hypothesis for each parameter will be used in each case. This is shown below:

H0 : Parameter = 0.

H1 : Parameter ≠ 0.

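P value < or > 0.05?

| Model | Constant | AR1 | AR2 | AR3 | MA1 | MA2 | MA3 | Keep? |
|---|---|---|---|---|---|---|---|---|
| ARMA(3,3) | < | > | < | > | > | < | > | No |
| ARMA(3,2) | < | > | < | < | > | < | | No |
| ARMA(2,3) | < | > | < | | < | < | < | No |
| ARMA(2,2) | < | > | > | | > | > | | No |
| ARMA(2,1) | < | > | > | | > | | | No |
| ARMA(1,2) | < | < | | | > | > | | No |
| ARMA(1,1) | < | < | | | < | | | Yes |

Table 4.2: ARMA(p,q) parameter hypothesis tests. Each cell shows whether the P value for that parameter is less than (<) or greater than (>) 0.05; a blank cell means the parameter is not in the model. If a P value is less than 0.05 the null hypothesis can be rejected, and thus the parameter should be in the model. For a model to be considered each parameter in the model should have a P value of less than 0.05[18].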
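The decision rule behind each cell of the table can be sketched as follows. The function names, the estimate and the standard error are hypothetical, and the P value is computed with a normal approximation to the test statistic, which is an assumption and not necessarily how SAS computes it.

```python
import math


def parameter_p_value(estimate, std_error):
    """Two-sided P value for H0: parameter = 0, using a normal approximation."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2.0))


def keep_parameter(estimate, std_error, alpha=0.05):
    """True when H0 is rejected, i.e. the parameter is significant."""
    return parameter_p_value(estimate, std_error) < alpha


# Hypothetical estimate and standard error (illustrative only).
print(round(parameter_p_value(0.5, 0.1), 6), keep_parameter(0.5, 0.1))
```

A model is kept only when this check passes for every one of its parameters, which is why a single non-significant parameter is enough to discard a model.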


Table 4.2 shows the results of the parameter tests for models ARMA(3,3) to ARMA(1,1). Only the ARMA(1,1) model will be considered further, as it is the only model in which all parameters are significant.

4.2.4 Residual analysis

An assumption of all Box-Jenkins models is that the residuals are normally and independently distributed, and therefore follow a white noise process. The residuals are defined as:

\[ \varepsilon_t = Y_t - \hat{Y}_t \]

where $\hat{Y}_t$ is the fitted value at time $t$. Residual analysis is carried out on the ARMA(1,1) model to validate this assumption. The following tests must be carried out and the criteria shown in table 4.3 must be met.

| Criteria | Test | Response |
|---|---|---|
| Independence | Ljung Box test | For up to 30 lags the P value should be greater than 0.05, suggesting that there is no autocorrelation in the residuals. |
| | ACF and PACF plot | For all lags the ACF and PACF should show no significant points outside of the shaded area on the plot. |
| Normality | Q-Q plot | Points should be close to the line and not show a significantly differing trend. |
| | Histogram of residuals | The histogram should be distributed symmetrically, and should show a bell shaped curve. |
| White noise | White noise probability plot | The plot should show all P values to be greater than 0.05 (the lower the bars of the graph the better). |

Table 4.3: Univariate Box-Jenkins residual criteria. For a model to fit the data well all criteria need to be met.


Test for independence

The Ljung Box test checks that the residuals are independently distributed; this is carried out by

checking for autocorrelations. Autocorrelations in the residuals would show that the assumption

of independence is not met suggesting lack of fit of the model. In the Ljung Box test the following

hypotheses are tested:

H0: $\rho_1 = \rho_2 = \ldots = \rho_k = 0$.

H1: $\rho_j \neq 0$ for at least one lag $j \le k$.

| To Lag (k) | χ² Statistic | P value | Decision |
|---|---|---|---|
| 6 | 6.530 | 0.163 | Do Not Reject H0 |
| 12 | 9.770 | 0.461 | Do Not Reject H0 |
| 18 | 12.710 | 0.694 | Do Not Reject H0 |
| 24 | 25.830 | 0.259 | Do Not Reject H0 |
| 30 | 39.370 | 0.075 | Do Not Reject H0 |

Table 4.4: Ljung Box test. The χ² statistic is the test statistic for the Ljung Box test. The P value determines the decision in the Ljung Box test. The lag number (k) is the number of lags up to which the test is carried out.
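The calculation behind the table can be sketched as below. The function names are hypothetical. The degrees of freedom are taken to be the lag minus 2, on the assumption that one degree of freedom is lost for each of the AR and MA parameters estimated; this gives an even number at every lag in the table, for which the χ² tail probability has a simple closed form.

```python
import math


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum over k of r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q


def chi2_sf_even(x, df):
    """P(chi-square with df degrees of freedom > x); valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Statistics from table 4.4; df = lag - 2 is an assumption (see the text above).
for lag, q in [(6, 6.530), (12, 9.770), (18, 12.710), (24, 25.830), (30, 39.370)]:
    print(lag, round(chi2_sf_even(q, lag - 2), 3))
```

Under that assumption the printed tail probabilities should agree closely with the P values reported in the table.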

For lags up to 30 all P values are greater than 0.05, therefore there is insufficient evidence at the 5% significance level of any autocorrelation in the residuals. Figure 4.2 shows a 4-in-1 plot of the residuals after the model has been fit. There are no significant lags in the sample ACF and PACF plots, which backs up the Ljung Box test in showing no significant autocorrelation in the residuals.


Figure 4.2: A 4-in-1 plot showing the ACF, PACF, IACF and white noise probability plots for the residuals of the model. The ACF/PACF plots have the same axes as the plots in figure 4.1. The y-axis of the white noise probability plot shows the P value of the white noise hypothesis test, and the x-axis shows the lag up to which the test is carried out (the lower the bar, the higher the P value).

Test for normality

Residuals also need to be normally distributed. Figure 4.3 helps determine how the residuals are

distributed by showing a histogram and Q-Q (Quantile-Quantile) plot of the residuals. A Q-Q plot

is one graphical method that is used to check the assumption of normally distributed residuals. If

the residuals follow a normal distribution then the histogram will show a bell shaped curve, and

in the Q-Q plot the residuals will fall along a straight line.


Figure 4.3: A Histogram and Q-Q plot of the residuals from the ARMA(1,1) model.

The histogram of the residuals in figure 4.3 shows a bell-shaped curve and follows the trend line for a normal distribution. Similarly, the Q-Q plot shows the points to be close to the line. From the two plots in figure 4.3 it can be said that the residuals are approximately normally distributed.

White noise probability test.

From the white noise probability plot in figure 4.2, a hypothesis test can be carried out to test

for white noise. The following hypotheses are tested.

H0: Residuals follow a white noise process.

H1: Residuals do not follow a white noise process.

The y-axis of the white noise probability plot shows the P value for the white noise test and the x-axis shows the lag at which the test is carried out; the lower the height of the bar, the higher the P value. For a white noise series all P values will be greater than 0.05, and thus no bars will go above the line at the value of 0.05. The P values are all greater than 0.05, therefore at the 5% significance level there is insufficient evidence to reject H0, and the residuals are consistent with a white noise process.


4.2.5 Parameter estimation

The optimal ARIMA(p,d,q) model for this data was found to be the ARMA(1,1) model. For this model the differencing step was not needed (d = 0), so the model is an ARIMA model with one AR parameter and one MA parameter. Parameter estimates for the ARMA(1,1) model have been calculated using SAS and are shown in table 4.5. As the orders of both the AR and MA parts of the model are 1, backshift notation is not needed to write out the model. The model can be found by simply substituting the parameter estimates into the general ARIMA(p,d,q) model given in equation 4.3 on page 11.

Notation Parameter Estimate

δ 0.149

θ1 0.573

φ1 0.739

Table 4.5: ARMA(1,1) Parameter Estimates

Equation of the model

$$Y_t = 0.149 + 0.739\,Y_{t-1} - 0.573\,\varepsilon_{t-1} + \varepsilon_t \tag{4.6}$$

$$\varepsilon_t \sim \mathrm{NID}(0, \sigma^2)$$

4.3 Forecasting GDP by an ARMA(1,1) model

The model shown in equation 4.6 can be used to forecast the percentage change in GDP. This

model includes an autoregressive term, meaning that the model is using the previous time point

to predict the value of GDP at the next time point. If forecasts are calculated further ahead than

the next time point, then this will use the previous forecasted value. This means that the further

in the future the forecast is, the less accurate it will be.


Figure 4.4: ARMA(1,1) forecast plot. The red 'Forecast for GDP' series shows the value of the percentage change in GDP that the ARMA(1,1) model would have predicted at time point t.

Figure 4.4 shows the forecasts for GDP (calculated by SAS) projected into the future. The forecasts converge to 0.571, which is the process mean, δ/(1 − φ1) = 0.149/(1 − 0.739). For this reason the model is only informative one step ahead, and the only forecast that will be examined is the next time point. The forecast for the percentage change in GDP in Q4-2013 is 0.563. The actual percentage change in GDP from the previous quarter for the fourth quarter of 2013 has since become available, and this value is 0.721. The predicted value is not very close to the actual value, which suggests that the model may not be reliable for forecasting GDP in the future.

Forecasts can also be calculated by hand. To calculate the next value of GDP by hand, one would need the value of the residual at the previous time point. Estimating this by hand is a long process, as a predicted value would need to be produced for each time point from the beginning of the series. This is very time consuming and therefore SAS is used to calculate the predicted values. An example of how one would calculate a single forecast by hand is shown below. The values needed by the model, listed in table 4.6, are substituted into equation 4.6. The error at time t (εt) is unknown and is therefore assumed to be 0.


Yt−1     Ŷt−1     εt−1 = Yt−1 − Ŷt−1     εt
0.799    0.491    0.309                  0

Table 4.6: ARMA(1,1) Model constraints

The forecasted value that SAS produced is confirmed in equation 4.7.

$$Y_t = 0.149 + 0.739(0.799) - 0.573(0.309) \approx 0.563 \tag{4.7}$$
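The hand calculation, and the convergence of longer forecasts to the process mean, can be sketched in Python as follows. This is only an illustration using the rounded estimates from table 4.5 and the sign convention of equation 4.6; the function name is hypothetical, and SAS works with unrounded estimates, so the last decimal place may differ.

```python
DELTA, PHI_1, THETA_1 = 0.149, 0.739, 0.573  # rounded estimates from table 4.5


def arma11_forecasts(y_last, resid_last, steps):
    """Forecasts 1..steps ahead; future errors are set to their mean of 0."""
    forecasts = []
    prev_y, prev_resid = y_last, resid_last
    for _ in range(steps):
        nxt = DELTA + PHI_1 * prev_y - THETA_1 * prev_resid
        forecasts.append(nxt)
        # Beyond one step the previous residual is unknown, so it is set to 0
        # and the previous forecast stands in for the previous observation.
        prev_y, prev_resid = nxt, 0.0
    return forecasts


path = arma11_forecasts(y_last=0.799, resid_last=0.309, steps=40)
print(round(path[0], 3))               # one-step forecast
print(round(path[-1], 3))              # forecast 40 steps ahead
print(round(DELTA / (1 - PHI_1), 3))   # process mean
```

With the rounded estimates the one-step value is about 0.562, in line with the 0.563 produced by SAS, and the 40-step forecast is indistinguishable from the process mean of 0.571.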


Chapter 5

Univariate GARCH Modelling

The Box-Jenkins model fitted in chapter 4 required the time series to be stationary. One of the conditions of a stationary series is that the variance is constant throughout. For this sample of GDP, the Dickey Fuller test (shown in subsection 4.2.1 on page 12) shows the series to be stable enough to be treated as stationary. However, there are some points in time where GDP has changed dramatically, so the series may be described as outlier prone. It is very common in economic data for the variance to be time-varying, and this condition in a time series is called heteroscedasticity[19].

There are some changes in variance, suggesting that heteroscedasticity may be present in the series. The Autoregressive Conditionally Heteroscedastic (ARCH) model was introduced by Robert Engle in 1982 as a model that accommodates time-varying variances[20]. It does this by modelling the variance (or standard deviation) of a time series. A generalised version of the ARCH model was introduced by Tim Bollerslev in 1986[20], and is known as the Generalised Autoregressive Conditional Heteroscedasticity (GARCH) model.

The GARCH model formulates the conditional variance using a structure similar to that of an ARMA model. An ARMA(1,1) model fits this data well, suggesting that a GARCH model could also fit this time series. If it is shown that heteroscedasticity is present in the GDP time series, the GARCH method of forecasting could improve on the ARMA(1,1) model.


5.1 GARCH(p,q) modelling

The GARCH(p,q) modelling technique follows a similar process to the Box-Jenkins Process, and

this is outlined in the four steps below.

1. Check for Heteroscedasticity

2. Parameter testing and model identification - GARCH models are fit using SAS and orders

are reduced until all parameters are significant in the model.

3. Residual analysis - In a GARCH model it is an assumption that the errors are normally

independently distributed. The values of the residuals should stay between the values of 2

and -2. In this step, once the model has been fit residual plots are analysed to check that

the model fits the data appropriately.

4. Parameter estimation - Fitting the model and estimating any parameters in the model.

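The general formula for a GARCH(p,q) model is shown in equation 5.1[21].

$$\sigma_t^2 = \alpha_0 + \alpha_1 Y_{t-1}^2 + \dots + \alpha_q Y_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \dots + \beta_p \sigma_{t-p}^2$$

$$Y_t = \sigma_t \varepsilon_t + \delta \tag{5.1}$$

$$\varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, 1)$$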

Where:

α = The ARCH parameter constant.

β = The GARCH parameter constant.

p = The order of the GARCH parameter.

q = The order of the ARCH parameter.

σ2 = The conditional variance of the growth rate.

δ = A constant in the model.

εt = Identically independently distributed random variable with mean of 0 and variance 1.

α > 0.

β > 0.


Similar to the ARIMA model, the GARCH(p,q) model can be written using backshift notation,

shown in equation 5.2. This may be useful when working out the final model.

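$$\sigma_t^2 = \alpha_0 + \alpha(B)\,Y_t^2 + \beta(B)\,\sigma_t^2 \tag{5.2}$$

where

$$\alpha(B) = \alpha_1 B + \alpha_2 B^2 + \dots + \alpha_q B^q \tag{5.3}$$

and

$$\beta(B) = \beta_1 B + \beta_2 B^2 + \dots + \beta_p B^p \tag{5.4}$$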
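As a small worked case (an illustration, not part of the original derivation), setting p = q = 1 in equation 5.2 and using the backshift property B X_t = X_{t-1} recovers the lagged form of the GARCH(1,1) variance equation:

```latex
\begin{aligned}
\sigma_t^2 &= \alpha_0 + \alpha_1 B\,Y_t^2 + \beta_1 B\,\sigma_t^2 \\
           &= \alpha_0 + \alpha_1 Y_{t-1}^2 + \beta_1 \sigma_{t-1}^2
\end{aligned}
```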

5.1.1 Check for heteroscedasticity

Previously in the Box-Jenkins modelling process, a Ljung-Box test was carried out to check for

autocorrelations in the residuals. Here, in a similar way, a Portmanteau Q test is used to test for

heteroscedasticity within the series, by testing for changes in variance across time. The Portmanteau Q test is carried out in SAS, and the output is shown in table 5.1[22].

Order Q statistic P value Decision

1 7.367 0.007 Reject H0

2 9.052 0.011 Reject H0

3 12.798 0.005 Reject H0

4 21.318 < 0.001 Reject H0

5 21.927 < 0.001 Reject H0

6 21.938 0.001 Reject H0

7 22.347 0.002 Reject H0

8 22.350 0.004 Reject H0

9 22.649 0.007 Reject H0

10 22.882 0.011 Reject H0

11 22.882 0.018 Reject H0

12 23.485 0.024 Reject H0

Table 5.1: Portmanteau Q test for heteroscedasticity


H0 : Heteroscedasticity is not present in the time series.

H1 : Heteroscedasticity is present in the time series.

For all orders up to 12 the P values are less than 0.05. Therefore H0 is rejected at the 5% significance level, and there is evidence that heteroscedasticity is present. This suggests that an ARCH or GARCH model may be appropriate for this data.

5.1.2 Parameter testing and model identification

Often in the GARCH modelling process, the data needs to be differenced to make it stationary. To do this one would model the returns (Yt − Yt−1) or the log returns (log Yt − log Yt−1). However, as shown previously, this data has been shown to be stationary with unequal variances, so no differencing is needed here.
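For reference, the two transformations mentioned above can be sketched in Python as follows. The level series is hypothetical and must be strictly positive for the log returns to exist; neither transformation is applied to the GDP growth series in this chapter.

```python
import math


def returns(levels):
    """First differences Y_t - Y_{t-1}."""
    return [b - a for a, b in zip(levels, levels[1:])]


def log_returns(levels):
    """Log returns log(Y_t) - log(Y_{t-1}); requires strictly positive levels."""
    return [math.log(b) - math.log(a) for a, b in zip(levels, levels[1:])]


levels = [100.0, 102.0, 101.0]  # hypothetical level series
print(returns(levels))
print([round(r, 4) for r in log_returns(levels)])
```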

To choose the most appropriate GARCH(p,q) model for the data, a model of high order is fitted first. The orders are then reduced sequentially, fitting each model in turn. A condition of the ARCH and GARCH models is that the parameters must be greater than 0. If no estimates satisfying this condition can be found for a model, that model is eliminated.

A GARCH(5,5) model is fitted first and the orders of p and q are reduced until the GARCH(1,1)

is reached. Parameter tests are carried out for each model.

P value < or > 0.05?

Model        Intercept  A0  A1  A2  A3  A4  A5  G1  G2  G3  G4  G5  Keep?
GARCH(5,5)   <          >   <   >   >   >   >   >   >   >   >   >   No
GARCH(4,4)   <          <   <   >   <   <   -   >   >   <   <   -   No
GARCH(3,3)   <          >   <   <   <   -   -   <   <   <   -   -   Yes
GARCH(2,2)   <          >   >   >   -   -   -   >   >   -   -   -   No
GARCH(1,1)   <          <   <   -   -   -   -   <   -   -   -   -   Yes
ARCH(2)      <          <   <   <   -   -   -   -   -   -   -   -   Yes

Table 5.2: GARCH(p,q) parameter testing process, where An is an ARCH parameter of order n and Gn is a GARCH parameter of order n; a dash marks a parameter that is not in the model [18].

Table 5.2 shows the results of the GARCH(p,q) parameter testing process. The general parameter hypothesis has been tested for each parameter. If the P value for a parameter is less than 0.05, the null hypothesis is rejected and the parameter should be in the model. There is one exception to this rule, which is A0 (the ARCH parameter of order 0). The A0 parameter must always be present in a GARCH(p,q) or ARCH(q) model, and therefore it is acceptable for the A0 term to have a P value greater than 0.05. Looking at the results for the GARCH(2,2) model in table 5.2, it can be seen that none of the GARCH parameters is significant. It is possible that a generalisation is not needed for this model (p = 0), and that an ARCH(2) model may fit the data. For this reason the ARCH(2) model (a GARCH(p,q) model with no generalisation) has also been tested. Table 5.2 shows that the GARCH(5,5), GARCH(4,4) and GARCH(2,2) models include parameters that are not significant. These models are discounted as they do not fit the data. The models that will be considered further are the GARCH(3,3), GARCH(1,1) and ARCH(2) models.
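The screening rule described above can be written as a short Python sketch. The comparisons are transcribed from table 5.2; the function names are hypothetical, and the assumption is that the comparisons in each row are listed in the order Intercept, A0 to Aq, then G1 to Gp.

```python
def parameter_names(q, p):
    """Intercept, ARCH terms A0..Aq, then GARCH terms G1..Gp."""
    arch = [f"A{i}" for i in range(q + 1)]
    garch = [f"G{i}" for i in range(1, p + 1)]
    return ["Intercept"] + arch + garch


# (ARCH order q, GARCH order p, P value comparisons against 0.05 from table 5.2)
ROWS = {
    "GARCH(5,5)": (5, 5, "< > < > > > > > > > > >"),
    "GARCH(4,4)": (4, 4, "< < < > < < > > < <"),
    "GARCH(3,3)": (3, 3, "< > < < < < < <"),
    "GARCH(2,2)": (2, 2, "< > > > > >"),
    "GARCH(1,1)": (1, 1, "< < < <"),
    "ARCH(2)": (2, 0, "< < < <"),
}


def keep_model(q, p, flags):
    """Keep a model when every parameter except A0 has a P value below 0.05."""
    signs = dict(zip(parameter_names(q, p), flags.split()))
    return all(sign == "<" for name, sign in signs.items() if name != "A0")


decisions = {model: keep_model(*row) for model, row in ROWS.items()}
for model, keep in decisions.items():
    print(model, "Yes" if keep else "No")
```

Run on the table as transcribed, the rule reproduces the Keep? column: GARCH(3,3), GARCH(1,1) and ARCH(2) are retained.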

5.1.3 Residual analysis

After the models have been fitted to the data, the adequacy of each model must be evaluated. This is carried out by looking at a set of residual plots for each model. For a GARCH/ARCH model the residuals should behave like regression residuals. Similar to Box-Jenkins modelling, the residuals should be normally distributed, follow a white noise process and be independent. A good model will also show that most or all standardized residuals lie between 2 and -2. The residual plots for each of the models include a time plot of standardized residuals, a time plot of GDP, a histogram of residuals, a white noise probability plot, and a sample ACF and PACF. For a model to fit the data, the criteria shown in table 5.3 must be met.

Criteria    Plot    Response
Accuracy    Standardized residuals    The standardized residuals should stay between the values of 2 and -2.
Normality    Histogram of residuals    The histogram should be distributed symmetrically and should show a bell-shaped curve.
White noise    White noise probabilities    The plot should show all P values to be greater than 0.05 (the lower the bars of the graph the better).
Independence    ACF & PACF    For all lags the ACF and PACF should show no significant points outside of the shaded area on the plot.

Table 5.3: GARCH modelling residual analysis criteria


Figure 5.1: GARCH(1,1) Residual output

The GARCH(1,1) model has been fitted to the data and the SAS output is shown in figure 5.1. The results of the residual analysis are shown in table 5.4 and suggest that the model does not fit the data as well as it could. 3.4% of the standardized residuals lie outside of the 2 and -2 bands, which is acceptable as about 5% would be expected to do so. The histogram of residuals shows a slight positive skew, which suggests that the residuals might not be normally distributed; this is linked to the white noise test, which suggests that the residuals may not follow a white noise process. The results from the ACF and PACF plots are acceptable.

Criterion    Response
Accuracy    There are 6 points in the plot that lie outside of the 2 and -2 boundaries.
Normality    The histogram seems to be bell shaped with a slight positive skew.
White noise    Approximately 40% of the lags show the residuals to follow a white noise process.
Independence    The majority of lags are within the shaded area, with a few very small exceptions.

Table 5.4: GARCH(1,1) Residual Analysis


In a similar way the GARCH(3,3) model has been fitted to the data. SAS uses an iterative procedure to find parameter estimates for the model, with a maximum of 32767 iterations for a GARCH model. In this case, the software was unable to find values of α and β that are greater than 0 within 32767 iterations. This suggests that there is no GARCH(3,3) model that satisfies the parameter criteria, and that the GARCH(3,3) model does not fit the data. This model is eliminated from any further discussion.

ARCH(2)

The ARCH(2) model has been fit to the data in SAS, and the SAS output is shown in figure 5.2.

Figure 5.2: ARCH(2) Residual output


Criterion    Response
Accuracy    There are 9 points in the plot that lie outside of the 2 and -2 boundaries.
Normality    The histogram seems to be bell shaped with a very slight positive skew.
White noise    Approximately 40% of the lags show the residuals to follow a white noise process.
Independence    The majority of lags are within the shaded area, with a few very small exceptions.

Table 5.5: ARCH(2) Residual Analysis

The results shown in table 5.5 are similar to the results for the GARCH(1,1) model, with a few exceptions. The standardized residuals plot has 9 points outside of the 2 and -2 boundaries, which is just over 5%. This suggests that this model does not fit as well as the GARCH(1,1) model. The ARCH(2) model is therefore eliminated, and the GARCH(1,1) model is taken forward.
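The accuracy criterion can be checked with a few lines of Python. The sketch below assumes 175 observations (one per quarter from Q1 1970 to Q3 2013, which is an assumption about the fitted sample rather than a figure quoted from the SAS output); the helper name and the example residuals are hypothetical.

```python
def share_outside(std_residuals, bound=2.0):
    """Fraction of standardized residuals with absolute value above the bound."""
    count = sum(1 for r in std_residuals if abs(r) > bound)
    return count / len(std_residuals)


N_OBS = 175  # assumption: quarterly observations, Q1 1970 to Q3 2013

# Counts of points outside the 2 and -2 boundaries, from tables 5.4 and 5.5.
for model, n_outside in [("GARCH(1,1)", 6), ("ARCH(2)", 9)]:
    print(model, round(100 * n_outside / N_OBS, 1), "% outside")

# The helper applied to a short hypothetical set of standardized residuals.
print(share_outside([0.5, -2.5, 1.0, 2.1]))
```

Under this assumption the counts correspond to 3.4% and 5.1%, matching the percentages quoted in the text.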

5.1.4 Parameter estimation

The parameters for the GARCH(1,1) model have been estimated by SAS and are shown in table 5.6.

Notation Parameter Estimate

δ 0.686

α0 0.085

α1 0.531

β1 0.481

Table 5.6: GARCH(1,1) Parameter Estimates

These parameter estimates are now substituted into the general GARCH(p,q) model shown in

equation 5.1. The GARCH(1,1) model with parameter estimates is shown in Equation 5.5. As all

parameters are to order 1, it is not beneficial to write the model using backshift notation.


$$\sigma_t^2 = 0.085 + 0.531\,Y_{t-1}^2 + 0.481\,\sigma_{t-1}^2$$

$$Y_t = \sigma_t \varepsilon_t + 0.686 \tag{5.5}$$

$$\varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, 1)$$
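The variance recursion in equation 5.5 can be traced step by step with a minimal Python sketch. The starting variance and the growth values below are hypothetical inputs chosen for illustration, and the function name is not from SAS.

```python
ALPHA_0, ALPHA_1, BETA_1 = 0.085, 0.531, 0.481  # estimates from table 5.6


def conditional_variances(y, initial_variance):
    """Apply sigma2_t = a0 + a1 * Y_{t-1}^2 + b1 * sigma2_{t-1} along the series.

    y holds the observed values Y_0, Y_1, ...; the result holds sigma2_1, sigma2_2, ...
    """
    variances = []
    prev_var = initial_variance
    for prev_y in y:
        prev_var = ALPHA_0 + ALPHA_1 * prev_y ** 2 + BETA_1 * prev_var
        variances.append(prev_var)
    return variances


growth = [0.6, 1.5, -0.4]  # hypothetical quarterly growth rates
print([round(v, 3) for v in conditional_variances(growth, initial_variance=0.5)])
```

A large value of Y at one time point raises the conditional variance at the next, which is how the model accommodates periods of high volatility.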

This model does not show a very good fit to the data and therefore will not be used to forecast. The model was not completely unacceptable, however, and for this reason it may be possible to improve it by combining another term with the GARCH(1,1) model.

5.2 AR(P)/GARCH(p,q) modelling

Box-Jenkins AR and GARCH models are often combined to create AR(P)/GARCH(p,q)[23] models. *Note - The 'p' in an AR model has been changed to a capital 'P' to distinguish it from the GARCH order. The reason for combining the AR model with the GARCH model is to create a more accurate forecast by using the previous value of the dependent variable as a term in the model. The general formula for an AR(P)/GARCH(p,q) model (shown in equation 5.6) is similar to the general formula for the GARCH(p,q) model; the only difference is the additional autoregressive term[24]:

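$$\sigma_t^2 = \alpha_0 + \alpha_1 (Y_{t-1} - \eta_{t-1})^2 + \dots + \alpha_q (Y_{t-q} - \eta_{t-q})^2 + \beta_1 \sigma_{t-1}^2 + \dots + \beta_p \sigma_{t-p}^2$$

$$Y_t = \eta_t + \sigma_t \varepsilon_t + \delta \tag{5.6}$$

$$\eta_t = \phi_1 Y_{t-1} + \dots + \phi_P Y_{t-P}$$

$$\varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, 1)$$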

Where:

P= The order of the AR Parameter

φ = The constant for the autoregressive term

ηt= The addition of the AR model

1. Choosing the order of the autoregressive term - Find the best fitting AR(P) model following the Box-Jenkins modelling process shown on page 10.

2. Parameter testing and model identification - Fit the AR(P) model alongside the best fitting GARCH(p,q) model to create the AR(P)/GARCH(p,q) model, and test for parameter significance.

3. Residual analysis - In a GARCH model it is an assumption that the errors are normally

independently distributed. The values of the residuals should stay between the values of 2

and -2. In this step, once the model has been fit residual plots are analysed to check that

the model fits the data appropriately.

4. Parameter estimation - Fitting the model and estimating any parameters in the model.

5.2.1 Choosing the order of the autoregressive term

Box-Jenkins - identification and selection process

The first step in finding an AR/GARCH model that fits the data is to find the most suitable AR model. An AR model is essentially an ARIMA(p,d,q) model with no differencing or moving average terms, i.e. d = 0 and q = 0. As previously shown in the ARIMA modelling process, an AR model on its own is not the optimal model for this data. However, this is a vital step in the GARCH modelling process. The original 4-in-1 plot for GDP shown in figure 4.1 on page 13 is re-examined, this time considering only an AR model. For an AR model the ACF should die away and the PACF should cut off. As mentioned previously in the Box-Jenkins modelling process on page 13, the ACF can be seen to be dying away. There are 3 significant points on the PACF plot, therefore the PACF must be cutting off at lag 1, 2 or 3. For this reason the AR1, AR2 and AR3 models will be considered.

Box-Jenkins - Parameter testing

Parameter tests are carried out for each candidate AR model; the chosen model will later be fitted alongside the GARCH model. The results are shown in table 5.7.


Parameter P value < or > 0.05?

Model  Constant (µ)  AR1  AR2  AR3  Keep?
AR1    <             <    -    -    Yes
AR2    <             >*   <    -    Yes
AR3    <             >    >    <    No

Table 5.7: AR(p) parameter testing process. For a model to be considered, all P values must be less than 0.05 so that the null hypothesis is rejected; a dash marks a parameter that is not in the model.

In this parameter testing process there is one exception to the less than 0.05 rule. The P value

for the AR1 parameter in the AR2 model is 0.0504. This model has been accepted as this value

is so close to the limit. The models that are considered further are the AR1, and AR2 models.

Box-Jenkins - Residual analysis process

Residual analysis is now carried out, following the same residual criteria as in the Box-Jenkins modelling process. These can be seen in table 4.3 on page 16.

AR2

The SAS residual output for the AR2 model is shown in figures 5.3 and 5.4. The output has been analysed and the results are shown in table 5.8.

Figure 5.3: SAS Residual Output for the AR2 model


Figure 5.4: SAS Residual Output for the AR2 model

Criteria    Response
Independence    For the Ljung Box test, most P values are greater than 0.05.
White noise    95% of P values are greater than 0.05.
Normality    The distribution plot follows a bell-shaped curve and the points on the Q-Q plot lie close to the line, with a few outliers.

Table 5.8: AR2 Residual output results

From the residual results shown in table 5.8 it can be suggested that residuals are normally

independently distributed, and follow a white noise process. There are a few reservations as there

are a few outliers on the Q-Q plot, and not all P values in the white noise probability plot are

greater than 0.05, however these are acceptable.

AR1

The SAS residual output for the AR1 model is shown in figures 5.5 and 5.6. The output has been analysed and the results are shown in table 5.9.


Figure 5.5: SAS Residual Output for the AR1 model

Figure 5.6: SAS Residual Output for the AR1 model


Criteria    Response
Independence    Most P values are greater than 0.05.
White noise test    95% of P values are greater than 0.05.
Normality test    The distribution plot follows a slight bell-shaped curve and the points on the Q-Q plot lie close to the line, with a few outliers.

Table 5.9: AR1 Residual output results

The output for the AR1 model is extremely similar to the AR2 model output, therefore the same conclusions can be drawn. Both models fit the data with a few reservations. Previously in the ARIMA modelling process the ARMA(1,1) model was found to be the optimal model, so neither of these models is the optimal Box-Jenkins model for this data. However, they both provide a good fit to the data. One way of deciding which of the two models is better is to look at the AIC and SBC criteria.

AIC and SBC Criterion

The AIC (Akaike Information Criterion) gives a measure of the relative quality of statistical models. It provides a way of choosing the best model out of a selection of models. The Schwarz Bayesian Criterion (SBC) is an alternative to the AIC and is closely related to it. Generally, the model with the lowest AIC and SBC is chosen.[25] The AIC and SBC for the AR1 and AR2 models are shown in table 5.10.

Model AIC SBC

AR(1) 491.420 497.750

AR(2) 489.379 498.874

Table 5.10: AIC and SBC values
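The close relationship between the two criteria can be made concrete. With AIC = −2 ln L + 2k and SBC = −2 ln L + k ln(n), it follows that SBC = AIC + k(ln(n) − 2), where k is the number of estimated parameters and n the number of observations. The Python sketch below applies this identity; n = 175 (quarterly data from Q1 1970 to Q3 2013) and the parameter counts (constant plus AR terms) are assumptions, not values quoted from the SAS output.

```python
import math


def sbc_from_aic(aic, n_params, n_obs):
    """SBC implied by an AIC value: SBC = AIC + k * (ln(n) - 2)."""
    return aic + n_params * (math.log(n_obs) - 2)


N_OBS = 175  # assumption: one observation per quarter, Q1 1970 to Q3 2013

# (model, AIC from table 5.10, assumed number of estimated parameters)
for model, aic, k in [("AR(1)", 491.420, 2), ("AR(2)", 489.379, 3)]:
    print(model, round(sbc_from_aic(aic, k, N_OBS), 3))
```

Because the SBC penalty per parameter is larger than the AIC penalty whenever n exceeds about 7, the SBC can favour the smaller AR(1) model even when the AIC favours the AR(2) model.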

The AIC for the AR2 model is lower, however the SBC for the AR1 model is lower. For this reason, and because the residual output was so similar, both the AR1 and AR2 models are taken forward. The second step in the AR(P)/GARCH(p,q) modelling process, parameter testing and model identification, is to fit the AR term alongside a GARCH model using SAS. The AR1 and AR2 models have been found to be equally acceptable autoregressive models for this data. For this reason both are fitted alongside a GARCH(1,1) model.

5.2.2 Parameter testing and model identification

P value < or > 0.05?

Model              Intercept  A0  A1  G1  AR1  AR2  Keep?
AR(1)/GARCH(1,1)   <          <   <   <   <    -    Yes
AR(2)/GARCH(1,1)   <          >   <   <   <    >    No

Table 5.11: AR(P)/GARCH(p,q) parameter testing process; a dash marks a parameter that is not in the model [18].

Table 5.11 shows that the AR2 parameter does not fit alongside the GARCH model, however the

AR1 does fit. The model that will be considered further is the AR(1)/GARCH(1,1).

5.2.3 Residual analysis

Figure 5.7: AR(1)/GARCH(1,1) Residual output


Residual analysis must now be carried out on this model; the SAS output is shown in figure 5.7.

Plot    Response
Standardized residuals    There are 6 points in the plot that lie outside of the 2 and -2 boundaries.
Histogram of residuals    The histogram seems to be bell shaped.
White noise probabilities    Approximately 44% of the lags show the residuals to follow a white noise process.
ACF & PACF    The majority of lags are within the shaded area, with only two exceptions.

Table 5.12: AR(1)/GARCH(1,1) Residual Analysis

The residual checks shown in table 5.12 suggest that the AR(1)/GARCH(1,1) model does not fit the data as well as it could. Nevertheless this is an improvement on the GARCH(1,1) model. 3.4% of the standardized residuals lie outside of the 2 and -2 bands, which is acceptable as about 5% would be expected to do so. The histogram shows that the residuals are likely to be normally distributed. The white noise test suggests that the residuals may not follow a white noise process, however this is also an improvement on the GARCH(1,1) model. The ACF and PACF plots show that autocorrelation in the residuals is unlikely. There are no severe signs that the model does not fit.

5.2.4 Parameter estimation

The parameters for the AR(1)/GARCH(1,1) model have been estimated using SAS, and are shown in table 5.13:


Parameter    Notation in model    Parameter Estimate
Intercept    δ     0.688
AR1          φ1    −0.377
ARCH0        α0    0.062
ARCH1        α1    0.451
GARCH1       β1    0.551

Table 5.13: AR(1)/GARCH(1,1) Parameter Estimates

The parameter estimates are now substituted into the general AR(P)/GARCH(p,q) model (shown

in equation 5.6) to form the final model. The model for AR(1)/GARCH(1,1) is shown in equation

5.7

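$$\sigma_t^2 = 0.062 + 0.451\,(Y_{t-1} - \eta_{t-1})^2 + 0.551\,\sigma_{t-1}^2$$

$$Y_t = \eta_t + \sigma_t \varepsilon_t + 0.688 \tag{5.7}$$

$$\eta_t = -0.377\,Y_{t-1}$$

$$\varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, 1)$$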
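A one-step forecast of the mean from this model can be sketched in Python. Two readings are shown, because the sign convention behind the AR1 estimate is not stated in the text: a literal reading of equation 5.7, and a reading in which the AR term acts on the previous deviation from the intercept with the opposite sign. The second is the convention SAS PROC AUTOREG uses for autoregressive errors; that this procedure produced the estimates is an assumption. The input 0.799 is the last observed value from table 4.6, and the function names are hypothetical.

```python
DELTA = 0.688   # intercept (table 5.13)
PHI_1 = -0.377  # AR1 estimate as reported (table 5.13)


def forecast_literal(y_prev):
    """Literal reading of equation 5.7: delta + phi_1 * Y_{t-1}, with eps_t set to 0."""
    return DELTA + PHI_1 * y_prev


def forecast_ar_error(y_prev):
    """Assumed AUTOREG-style reading: the AR term is applied, with opposite sign,
    to the previous deviation from the intercept."""
    return DELTA - PHI_1 * (y_prev - DELTA)


y_prev = 0.799  # last observed percentage change in GDP (table 4.6)
print(round(forecast_literal(y_prev), 3))
print(round(forecast_ar_error(y_prev), 3))
```

With these inputs the second reading gives about 0.730, which agrees with the Q4-2013 forecast reported in section 5.3, while the literal reading gives about 0.387.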

5.3 Forecasting GDP by a univariate GARCH model

The optimal model out of the GARCH(1,1) and AR(1)/GARCH(1,1) models will now be chosen to forecast GDP. As neither model shows a perfect fit to the data, both the residual results and the AIC and SBC are considered in choosing the optimal model. The residual results suggest that, of the two GARCH models tested, the AR(1)/GARCH(1,1) model has the better fit. The AIC and SBC can now be considered to confirm or contradict this conclusion.

Model AIC SBC

GARCH(1,1) 439.143 451.802

AR(1)/GARCH(1,1) 424.823 440.647

Table 5.14: AIC and SBC of GARCH models


The AIC and the SBC of the AR(1)/GARCH(1,1) model are lower by approximately 3.3% and 2.5% respectively. This confirms the preliminary conclusion that the AR(1)/GARCH(1,1) model is the optimal model out of the GARCH models that have been investigated. This model can now be used to create forecasts for GDP. Forecasts have been calculated by SAS. It is possible to calculate the forecast by hand; details are not given here but can be found in "GARCH Models: Structure, Statistical Inference and Financial Applications" by Christian Francq and Jean-Michel Zakoian [26]. A plot of predicted against actual values is shown in figure 5.8.

Figure 5.8: AR(1)/GARCH(1,1) forecast plot : The red ’Predicted’ series shows the value of GDP

that the AR(1)/GARCH(1,1) model would have predicted at time point t.

With this model it is only accurate to forecast one step ahead. Similar to the Box-Jenkins ARMA model, the AR(P)/GARCH(p,q) model uses the previous time point to forecast. The forecast for Q4-2013 using this model is 0.730, and the actual value for this quarter is now 0.720. This forecasting technique has created a more accurate forecast than the ARMA(1,1) model. Throughout the Box-Jenkins and GARCH modelling processes, there has not been a model that can be completely trusted. This suggests that it may be useful to include an explanatory variable. Multivariate analysis can be carried out using a Box-Jenkins or a GARCH model. If the Box-Jenkins model is chosen, the vector ARMA (VARMA) process will be carried out. If the GARCH process is followed, an explanatory variable will be added to the current model. To choose between the two modelling processes, the AIC and SBC of the optimal model from each process are compared, and the process with the lower values is chosen.

Model AIC SBC

ARMA(1,1) 489.405 498.900

AR(1)/GARCH(1,1) 424.823 440.647

Table 5.15: The AIC and SBC for the ARMA(1,1) and AR(1)/GARCH(1,1) models.

For the GARCH method the AIC and SBC are lower by approximately 13.2% and 11.7% respectively, therefore the multivariate GARCH process will be carried out.


Chapter 6

Multivariate GARCH Modelling

Inflation is the rate of change in prices for goods and services. It is expressed as a percentage, e.g. if inflation on a product is 2%, the average price of that product is 2% higher than a year earlier.

6.1 Inflation analysis

The values for Inflation (All items), Inflation (Food), Inflation (Energy) and Inflation (Non-food non-services) are obtained from the OECD [27]. The conditional inflation variables were obtained because it may be interesting to see how the different types of inflation affect the forecast for GDP. First, it must be checked how closely related these variables are. If each type of inflation shows the same trend as the Inflation (All items) variable, then they will not add anything useful to the model. A time plot of the different types of inflation is shown in figure 6.1.


Figure 6.1: A Time plot of the different types of inflation against the time

The time plot shows that the conditional inflation variables closely follow the all items variable. This suggests that the variables may be highly correlated, that is, that there is a strong linear relationship between them. A Pearson's correlation coefficient test has been carried out to see whether the different types of inflation are highly correlated with the total inflation variable. The following hypotheses are tested:

H0 : ρ = 0

H1 : ρ ≠ 0

Variable Correlation Coefficient P value

Inflation(All Items) 1 < 0.001

Inflation(Food) 0.784 < 0.001

Inflation(Energy) 0.668 < 0.001

Inflation(Non food / Non services) 0.949 < 0.0001

Table 6.1: Correlation Tests for the Inflation Variables, If P < 0.05 then H0 is rejected
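The correlation coefficients in table 6.1 were produced by SAS. As an illustration of what is being computed, a minimal Python sketch of Pearson's correlation coefficient is given below; the function name and the two short series are hypothetical and are not the OECD inflation data.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


all_items = [1.0, 2.0, 3.0, 4.0]   # hypothetical values
component = [1.0, 3.0, 2.0, 4.0]   # hypothetical values
print(round(pearson_r(all_items, component), 3))  # 0.8
```

A coefficient close to 1, as for the inflation components in table 6.1, indicates that a component carries little information beyond the all items series.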


Table 6.1 shows that the three different types of inflation are highly correlated with the total inflation variable. For this reason only the ’Inflation (all items)’ variable will be considered in the modelling process. Before any statistical modelling is carried out, it is useful to look at how GDP and inflation compare when plotted together on a time plot. The time plot of GDP and inflation against time is shown in Figure 6.2.

Figure 6.2: A Time plot of Inflation and GDP against time

Inflation does not follow exactly the same trend as GDP; however, there are points in time where, as GDP has risen, inflation has dropped. This raises the question of whether inflation as an explanatory variable will improve this forecasting technique.

6.2 Bivariate GARCH modelling Process

As inflation is the only variable to be modelled against GDP, this process can now be called a ’bivariate GARCH modelling process’. This means that the original models need to be re-fitted with inflation as an explanatory variable. The general bivariate GARCH(p,q) or AR(P)/GARCH(p,q) model has the same layout as the univariate models (shown in equation 5.1 on page 24 and equation 5.6 on page 31 respectively), with an additional explanatory variable term. An example of the bivariate AR(P)/GARCH(p,q) model is expressed in equation 6.1.

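$$
\begin{aligned}
\sigma_t^2 &= \alpha_0 + \alpha_1 (Y_{t-1} - \eta_{t-1})^2 + \beta_1 \sigma_{t-1}^2 + \dots + \alpha_p (Y_{t-p} - \eta_{t-p})^2 + \beta_q \sigma_{t-q}^2 + \gamma X_t \\
Y_t &= \eta_t + \sigma_t \varepsilon_t + \delta \\
\eta_t &= \phi_1 Y_{t-1} \\
\varepsilon_t &\overset{i.i.d.}{\sim} N(0, 1)
\end{aligned}
\tag{6.1}
$$

Where:

γ = the constant (coefficient) applied to the explanatory variable.

X_t = the explanatory variable.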

The question that is being examined is ’can GDP be accurately predicted using the current inflation statistic as an explanatory variable’. The inflation variable has been added to both univariate GARCH models (GARCH(1,1) and AR(1)/GARCH(1,1)). The bivariate GARCH modelling process is the same as the final three steps of the GARCH(p,q) and AR(P)/GARCH(p,q) processes shown on pages 24 and 32 respectively. The models will now undergo the modelling process, and the optimal bivariate GARCH model will be found.

6.2.1 Parameter testing and model identification

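| Model | Intercept | A0 | A1 | A2 | A3 | G1 | G2 | G3 | AR1 | Keep? |
|---|---|---|---|---|---|---|---|---|---|---|
| GARCH(1,1) | < | > | < | < | < | < | < | < | < | Yes |
| AR(1)/GARCH(1,1) | < | > | < | < | < | < | < | < | < | Yes |

Table 6.2: Bivariate GARCH modelling parameter tests [18]. Each entry shows whether the P value is < or > 0.05.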

Table 6.2 shows that both models have passed the parameter tests, and therefore both models

will be carried forward to residual analysis.


6.2.2 Residual Analysis

The residuals are now examined to check for any lack of fit. The same criteria as for the univariate GARCH model are followed (these can be re-visited in table 5.3 on page 27).

Figure 6.3: Bivariate GARCH(1,1) residual analysis output

| Plot | Criteria |
|---|---|
| Standardized Residuals | There are 7 points (4% of data) outside of the 2 and -2 boundaries. |
| Histogram of Residuals | This shows a bell shaped curve, however this is slightly skewed, and highly concentrated in the centre. |
| White noise probabilities | Approximately 60% of the lags are greater than 0.05. |
| ACF & PACF | There are 2 lags outside of the shaded area. |

Table 6.3: Bivariate GARCH(1,1) residual analysis

Table 6.3 shows that the bivariate GARCH(1,1) model does not fit the data perfectly; this is demonstrated by the normality and white noise tests. The white noise test suggests that it is likely that the residuals follow a white noise process, however this is not certain. This is an improvement on the univariate GARCH(1,1) model, which implies that the addition of inflation has improved the model. This suggests that inflation may be a useful predictor for GDP.

Figure 6.4: Bivariate AR(1)/GARCH(1,1) residual analysis output

| Plot | Criteria |
|---|---|
| Standardized Residuals | There are 5 points (2.8% of data) outside of the 2 and -2 boundaries. |
| Histogram of Residuals | This shows a bell shaped curve. |
| White noise probabilities | Approximately 86% of the lags are greater than 0.05, suggesting that the residuals may not follow a white noise process. |
| ACF & PACF | There are three lags outside of the shaded area. |

Table 6.4: Bivariate AR(1)/GARCH(1,1) residual analysis

The residual results in table 6.4 imply that the bivariate AR(1)/GARCH(1,1) model fits the data to a certain degree. There are slight problems with the white noise and autocorrelation tests, however these are very minor. This is a very big improvement on the univariate AR(1)/GARCH(1,1) model, again suggesting that the addition of inflation has improved the model.


6.2.3 Parameter estimation

The parameters for the models have been estimated by SAS and are shown in table 6.5.

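| Model | Notation in model | Parameter estimate |
|---|---|---|
| GARCH(1,1) | δ | 0.852 |
| | α0 | 0.049 |
| | α1 | 0.762 |
| | β1 | 0.407 |
| | γ1 | −0.151 |
| AR(1)/GARCH(1,1) | δ | 0.800 |
| | φ1 | −0.266 |
| | α0 | 0.057 |
| | α1 | 0.606 |
| | β1 | 0.474 |
| | γ1 | −0.126 |

Table 6.5: Bivariate GARCH Parameter Estimates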

The final models for the bivariate GARCH(1,1) and AR(1)/GARCH(1,1) are shown in equations

6.2 and 6.3.

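$$
\begin{aligned}
\sigma_t^2 &= 0.049 + 0.762\, Y_{t-1}^2 + 0.407\, \sigma_{t-1}^2 - 0.151\, X_t \\
Y_t &= \sigma_t \varepsilon_t + 0.852 \\
\varepsilon_t &\overset{i.i.d.}{\sim} N(0, 1)
\end{aligned}
\tag{6.2}
$$

$$
\begin{aligned}
\sigma_t^2 &= 0.057 + 0.606\, (Y_{t-1} - \eta_{t-1})^2 + 0.474\, \sigma_{t-1}^2 - 0.126\, X_t \\
Y_t &= \eta_t + \sigma_t \varepsilon_t + 0.800 \\
\eta_t &= -0.266\, Y_{t-1} \\
\varepsilon_t &\overset{i.i.d.}{\sim} N(0, 1)
\end{aligned}
\tag{6.3}
$$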
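To make the structure of the fitted bivariate AR(1)/GARCH(1,1) model concrete, the sketch below evaluates its conditional mean and conditional variance for one time step, using the estimates from table 6.5. It follows the equations as written in this report and is not intended to reproduce the SAS estimation or its forecasts; the function names and all input values are hypothetical.

```python
def mean_forecast(y_prev, phi1=-0.266, delta=0.800):
    """Conditional mean of Y_t: eta_t + delta, where eta_t = phi1 * Y_{t-1}."""
    return phi1 * y_prev + delta

def conditional_variance(prev_resid, prev_var, x_t,
                         alpha0=0.057, alpha1=0.606, beta1=0.474, gamma=-0.126):
    """Conditional variance sigma_t^2 of the bivariate AR(1)/GARCH(1,1) model.

    prev_resid is Y_{t-1} - eta_{t-1}, prev_var is sigma_{t-1}^2 and
    x_t is the current value of the explanatory variable (inflation).
    """
    return alpha0 + alpha1 * prev_resid ** 2 + beta1 * prev_var + gamma * x_t

# Hypothetical inputs, chosen only to exercise the formulas.
y_prev = 0.5         # previous quarter's GDP growth (made up)
prev_resid = 1.0     # Y_{t-1} - eta_{t-1} (made up)
prev_var = 1.0       # sigma_{t-1}^2 (made up)
inflation_now = 0.8  # X_t (made up)

mu = mean_forecast(y_prev)
var = conditional_variance(prev_resid, prev_var, inflation_now)
print(f"conditional mean = {mu:.3f}, conditional variance = {var:.4f}")
```

Because the inflation coefficient is negative, small values of the previous residual and variance combined with high inflation can make this expression negative, so the output should be checked before it is interpreted as a variance.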


6.3 Forecasting GDP by a bivariate GARCH model

The optimal bivariate GARCH model will now be chosen to forecast GDP using inflation as an explanatory variable. As with the univariate GARCH process, neither of the models shows a perfect fit to the data. However, the residual analysis suggests that the AR(1)/GARCH(1,1) has the better fit. The AIC and SBC can now be considered to confirm or disprove this conclusion.

| Model | AIC | SBC |
|---|---|---|
| GARCH(1,1) | 425.489 | 441.312 |
| AR(1)/GARCH(1,1) | 421.007 | 439.996 |

Table 6.6: AIC and SBC of Bivariate GARCH models
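The comparison in table 6.6 can be checked with a short sketch. It assumes the standard definitions AIC = 2k − 2 ln L and SBC = k ln(n) − 2 ln L, so that SBC − AIC = k(ln n − 2) does not depend on the likelihood. The parameter counts k = 5 and k = 6 are taken from table 6.5, and n = 175 is an assumption based on quarterly data from Q1-1970 to Q3-2013; whether SAS uses exactly these definitions is also assumed here.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * log_likelihood

def sbc(log_likelihood, k, n):
    """Schwarz Bayesian criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_likelihood

def sbc_minus_aic(k, n):
    """Gap between the two criteria; the log-likelihood cancels out."""
    return k * (math.log(n) - 2)

N_OBS = 175  # assumption: quarterly observations, Q1-1970 to Q3-2013

table_6_6 = {
    "GARCH(1,1)": {"aic": 425.489, "sbc": 441.312, "k": 5},
    "AR(1)/GARCH(1,1)": {"aic": 421.007, "sbc": 439.996, "k": 6},
}

for name, row in table_6_6.items():
    reported_gap = row["sbc"] - row["aic"]
    implied_gap = sbc_minus_aic(row["k"], N_OBS)
    print(f"{name}: reported gap {reported_gap:.3f}, implied gap {implied_gap:.3f}")

# The preferred model is the one with the lowest criterion value.
best = min(table_6_6, key=lambda name: table_6_6[name]["aic"])
print("Lowest AIC:", best)
```

Under these assumptions the implied gaps agree with the reported ones to about three decimal places, which suggests the table values are internally consistent.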

The AIC and SBC results confirm that the AR(1)/GARCH(1,1) model is the model with the better fit. This model is now used to forecast GDP. The same rules as for the univariate GARCH model apply, and it is only accurate to forecast one step ahead. A forecast can only be produced if the current value for inflation is available. The source [27] has been revisited; the value for inflation in Q4-2013 is now available, and stands at 0.8. This value has been input into SAS, and the forecast for Q4-2013 using this model is 0.73418. Similar to the univariate GARCH forecast, this value is close to the actual GDP value (0.7208) for this quarter. A plot of predicted versus actual values is shown in figure 6.5; this demonstrates how accurate the model is at forecasting.


Figure 6.5: AR(1)/GARCH(1,1) forecast plot : The red ’Predicted’ series shows the value of GDP

that the bivariate AR(1)/GARCH(1,1) model would have predicted at time point t.

Figure 6.5 shows that the predicted values follow a similar trend to the actual values for GDP. The swings in the predicted values are not as extreme as those in the actual values, suggesting that the model does not account for large changes in GDP. This implies that the model would be accurate in predicting GDP for the next quarter, as long as nothing drastic affects the economic environment (e.g. a war or a banking crisis).


Chapter 7

Results

Forecasts for GDP have been created via three different forecasting methods:

• Univariate Box Jenkins Modelling

• Univariate GARCH Modelling

• Multivariate GARCH Modelling

An optimal model has been found for each forecasting method. The results and forecasts from

these models will now be compared, and the model that fits the GDP time series best will be

chosen and analysed.

7.1 Optimal model

The main aim is to be able to forecast the future value of GDP using a statistical model. For this reason, the first thing to be considered when choosing the optimal model will be its ability to forecast.

The actual vs predicted plots for the univariate ARMA(1,1) (Figure 4.4 page 21), univariate AR(1)/GARCH(1,1) (Figure 5.8 page 40), and bivariate AR(1)/GARCH(1,1) (Figure 6.5 page 50) models are re-visited. Comparing these plots, it is clear that the forecast line for the ARMA(1,1) model does not follow the real value for the percentage change in GDP as well as the GARCH models do. For this reason, and because heteroscedasticity has been shown to be present, a GARCH type model will be chosen as the optimal model.

| Model | Forecast | AIC | SBC | MSE |
|---|---|---|---|---|
| Univariate AR(1)/GARCH(1,1) | 0.730 | 424.820 | 440.640 | 0.992 |
| Bivariate AR(1)/GARCH(1,1) | 0.734 | 421.000 | 439.960 | 0.918 |

Table 7.1: Final model statistics to be used for a comparison between the two GARCH models.

Statistics for both GARCH models are shown in table 7.1 [28]. When transitioning between univariate and bivariate GARCH, it was noticed that adding in the explanatory variable improved the results from the residual output. This strongly suggests that the bivariate model is optimal.

It can be seen in table 7.1 that the AIC, SBC, and MSE for the bivariate model are lower by 0.89%, 0.15% and 7.45% respectively. As previously mentioned in chapter 5 on page 36, when deciding which model to choose, the model with the lower AIC and SBC will be chosen. The MSE value shows the average of the squared error, where the error is defined as the actual value minus the forecast (Y_t − Ŷ_t). The bivariate model has a lower MSE, suggesting that this model forecasted more accurately throughout all time points. For the reasons stated above, the bivariate AR(1)/GARCH(1,1) model will be chosen as the optimal model to forecast GDP in this investigation.
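The quantities used in this comparison can be sketched as follows: the MSE as the mean of the squared differences between actual and forecast values, and the relative reduction of each statistic when moving from the univariate to the bivariate model. The inputs are the rounded values from table 7.1, and the function names are hypothetical.

```python
def mean_squared_error(actual, forecast):
    """Average of the squared errors, where error = actual - forecast."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(e ** 2 for e in errors) / len(errors)

def percent_lower(old, new):
    """Percentage by which `new` is lower than `old`."""
    return 100 * (old - new) / old

# Rounded statistics from table 7.1.
univariate = {"AIC": 424.820, "SBC": 440.640, "MSE": 0.992}
bivariate = {"AIC": 421.000, "SBC": 439.960, "MSE": 0.918}

for stat in univariate:
    reduction = percent_lower(univariate[stat], bivariate[stat])
    print(f"{stat}: bivariate model lower by {reduction:.2f}%")
```

Because the table values are rounded, the reductions computed this way can differ from the quoted percentages in the second decimal place.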

7.2 Bivariate AR(1)/GARCH(1,1) model analysis

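The model for the bivariate AR(1)/GARCH(1,1) is shown in equation 7.1.

$$
\begin{aligned}
\sigma_t^2 &= 0.057 + 0.606\, (Y_{t-1} - \eta_{t-1})^2 + 0.474\, \sigma_{t-1}^2 - 0.126\, X_t \\
Y_t &= \eta_t + \sigma_t \varepsilon_t + 0.800 \\
\eta_t &= -0.266\, Y_{t-1} \\
\varepsilon_t &\overset{i.i.d.}{\sim} N(0, 1)
\end{aligned}
\tag{7.1}
$$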


The actual vs predicted plot (shown in figure 6.5 page 50) for this model shows that the model does well in forecasting future GDP. However, there are a few points in time where the volatility of GDP was so extreme that the model could not forecast it. In this model the values for α1 and β1 are 0.6064 and 0.4738 respectively, so α1 + β1 = 1.0802, which is greater than 1. This shows the series to be an explosive series. The ideal GARCH model will not have α1 + β1 > 1, suggesting that this model may not be trusted. The fact that α1 + β1 is greater than 1 may account for the fact that the model is unable to forecast extreme changes in variance.
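The condition on α1 + β1 can be checked with a few lines of code. The sketch below also evaluates the long-run variance α0/(1 − α1 − β1), a standard result for a plain GARCH(1,1) process that is only defined when α1 + β1 < 1. Applying it to this model ignores the inflation term, so it is an approximation for illustration only, and the function names are hypothetical.

```python
def persistence(alpha1, beta1):
    """Sum of the ARCH and GARCH coefficients of a GARCH(1,1) model."""
    return alpha1 + beta1

def unconditional_variance(alpha0, alpha1, beta1):
    """Long-run variance of a plain GARCH(1,1); None when persistence >= 1."""
    p = persistence(alpha1, beta1)
    if p >= 1:
        return None  # the variance does not settle to a finite value
    return alpha0 / (1 - p)

alpha0, alpha1, beta1 = 0.057, 0.6064, 0.4738  # estimates quoted in this report
p = persistence(alpha1, beta1)
print(f"alpha1 + beta1 = {p:.4f}")
print("long-run variance:", unconditional_variance(alpha0, alpha1, beta1))
```

For the fitted coefficients the function returns `None`, which corresponds to the explosive behaviour described in the text.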

At Q4-2013 the actual percentage change in GDP is 0.720 and the forecast value for this time point is 0.734. The forecast error (actual minus forecast) is approximately -0.013 using the unrounded values 0.7208 and 0.73418; this is particularly small and shows that at this time point the model is reliable in forecasting GDP. The mean squared error (MSE) of the model is 0.918. The MSE is a lot higher than the squared error for this particular forecast, because in a few cases throughout the time series the model could not forecast an extreme change in variance. Figure 7.1 shows a plot of the squared errors at each time point, and the mean squared error of the time series, against time.

Figure 7.1: A Time plot of the squared errors using the AR(1)/GARCH(1,1) against time.

It is clear from figure 7.1 that there are some extremely large squared errors at dates Q1-1973, Q1-1974, Q2-1979, Q3-1979, and Q1-2009. At these dates, the forecast was dramatically different from the actual value, causing a large error term. These extreme cases are shown in table 7.2.

| Date | Actual | Forecast | Error | Error squared |
|---|---|---|---|---|
| Q1-1973 | 5.276 | 0.913 | 4.362 | 19.028 |
| Q1-1974 | −2.412 | 0.153 | −2.566 | 6.585 |
| Q2-1979 | 4.3068 | 0.011 | 4.295 | 18.447 |
| Q3-1979 | −2.330 | 1.01 | −3.341 | 11.165 |
| Q1-2009 | −2.468 | 0.070 | −2.538 | 6.444 |

Table 7.2: Error squared values for extreme changes in variance [28].

Table 7.2 shows that the squared errors at the time points of these extreme changes in variance range between 6.4 and 19.0. These values are dramatically different from the MSE, and therefore make the MSE considerably larger than it would otherwise be. This suggests that the extreme changes in GDP are skewing the value for the MSE. As previously mentioned on page 52, the lower the MSE, the more accurate the forecasts are. It might be suggested that if there is sufficient evidence of an unusual event that may have affected the economic climate dramatically, then these time points should be taken out of the model. This would give a more representative MSE, and may thus create a better model.
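The effect of these five points on the MSE can be estimated with a back-of-the-envelope calculation. The sketch below assumes that the MSE of 0.918 is an average over n = 175 quarterly time points (Q1-1970 to Q3-2013); the exact number of forecasted points is not stated here, so the results are indicative only. It also reuses the errors of the existing model and does not re-fit the model without the extreme points.

```python
N_OBS = 175   # assumption: number of time points averaged in the MSE
MSE = 0.918   # MSE of the bivariate AR(1)/GARCH(1,1) model

# Squared errors of the five extreme points listed in table 7.2.
extreme_squared_errors = [19.028, 6.585, 18.447, 11.165, 6.444]

total_sse = MSE * N_OBS                    # implied sum of squared errors
extreme_sse = sum(extreme_squared_errors)  # contribution of the five points
share = extreme_sse / total_sse

# MSE of the remaining points if the five extreme points are set aside.
mse_without = (total_sse - extreme_sse) / (N_OBS - len(extreme_squared_errors))

print(f"share of squared error from extreme points: {100 * share:.1f}%")
print(f"MSE without extreme points: {mse_without:.3f}")
```

Under these assumptions, five of the 175 points account for well over a third of the total squared error, which supports the view that the MSE is being skewed by a small number of extreme quarters.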

Once a model has been fitted to the data, a total R² value is assigned to the model. The total R² value shows how accurate the model is in forecasting the data. The closer the R² value is to 1, the better a predictor the model is. The R² for this model is 0.061, which suggests that the model is not a very good predictor. However, similar to the problem with the MSE, this value may be affected by the extreme changes in volatility, and therefore this could be an incorrect representation of the model. This model has been shown to fit the data; however there are some reservations about how accurate the forecasts will be in the future.


Chapter 8

Conclusion

The topic of this paper has been an investigation into the ability to forecast GDP growth rates using statistical models. GDP has been modelled for the period between Q1-1970 and Q3-2013. In the univariate section of this project, the Box-Jenkins ARIMA modelling process was compared to the GARCH modelling process. The two processes use different modelling techniques to calculate a forecast for GDP in Q4-2013. The optimal model for the ARIMA modelling process was found to be the ARMA(1,1).

ARMA models assume constant volatility, and it was suggested that the volatility of this time series may not be constant. A test for heteroscedasticity was carried out, and this led to the consideration of the GARCH model. The GARCH modelling process was carried out, and it was found that there is no GARCH(p,q) model that fits the data well. A technique was applied to merge the AR part of the ARIMA model with the GARCH model, creating an AR(P)/GARCH(p,q) model. A model was found by finding the optimal AR model, and combining this with the GARCH model. After carrying out the parameter tests and residual analysis, and considering the AIC and SBC, the AR(1)/GARCH(1,1) model was found to be the optimal univariate GARCH type model.

In the next chapter the explanatory variable ’inflation’ was introduced. The AIC and SBC method was used to choose which forecasting technique was to be expanded to include the explanatory variable. The GARCH modelling process was chosen as it was shown that heteroscedasticity is present in the series, and the AIC and SBC values are lower for this type of model. Inflation was introduced to create a bivariate GARCH model. It was shown that the addition of inflation improved the models substantially, and the optimal bivariate GARCH model was found to be the AR(1)/GARCH(1,1).

The three modelling techniques were compared using the ’forecast vs actual’ plots, and the model statistics. The optimal model to forecast the percentage change in GDP in the UK was found to be the bivariate AR(1)/GARCH(1,1). This model’s ability to forecast the data was examined and this exposed a few issues with extreme changes in GDP. The R² value and MSE value were considered, and these suggested that the model does not forecast very accurately. It was proposed that the R² and MSE values may have been skewed by the extreme changes in GDP.

The AR(1)/GARCH(1,1) model has an MSE value of 0.918. The literature review showed that similar research was carried out on UK GDP growth and inflation by the Bank of England [5]. The Bank of England found the FVR-VAR model to be optimal with an RMSE of 0.680, which corresponds to an MSE of approximately 0.462 (0.680²). This value is about 49.6% lower than the MSE found for the AR(1)/GARCH(1,1). The data used are not identical; however, these results give a good indication that it may be interesting to fit a VAR type model to this data using inflation as a second variable.

8.1 Suggestions to improve upon this investigation

This investigation shows that there are areas for improvement to the forecasting techniques. If

more time and resources were available I would suggest the following additions for improvements

to this investigation.

• If there is sufficient reason to believe that an extreme change in GDP has occurred due to a drastic change in the economic environment, this data point should be levelled out to the central value between the previous point and the next point.

• Different types of GARCH models should be fitted to the data. There are 8 types of GARCH model that differ from the original GARCH model. These are shown in table 8.1.


| Type | Application |
|---|---|
| NGARCH | Non-linear |
| IGARCH | Integrated |
| EGARCH | Exponential |
| GARCH-M | Mean equation |
| QGARCH | Quadratic |
| GJR-GARCH | Glosten-Jagannathan-Runkle |
| TGARCH | Threshold |
| FGARCH | Family |

Table 8.1: Different types of GARCH model

It is suggested that the non-linear and quadratic GARCH models be eliminated from the investigation. This is because the time series has been shown to be linear, and the quadratic GARCH model accounts for symmetric effects of positive and negative shocks, whereas this time series has been shown to behave differently for positive and negative GDP values.

• Combine ARMA type models with GARCH models to create ARMA(P,Q)/GARCH(p,q) models. Investigate whether there are any models of different orders that create different or more accurate forecasts.

• Explore the effects of different explanatory variables; in particular, the return on GBP exchange rates may provide an interesting discussion.

• Investigate VAR type models, in particular the FAVAR model.
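The first suggestion in the list above, levelling an extreme point out to the central value between its neighbours, can be sketched as follows. The function name and the neighbouring values are hypothetical; only the value 5.276 (the actual Q1-1973 figure in table 7.2) comes from this report.

```python
def level_out(series, indices):
    """Return a copy of `series` in which each flagged interior point is
    replaced by the midpoint of its two neighbours in the original series."""
    smoothed = list(series)
    for i in indices:
        if 0 < i < len(series) - 1:  # endpoints have only one neighbour
            smoothed[i] = (series[i - 1] + series[i + 1]) / 2
    return smoothed

# Made-up quarterly growth values around one extreme observation.
growth = [0.5, 0.75, 5.276, 0.25, 0.5]
print(level_out(growth, [2]))
```

Note that when two flagged points are adjacent (as with the two consecutive 1979 quarters), each midpoint here still uses the other extreme value as a neighbour, so a different rule would be needed in that case.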


References

[1] Dominic Sandbrook. Worst of times, best of times. http://www.economist.com/node/17090761. Accessed: 23/03/2014.

[2] Tejvan Pettinger. The Economy of the 1970s. http://econ.economicshelp.org/2010/02/economy-of-1970s.html. Accessed: 23/03/2014.

[3] Oscar Dejuan, Eladio Febrero, and Maria Cristina Marcuzzo. The First Great Recession of the 21st Century: Competing Explanations. Edward Elgar Publishing, 2011.

[4] Jonathan Portes. About Us. http://niesr.ac.uk/about-us. Accessed: 11/02/2014.

[5] Haroon Mumtaz, Alina Barnett and Konstantinos Theodoridis. Working Paper No. 450: Forecasting UK GDP growth, inflation and interest rates under structural change: a comparison of models with time-varying parameters. http://www.bankofengland.co.uk/research/Documents/workingpapers/2012/wp450.pdf. Accessed: 25/03/2014.

[6] Tilak Abeysinghe and Gulasekaran Rajaguru. Quarterly Real GDP Estimates for China and ASEAN4 with a Forecast Evaluation. Journal of Forecasting - Wiley Online Library, 2004.

[7] Elena-Adriana Andrei and Elena Bugudui. Econometric Modeling of GDP Time Series. Theoretical and Applied Economics, Volume XVIII, No. 10(563), pp. 91-98, 2011.

[8] WenShwo Fang. Modeling the Volatility of Real GDP Growth: The Case of Japan Revisited. http://digitalcommons.uconn.edu/cgi/viewcontent.cgi?article=1386&context=econ_wpapers. Accessed: 22/04/2014. 2008.

[9] OECD.STAT. Gross Domestic Product - expenditure approach: growth rate compared to previous quarter. http://stats.oecd.org/index.aspx?queryid=350. Accessed: 07/10/2013.

[10] Irvin B. Tucker. Macroeconomics for Today. Page 137. Cengage Learning; 7 edition, 2010.


[11] G.P. Nason. Stationary and non-stationary time series. http://www.cas.usf.edu/~cconnor/geolsoc/html/chapter11.pdf. Page 3.

[12] Natalie Fuller and Myra Wiseman. Level 6 Time Series and Forecasting Lecture Notes. Lecture 1 Page 2.

[13] Bruce Bowerman and Richard O’Connell. Forecasting and Time Series: An Applied Approach. Page 355. Duxbury Pr; 3 sub edition, 2003.

[14] Bruce Bowerman and Richard O’Connell. Forecasting and Time Series: An Applied Approach. Page 467. Duxbury Pr; 3 sub edition, 2003.

[15] Natalie Fuller and Myra Wiseman. Level 6 Time Series and Forecasting Lecture Notes. Lecture Notes 2 Page 12.

[16] David Ruppert. Statistics and Data Analysis for Financial Engineering. Page 235. Springer; 2011 edition, 2010.

[17] NIST/SEMATECH e-Handbook of Statistical Methods. http://www.itl.nist.gov/div898/handbook/eda/section3/autocopl.htm. Accessed: 23/03/2014. 2012.

[18] Natalie Fuller. A.3 Parameter P values. Disk Attached - Tab 5. 2014.

[19] Terry J Walsham and Keith Parramore. Quantitative Methods in Finance. Page 257. Cengage South-Western, 1996.

[20] John C. Brocklebank and David A. Dickey. SAS for Forecasting Time Series. Page 249. WA (Wiley-SAS); 2 edition, 2003.

[21] Chris Brooks. Introductory Econometrics for Finance. Page 394. Cambridge University Press; 2 edition, 2008.

[22] SAS Institute. SAS/ETS 12.1 User’s Guide. Page 396. SAS Institute Release 12.1, 2012.

[23] Elzbieta Ferenstein and Miroslaw Gasowskie. Modelling Stock Returns AR-GARCH. http://www.idescat.cat/sort/sort281/ferestein.pdf. Accessed: 25/02/2014.

[24] Professor Kerry Patterson. Unit Root Tests in Time Series Volume 2: Extensions and Developments. Palgrave Macmillan, 2012.

[25] David Ruppert. Statistics and Finance: An Introduction. Page 125. Springer; Corr. 2nd printing, 2006.

[26] Christian Francq and Jean-Michel Zakoian. GARCH Models: Structure, Statistical Inference and Financial Applications. Wiley-Blackwell, 2010.


[27] OECD.STAT. Consumer prices - percentage change from previous period. http://stats.oecd.org/index.aspx?queryid=26661. Accessed: 07/10/2013.

[28] Natalie Fuller. A.2 MSE Calculations. Disk Attached - Tab 3/4. 2014.


Appendix

A.1 UK Data

Please see disk attached : Tab 1

A.2 MSE Calculations

Please see disk attached: Tab 3/4

A.3 Parameter P values

Please see disk attached: Tab 5


A.4 SAS code
