Proc arima performs analysis of time series data in the time domain. It supports both the Box-Jenkins methodology (the fitting of ARIMA models to time series data) and transfer function (input type) models. Frequency domain analysis of time series can be done using proc spectra.
The framework for the analysis is that the observed time series X(t) is stationary and satisfies an ARMA equation of the form
X(t) -phi(1) X(t-1) - ... -phi(p) X(t-p)= Z(t) -theta(1)Z(t-1)-...-theta(q) Z(t-q)
Here Z(t) is a white noise process. The constants phi(1), ..., phi(p) are called the autoregressive coefficients and the number p is called the order of the autoregressive component. The constants theta(1), ..., theta(q) are called the moving average coefficients and the number q is called the order of the moving average component. It is possible for either p or q to be zero.
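In operator notation, with B the backshift operator defined by B X(t) = X(t-1), the ARMA equation above can be written compactly. The subscripted symbols stand for phi(j) and theta(j) as used above; the polynomial names are notation introduced here for illustration only.

```latex
\begin{align*}
\phi(B)\,X_t &= \theta(B)\,Z_t,\\
\phi(B) &= 1 - \phi_1 B - \cdots - \phi_p B^{p},\\
\theta(B) &= 1 - \theta_1 B - \cdots - \theta_q B^{q}.
\end{align*}
```

For example, p=1 and q=0 gives the AR(1) model X(t) - phi(1) X(t-1) = Z(t).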
Use of proc arima to fit ARMA models consists of 3 steps. The first step is model identification, in which the observed series is transformed to be stationary. The only transformation available within proc arima is differencing. The second step is model estimation, in which the orders p and q are selected and the corresponding parameters are estimated. The third step is forecasting, in which the estimated model is used to forecast future values of the observable time series.
As an example, the data file milk.dat containing data on milk production taken from Cryer will be analyzed. Here are the commands that could be used for each of the 3 steps.
proc arima data=milk;
identify var=milk(12) nlag=30 center outcov=milkcov noprint;
run;
estimate p=1 q=3 nodf noconstant method=ml plot;
run;
forecast lead=10 out=predict printall;
run;
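The commands above assume that a SAS data set named milk already exists. A minimal sketch of a data step that could create it follows. The file location, and the assumption that milk.dat holds one production value per record with no header line, are guesses and should be checked against the actual file.

```sas
/* Sketch only: assumes milk.dat is in the current directory and */
/* holds one numeric value per line (hypothetical layout).       */
data milk;
   infile 'milk.dat';
   input milk;
run;
```

If the file instead holds several values per line, ending the input statement with a trailing @@ (input milk @@;) reads them in sequence.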
OPTIONS FOR THE IDENTIFY STATEMENT:
The var= option is required and specifies the variable in the data set to be analyzed. The optional numbers in parentheses specify the LAG at which differences are to be computed. The option var=milk would analyze the milk series without any differencing; var=milk(1) would analyze the first difference of milk; var=milk(1,1) would analyze the second difference of milk.
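To see what the differencing notation does, the same differences can be computed by hand in a data step. This sketch assumes the milk data set from the example; the data set name milkdiff and the variable names d1, d2 and d12 are made up for illustration.

```sas
data milkdiff;
   set milk;
   d1  = dif(milk);        /* X(t) - X(t-1), as in var=milk(1)         */
   d2  = dif(dif(milk));   /* difference of the difference: milk(1,1)  */
   d12 = dif12(milk);      /* X(t) - X(t-12), as in var=milk(12)       */
run;
```

Note that dif2(milk) is the lag 2 difference X(t) - X(t-2), which corresponds to var=milk(2), not to the second difference var=milk(1,1).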
The identify statement produces three plots for the specified variable: the sample autocorrelation function, the sample inverse autocorrelation function, and the sample partial autocorrelation function. These crude plots and tables of their values are printed in the output window. Higher quality plots can be produced through the use of other options (detailed below) and proc gplot.
The nlag=30 option causes the three plots to print values up to lag 30. If not specified, the default is nlag=24 or 25% of the number of observations, whichever is less.
The center option subtracts the average of the series specified in the var= option. The average is added back in automatically during the forecast step.
The outcov= option places the values of the sample correlation functions into a SAS data set. These values can be used to produce high quality plots of these functions using proc gplot. The variables output are: VAR (name of the variable specified in the var= option), CROSSVAR (name of the variable specified in the crosscorr= option), N (number of observations used to compute the current value of the covariance), COV (value of the cross covariances), CORR (value of the sample autocorrelation function), STDERR (standard error of the autocorrelations), INVCORR (values of the sample inverse autocorrelation function), and PARTCORR (values of the sample partial autocorrelation function).
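As a sketch of how the saved values might be plotted, the following assumes the milkcov data set from the example, that SAS/GRAPH is available, and that the data set carries a LAG variable alongside those described above. The symbol settings are one possible choice, not a requirement.

```sas
/* Sketch: plot the sample correlation functions saved by outcov=milkcov. */
symbol1 interpol=needle value=none;   /* draw a vertical spike at each lag */
proc gplot data=milkcov;
   plot corr*lag     / vref=0;   /* sample autocorrelation function         */
   plot invcorr*lag  / vref=0;   /* sample inverse autocorrelation function */
   plot partcorr*lag / vref=0;   /* sample partial autocorrelation function */
run;
quit;
```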
The noprint option suppresses the output of the low quality graphs normally created by the identify statement. This option is used primarily with the outcov= option.
OPTIONS FOR THE ESTIMATE STATEMENT:
The p=1 q=3 options specify the autoregressive and moving average orders to be fit. Other forms of these specifications are: q=(3) to specify that ONLY the parameter theta(3) is allowed to be non-zero; p=(12)(3) for a seasonal model whose autoregressive operator is the product (1 - a B**12)(1 - b B**3), where a and b are coefficients and B is the backshift operator; p=(3,12) for a model in which only phi(3) and phi(12) are allowed to be non-zero.
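The difference between the forms p=(3,12) and p=(12)(3) shows up when the autoregressive operators are written out. In this sketch a and b denote generic coefficients of the factored form (labels introduced here, not SAS output names) and B is the backshift operator.

```latex
\begin{align*}
\texttt{p=(3,12)}:&\quad 1 - \phi_{3} B^{3} - \phi_{12} B^{12},\\
\texttt{p=(12)(3)}:&\quad (1 - a B^{12})(1 - b B^{3})
   = 1 - b B^{3} - a B^{12} + ab\,B^{15}.
\end{align*}
```

The factored form therefore carries an extra lag 15 term whose coefficient is determined by the other two, while the form p=(3,12) has no such term.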
The nodf option uses the sample size rather than the degrees of freedom as the divisor when estimating the white noise variance.
The method= option selects the estimation method for the parameters. The choices are ml for maximum (Gaussian) likelihood estimation, uls for unconditional least squares, and cls for conditional least squares.
The plot option produces the same three plots as in the identify statement for the RESIDUALS after the model parameters are estimated. This is another useful check on whiteness of the residuals.
OPTIONS FOR THE FORECAST STATEMENT:
The lead= option specifies the number of time intervals into the future for which forecasts are to be made.
By using the out= and printall options in the forecast statement, a SAS data set will be created which will contain the values of the original series and the predicted values of the series using the model at all times. This can be useful for an analysis of the past performance of the model.
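A quick way to look at the forecast data set is to print it. This sketch assumes the data set name predict and the series name milk from the example. The other variable names are the ones proc arima normally writes to a forecast output data set; running proc contents on the data set will confirm them.

```sas
/* Sketch: list observed values, forecasts, standard errors,       */
/* 95% confidence limits, and residuals from the out=predict set.  */
proc print data=predict;
   var milk forecast std l95 u95 residual;
run;
```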
In practice, several different estimate statements are tried sequentially to see which model best fits the data. Proc arima is interactive, in the sense that these sequential attempts can be made without restarting the procedure. Simply submit the successive estimate statements; the original identify statement will be retained.
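An interactive session of this kind might look like the following sketch. The candidate orders are hypothetical choices made for illustration, not recommendations for the milk data.

```sas
proc arima data=milk;
   identify var=milk(12) nlag=30 center;
   run;
   estimate p=1 method=ml;        /* first candidate: AR(1)       */
   run;
   estimate q=1 method=ml;        /* second candidate: MA(1)      */
   run;
   estimate p=1 q=1 method=ml;    /* third candidate: ARMA(1,1)   */
   run;
quit;
```

Each estimate statement prints its own fit statistics, such as AIC and SBC, which can be compared across the candidates. The quit statement ends the procedure.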
Transfer function models can be fit by using the crosscorr= option of the identify statement and the input= option of the estimate statement. The mechanics of this procedure are illustrated for a dataset fake which contains two time series which are related by a transfer function model. In this example Y depends on X. First, the process X is modeled using the identify and estimate statements. Then Y is identified and the cross-correlation between the prewhitened processes X and Y is estimated. The program might look like this.
proc arima data=fake;
identify var=x center nlag=40;
estimate p=1 q=1 noconstant nodf method=ml plot;
identify var=y center nlag=40 crosscorr=(x);
run;
From the cross correlation information, the lags at which the input X affects Y can be tentatively identified. Note that only causal models are allowed; non-zero cross correlations at negative lags cannot be modeled in proc arima. For illustration, say the non-zero lags are 2 and 4. The process Y might be estimated as
estimate input=( 2$(2) x ) p=1 q=3 noconstant nodf method=ml plot; run;
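The input is of the form cB**2 + dB**4 = B**2(c + dB**2). It is this latter form that gives the form of the input= option: the 2 before the $ is the overall shift B**2, and the (2) after it is the lag of the remaining term dB**2.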
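Written out, the model fitted by this estimate statement has roughly the following shape. The noise term N_t (an ARMA(1,3) process, from p=1 q=3) and the subscript notation are introduced here for illustration; Y and X are taken as mean-corrected, and SAS may report the second input coefficient with the opposite sign convention.

```latex
\begin{align*}
Y_t &= c\,X_{t-2} + d\,X_{t-4} + N_t,\\
N_t - \phi_1 N_{t-1} &= Z_t - \theta_1 Z_{t-1} - \theta_2 Z_{t-2} - \theta_3 Z_{t-3}.
\end{align*}
```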
Note that the estimate statement always refers to the most recent identify statement to decide which variables are to be included in the model. Thus differencing and centering are handled automatically (if used) EXCEPT that differencing of the input series must be explicitly specified in the crosscorr= option.
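As a sketch of the explicit differencing, suppose (hypothetically) that both series in fake need a first difference. The differencing is then written for the response in var= and again for the input inside crosscorr=.

```sas
proc arima data=fake;
   identify var=x(1) nlag=40;                    /* model the differenced input */
   run;
   estimate p=1 q=1 noconstant method=ml;        /* prewhitening model for X    */
   run;
   identify var=y(1) crosscorr=(x(1)) nlag=40;   /* difference Y and X alike    */
   run;
quit;
```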
For further details see the online help under SAS SYSTEM HELP--MODELING & ANALYSIS TOOLS--ECONOMETRICS & TIME SERIES--ARIMA or the SAS/ETS Guide.
Copyright © 1997 by Jerry Alan Veeh. All rights reserved.