This procedure performs time domain analysis of time series data. It supports the Box-Jenkins methodology (the fitting of ARIMA models to time series data) as well as transfer function (input type) models. Frequency domain analysis of time series can be done using proc spectra.
The framework for the analysis is that the observed time series X(t) is stationary and satisfies an ARMA equation of the form
   X(t) - phi(1) X(t-1) - ... - phi(p) X(t-p) =
   Z(t) - theta(1) Z(t-1) - ... - theta(q) Z(t-q)
where Z(t) is a white noise process. The constants phi(1),...,
phi(p) are called the autoregressive coefficients and the number
p is called the order of the autoregressive component. The
constants theta(1),..., theta(q) are called the moving average
coefficients and the number q is called the order of the moving
average component. It is possible for either p or q to be
zero.
Use of proc arima to fit ARMA models consists of 3 steps. The first step is model identification, in which the observed series is transformed to be stationary. The only transformation available within proc arima is differencing. The second step is model estimation, in which the orders p and q are selected and the corresponding parameters are estimated. The third step is forecasting, in which the estimated model is used to forecast future values of the observable time series.
As an example, the data file milk.dat containing data on milk production taken from Cryer will be analyzed. Here are the commands that could be used for each of the 3 steps.
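proc arima data=milk;
   /* Step 1: identification, using the lag 12 difference of milk */
   identify var=milk(12)
            nlag=30
            center
            outcov=milkcov
            noprint;
run;
   /* Step 2: estimation of a model with p=1 and q=3 */
   estimate p=1 q=3
            nodf
            noconstant
            method=ml
            plot;
run;
   /* Step 3: forecasting 10 time intervals ahead */
   forecast lead=10
            out=predict
            printall;
run;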
OPTIONS FOR THE IDENTIFY STATEMENT:
The var= option is required and specifies the variable in
the data set to be analyzed. The optional numbers in
parentheses specify the LAG at which differences are to be
computed. The option var=milk would analyze the milk series
without any differencing; var=milk(1) would analyze the first
difference of milk; var=milk(1,1) the second difference of milk.
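The differences requested by these forms can be written out explicitly. The following is a worked illustration only: X_t denotes the milk series, B is the backshift operator, and W_t is a symbol introduced here for the series that is actually analyzed.

```latex
\begin{align*}
\texttt{var=milk(1)}:   \quad & W_t = (1 - B)\,X_t      = X_t - X_{t-1} \\
\texttt{var=milk(1,1)}: \quad & W_t = (1 - B)^2\,X_t    = X_t - 2X_{t-1} + X_{t-2} \\
\texttt{var=milk(12)}:  \quad & W_t = (1 - B^{12})\,X_t = X_t - X_{t-12}
\end{align*}
```

In particular, var=milk(12) in the example is a single difference taken at lag 12 (a seasonal difference for monthly data), not a difference of order 12.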
The identify statement produces 3 plots for the specified
variable: the sample autocorrelation function, the
sample inverse autocorrelation function, and the sample partial
autocorrelation function. These crude plots, together with tables
of their values, are printed in the output window. Higher quality
plots can be produced through the use of other options (detailed
below) and proc gplot.
The nlag= option sets the largest lag shown in the 3 plots; with
nlag=30, values up to lag 30 are printed. If not specified, the
default is nlag=24 or 25% of the number of observations,
whichever is less.
The center option subtracts the average of the series specified
by the var= option. The average is added back in
automatically during the forecast step.
The outcov= option places the values of the sample correlation
functions into a SAS data set. These values can be used to
produce high quality plots of these functions using
proc gplot. The variables output are: LAG, VAR (name of
the variable specified in the var= option), CROSSVAR (name of the
variable specified in the crosscorr= option), N (number of
observations used to compute the current value of the covariance
or cross covariance), COV (value of the covariances or cross
covariances), CORR (value of the sample autocorrelation function),
STDERR (standard error of the autocorrelations), INVCORR (values
of the sample inverse autocorrelation function), and PARTCORR
(values of the sample partial autocorrelation function).
The noprint option suppresses the output of the low quality
graphs normally created by the identify statement. This option is
used primarily with the outcov= option.
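As a minimal sketch of how the outcov= data set might be plotted, the following assumes that SAS/GRAPH is available and that the identify step shown earlier has already created milkcov. The symbol and vref= settings are only one reasonable choice of appearance.

```sas
/* Assumes milkcov was created by outcov=milkcov in the identify step. */
symbol1 interpol=needle value=none;  /* draw each value as a vertical spike */
proc gplot data=milkcov;
   plot corr*lag / vref=0;           /* sample autocorrelation function */
   plot partcorr*lag / vref=0;       /* sample partial autocorrelation function */
run;
quit;
```

The inverse autocorrelation function can be plotted the same way by using invcorr*lag.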
OPTIONS FOR THE ESTIMATE STATEMENT:
The p=1 q=3 options specify the autoregressive and moving average
orders to be fit. Other forms of these specifications are: q=(3) to
specify that ONLY the parameter theta(3) is allowed to be non-zero;
p=(12)(3) for a seasonal model (1-phi(12)B**12)(1-phi(3)B**3), where
B is the backshift operator; and p=(3,12) for a model in which only
phi(3) and phi(12) are allowed to be non-zero.
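The alternative forms can be compared side by side. The sketch below reuses the milk example; the particular orders are chosen only to illustrate the syntax and are not suggested as good models for these data.

```sas
proc arima data=milk;
   identify var=milk(12) nlag=30 center noprint;
   /* q=(3): only theta(3) is estimated; theta(1) and theta(2) are fixed at 0 */
   estimate q=(3) noconstant method=ml;
   /* p=(3,12): one factor, 1 - phi(3)B**3 - phi(12)B**12 */
   estimate p=(3,12) noconstant method=ml;
   /* p=(12)(3): product of two factors, (1 - phi(12)B**12)(1 - phi(3)B**3) */
   estimate p=(12)(3) noconstant method=ml;
run;
quit;
```

Each estimate statement is fitted to the series named in the single identify statement, so the three sets of output can be compared directly.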
The nodf option uses the sample size rather than the degrees of
freedom as the divisor when estimating the white noise variance.
The method= option selects the estimation method for the parameters.
The choices are ml for maximum (Gaussian) likelihood estimation,
uls for unconditional least squares, and cls for conditional least
squares. If method= is omitted, cls is used.
The plot option produces the same 3 plots as in the identify
statement, but for the RESIDUALS after the model parameters are
estimated. This is a useful check on the whiteness of the
residuals.
OPTIONS FOR THE FORECAST STATEMENT:
The lead= option specifies the number of time intervals into the
future for which forecasts are to be made.
When the out= and printall options are used in the forecast
statement, a SAS data set is created which contains the values
of the original series together with the values predicted by the
model at all times, not just the future ones. This can be useful
for an analysis of the past performance of the model.
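A short sketch of how the forecast data set might be inspected follows. It assumes the forecast step shown earlier has already created predict. The variable names FORECAST, STD, L95, U95, and RESIDUAL are those written by out= in the releases the editor is familiar with; confirm them with proc contents if your release differs.

```sas
/* Assumes predict was created by out=predict in the forecast step. */
proc print data=predict;
   /* original series, forecast, its standard error, 95% limits, residual */
   var milk forecast std l95 u95 residual;
run;
```

Although the model was fit to the lag 12 difference, the forecasts in this data set are for the original milk series.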
In practice, several different estimate statements are tried sequentially to see which model best fits the data. Proc arima is interactive, in the sense that these sequential attempts can be made without restarting the procedure. Simply submit the successive estimate statements, each followed by run; the original identify statement will be retained.
Transfer function models can be fit by using the crosscorr=
option of the identify statement and the input= option of the
estimate statement. The mechanics of this procedure are
illustrated for a data set named fake, which contains two time
series, X and Y, related by a transfer function model in which
Y depends on X. First, the process X is modeled using the
identify and estimate statements. Then Y is identified and the
cross-correlation between the prewhitened processes X and Y is
estimated. The program might look like this.
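proc arima data=fake;
   /* Model the input series x first; this model is used for prewhitening */
   identify var=x
            center
            nlag=40;
   estimate p=1 q=1
            noconstant
            nodf
            method=ml
            plot;
   /* Identify y and compute its cross correlations with x */
   identify var=y
            center
            nlag=40
            crosscorr=(x);
run;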
From the cross correlation information, the lags at which the input
process X influences Y can be tentatively identified. Note that only
causal models are allowed; non-zero cross correlations at negative
lags cannot be modeled in proc arima. For illustration, say the
non-zero lags are 2 and 4. The process Y might be estimated as
follows.
   estimate input=( 2 $ (2) x )
            p=1 q=3
            noconstant
            nodf
            method=ml
            plot;
run;
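The input is of the form cB**2 + dB**4 = B**2 (c + dB**2). It is this
latter, factored form that gives the form of the input= option: the 2
before the $ is the pure delay B**2, and the (2) after the $ is the
lag of the remaining term dB**2.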
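Written out in full, the model requested by this estimate statement can be sketched as follows. This is an illustration under stated assumptions: X_t and Y_t denote the centered series, N_t is a symbol introduced here for the noise process, and the numerator is written with the sign convention omega_0 - omega_1 B**2 that proc arima is understood to use, so that c corresponds to omega_0 and d to -omega_1.

```latex
\begin{align*}
Y_t &= B^{2}\,(\omega_0 - \omega_1 B^{2})\,X_t + N_t
     = \omega_0 X_{t-2} - \omega_1 X_{t-4} + N_t, \\
(1 - \phi_1 B)\,N_t &= (1 - \theta_1 B - \theta_2 B^{2} - \theta_3 B^{3})\,Z_t .
\end{align*}
```

The first line comes from the input= option and the second from p=1 q=3; Z_t is white noise as before.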
Note that the estimate statement always refers to the most recent identify statement to decide what variable(s) are to be included in the model. Thus differencing and centering (if used) are handled automatically, EXCEPT that differencing of the input series must be explicitly specified in the crosscorr= option.
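For example, suppose (purely for illustration) that both series in fake needed a first difference. A sketch of the identification steps would then repeat the differencing of x inside crosscorr=:

```sas
proc arima data=fake;
   /* model the differenced input series first */
   identify var=x(1) center nlag=40;
   estimate p=1 q=1 noconstant nodf method=ml plot;
   /* x(1) must be written again here; var=y(1) alone does not difference x */
   identify var=y(1) center nlag=40 crosscorr=(x(1));
run;
```

A later estimate statement with input=( ... x ) then uses the differenced x and y without any further specification.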
For further details see the online help under SAS SYSTEM HELP--MODELING & ANALYSIS TOOLS--ECONOMETRICS & TIME SERIES--ARIMA or the SAS/ETS Guide.
Copyright © 1997 by Jerry Alan Veeh. All rights reserved.