Proc Arima


Proc arima analyzes time series data in the time domain. It supports the Box-Jenkins methodology (the fitting of ARIMA models to time series data) as well as transfer function (input type) models. Frequency domain analysis of time series can be done using proc spectra.

The framework for the analysis is that the observed time series X(t) is stationary and satisfies an ARMA equation of the form

X(t) - phi(1) X(t-1) - ... - phi(p) X(t-p) =
	Z(t) - theta(1) Z(t-1) - ... - theta(q) Z(t-q)

where Z(t) is a white noise process. The constants phi(1),..., phi(p) are called the autoregressive coefficients and the number p is called the order of the autoregressive component. The constants theta(1),..., theta(q) are called the moving average coefficients and the number q is called the order of the moving average component. It is possible for either p or q to be zero.
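
As a concrete instance of the equation, the following data step simulates an ARMA process with p=1 and q=1. It is only a sketch: the data set name arma11, the seed, the series length, and the coefficients phi(1)=0.5 and theta(1)=0.4 are all arbitrary choices made for illustration.

```sas
/* Simulate X(t) - 0.5 X(t-1) = Z(t) - 0.4 Z(t-1) with Gaussian white noise. */
/* All numeric values here are illustrative assumptions.                     */
data arma11;
   call streaminit(2024);          /* arbitrary seed for reproducibility */
   xlag = 0;                       /* X(t-1), started at zero */
   zlag = 0;                       /* Z(t-1), started at zero */
   do t = 1 to 200;
      z = rand('normal');          /* white noise Z(t) */
      x = 0.5*xlag + z - 0.4*zlag; /* solve the ARMA equation for X(t) */
      output;
      xlag = x;
      zlag = z;
   end;
   keep t x;
run;
```

Fitting this series with estimate p=1 q=1 should give estimates in the neighborhood of the values used in the simulation.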

Use of proc arima to fit ARMA models consists of 3 steps. The first step is model identification, in which the observed series is transformed to be stationary. The only transformation available within proc arima is differencing. The second step is model estimation, in which the orders p and q are selected and the corresponding parameters are estimated. The third step is forecasting, in which the estimated model is used to forecast future values of the observable time series.

As an example, the data file milk.dat containing data on milk production taken from Cryer will be analyzed. Here are the commands that could be used for each of the 3 steps.

proc arima data=milk;
identify var=milk(12) 
	nlag=30 
	center
	outcov=milkcov
	noprint;
run;
estimate p=1 q=3
	nodf
	noconstant
	method=ml
	plot;
run;
forecast
	lead=10
	out=predict
	printall;
run;
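
The commands above assume that a SAS data set named milk, containing a variable also named milk, already exists. A minimal sketch of a data step that could create it follows. The file location and the layout of milk.dat (one numeric value per line) are assumptions; adjust the infile and input statements to match the actual file.

```sas
/* Read the raw data file into the SAS data set milk.        */
/* Assumed layout: one observation of the series per line.   */
data milk;
   infile 'milk.dat';   /* path is an assumption */
   input milk;
run;
```

If the file holds several values per line, input milk @@; reads them one after another instead.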

OPTIONS FOR THE IDENTIFY STATEMENT:

The var= option is required and specifies the variable in the data set to be analyzed. The optional numbers in parentheses specify the LAG at which differences are to be computed. The option var=milk would analyze the milk series without any differencing; var=milk(1) would analyze the first difference of milk; var=milk(1,1) the second difference of milk. In the example above, var=milk(12) analyzes the lag 12 difference X(t)-X(t-12), the usual way of removing a yearly pattern from monthly data.
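
To see what the differencing notation computes, the same differences can be formed by hand in a data step with the DIF functions. This is only an illustration of the arithmetic; proc arima does the differencing itself, and the data set name milkdiff and the new variable names are made up for this sketch.

```sas
/* Hand-computed versions of the differences requested in var=. */
data milkdiff;
   set milk;
   d1   = dif(milk);        /* like var=milk(1):   X(t) - X(t-1)   */
   d1_1 = dif(dif(milk));   /* like var=milk(1,1): second difference */
   d12  = dif12(milk);      /* like var=milk(12):  X(t) - X(t-12)  */
run;
```

The leading values of each new variable are missing because the required earlier observations do not exist.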

The identify statement produces 3 plots for the specified variable: the sample autocorrelation function, the sample inverse autocorrelation function, and the sample partial autocorrelation function. These crude plots, together with tables of their values, are printed in the output window. Higher quality plots can be produced through the use of other options (detailed below) and proc gplot.

The nlag=30 option causes the 3 plots to show values up to lag 30. If nlag= is not specified, the default is 24 or 25% of the number of observations, whichever is less.

The center option subtracts the average of the series specified by the var= option before the analysis. The average is added back in automatically during the forecast step.

The outcov= option places the values of the sample correlation functions into a SAS data set. These values can be used to produce high quality plots of these functions using proc gplot. The variables output are: LAG, VAR (name of the variable specified in the var= option), CROSSVAR (name of the variable specified in the crosscorr= option), N (number of observations used to compute the current value of the covariance or cross covariance), COV (value of the autocovariance or cross covariance), CORR (value of the sample autocorrelation or cross-correlation function), STDERR (standard error of the autocorrelations), INVCORR (values of the sample inverse autocorrelation function), and PARTCORR (values of the sample partial autocorrelation function).

The noprint option suppresses the output of the low quality graphs normally created by the identify statement. This option is used primarily with the outcov= option.
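
A sketch of how the outcov= data set might be plotted with proc gplot follows. It assumes the identify statement from the example has already written the data set milkcov; the symbol and axis settings are just one reasonable choice.

```sas
/* Needle plot of the sample autocorrelation function against lag. */
symbol1 interpol=needle value=none color=black;
axis1 label=('Lag');
axis2 label=('Sample autocorrelation');
proc gplot data=milkcov;
   plot corr*lag / vref=0 haxis=axis1 vaxis=axis2;
run;
quit;
```

Replacing corr by invcorr or partcorr in the plot statement gives the corresponding plot of the inverse or partial autocorrelation function.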

OPTIONS FOR THE ESTIMATE STATEMENT:

The p=1 q=3 options specify the auto-regressive and moving average orders to be fit. Other forms of these specifications are: q=(3) to specify that ONLY the parameter theta(3) is allowed to be non-zero; p=(12)(3) for a seasonal model (1-phi(12)B**12)(1-phi(3)B**3) where B is the backshift operator; p=(3,12) for a model in which only phi(3) and phi(12) are allowed to be non-zero.

The nodf option uses the sample size rather than the degrees of freedom as the divisor when estimating the white noise variance.

The method= option selects the estimation method for the parameters. The choices are ml for maximum (Gaussian) likelihood estimation, uls for unconditional least squares, and cls for conditional least squares, which is the default.

The plot option produces the same 3 plots as in the identify statement for the RESIDUALS after the model parameters are estimated. This is another useful check on whiteness of the residuals.

OPTIONS FOR THE FORECAST STATEMENT:

The lead= option specifies the number of time intervals into the future for which forecasts are to be made.

The out= option creates a SAS data set containing the values of the original series together with the values predicted by the model at all times, and the printall option prints the forecasts for the whole series rather than only for the future values. This can be useful for an analysis of the past performance of the model.
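
The out= data set can be examined with proc print and plotted with proc gplot. The following is a sketch that assumes the forecast statement of the example has created the data set predict. The variable names FORECAST, STD, L95, U95, and RESIDUAL are the ones proc arima is documented to write, but they should be checked against the actual contents of the data set; the data set predplot and the counter t are made up for this illustration.

```sas
/* List the observed series next to the model's predictions. */
proc print data=predict;
   var milk forecast std l95 u95 residual;
run;

/* Add an observation counter so the two series can be overlaid. */
data predplot;
   set predict;
   t = _n_;
run;

symbol1 interpol=join value=none color=black;   /* observed series */
symbol2 interpol=join value=none color=red;     /* predicted values */
proc gplot data=predplot;
   plot milk*t=1 forecast*t=2 / overlay;
run;
quit;
```

The last 10 observations have a missing value for milk, since they are the forecasts requested by lead=10.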

In practice, several different estimate statements are tried in sequence to see which model best fits the data. Proc arima is interactive, in the sense that these successive attempts can be made without restarting the procedure. Simply submit the successive estimate statements; the original identify statement will be retained.
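
A sketch of such an interactive session follows. The candidate models are arbitrary choices that exercise the different forms of the p= and q= options; they are not recommendations for the milk data.

```sas
proc arima data=milk;
identify var=milk(12) nlag=30 center;
run;
/* First attempt: the full model with p=1 and q=3. */
estimate p=1 q=3 noconstant method=ml;
run;
/* Second attempt: only theta(3) allowed to be non-zero. */
estimate q=(3) noconstant method=ml;
run;
/* Third attempt: only phi(3) and phi(12) allowed to be non-zero. */
estimate p=(3,12) noconstant method=ml;
run;
quit;   /* ends the interactive procedure */
```

Each estimate step prints its own fit statistics, among them AIC and SBC, which can be compared across the attempts.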

Transfer function models can be fit by using the crosscorr= option of the identify statement and the input= option of the estimate statement. The mechanics are illustrated for a data set fake containing two time series, X and Y, that are related by a transfer function model in which Y depends on X. First, the process X is modeled using the identify and estimate statements. Then Y is identified and the cross-correlation between the prewhitened processes X and Y is estimated. The program might look like this.

proc arima data=fake;
identify var=x 
	center 
	nlag=40;
estimate p=1 q=1 
	noconstant 
	nodf 
	method=ml 
	plot;
identify var=y 
	center 
	nlag=40 
	crosscorr=(x);
run;

From the cross correlation information, the lags at which the input process X influences Y can be tentatively identified. Note that only causal models are allowed; non-zero cross correlations at negative lags cannot be modeled in proc arima. For illustration, say the non-zero lags are 2 and 4. The process Y might be estimated as follows.

estimate input=( 2$(2) x ) 
	p=1 q=3 
	noconstant 
	nodf 
	method=ml 
	plot;
run;

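The transfer function is of the form cB**2 + dB**4 = B**2(c + dB**2). It is this factored form that determines the input= specification: the 2 before the $ is the pure delay B**2, and the (2) after it is the lag of the remaining term dB**2.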
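
To experiment with the statements above, a data set resembling fake can be simulated. The sketch below is hypothetical throughout: the seed, the series length, the AR coefficient 0.6, and the values c=2 and d=1 are invented for illustration and have nothing to do with any particular data set.

```sas
/* Simulate an input series X and an output series Y in which */
/* X influences Y at lags 2 and 4 only. All numbers are       */
/* illustrative assumptions.                                  */
data fake;
   call streaminit(1997);               /* arbitrary seed */
   x1 = 0; x2 = 0; x3 = 0; x4 = 0;      /* X(t-1), ..., X(t-4) */
   do t = 1 to 350;
      x = 0.6*x1 + rand('normal');                /* AR(1) input series */
      y = 2.0*x2 + 1.0*x4 + rand('normal');       /* c=2 at lag 2, d=1 at lag 4 */
      if t > 50 then output;            /* discard start-up values */
      x4 = x3; x3 = x2; x2 = x1; x1 = x;
   end;
   keep x y;
run;
```

With data generated this way, the cross correlations printed by the second identify statement should stand out at lags 2 and 4.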

Note that the estimate statement always refers to the most recent identify statement to decide what variable(s) are to be included in the model. Thus differencing and centering (if used) are handled automatically, EXCEPT that differencing of an input series must be explicitly specified in the crosscorr= option, for example crosscorr=(x(1)).
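
A sketch of the transfer function program when differencing is needed follows. It assumes, purely for illustration, that both X and Y require a first difference; the model orders are carried over from the example above.

```sas
proc arima data=fake;
/* Model the differenced input series first. */
identify var=x(1) nlag=40;
estimate p=1 q=1 noconstant method=ml;
/* The differencing of x must be repeated inside crosscorr=. */
identify var=y(1) nlag=40 crosscorr=(x(1));
/* The input= option names x only; its differencing comes from crosscorr=. */
estimate input=( 2 $ (2) x ) p=1 q=3 noconstant method=ml;
run;
quit;
```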

For further details see the online help under SAS SYSTEM HELP--MODELING & ANALYSIS TOOLS--ECONOMETRICS & TIME SERIES--ARIMA or the SAS/ETS Guide.


Copyright © 1997 by Jerry Alan Veeh. All rights reserved.