Proc Reg



The proc reg procedure is used to perform regression analysis.

Proc glm can also be used to do this analysis by leaving the quantitative variables out of the class statement. In some ways, proc glm is superior to proc reg because proc glm allows manipulations in the model statement (such as x*x to obtain a quadratic term) which are not allowed in proc reg. However, proc reg offers certain automatic model selection features and a crude plotting feature that are not available in proc glm.
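
As a minimal sketch of this difference, assume a data set named stuff with numeric variables x and z. In proc glm the quadratic term can be written directly in the model statement, while for proc reg it must first be computed in a data step. The names stuff2 and xsq are hypothetical and chosen only for illustration.

```sas
/* proc glm: the quadratic term is written directly in the model statement */
proc glm data=stuff;
	model z = x x*x;
run;

/* proc reg: compute the squared variable first (stuff2 and xsq are made-up names) */
data stuff2;
	set stuff;
	xsq = x*x;
run;

proc reg data=stuff2;
	model z = x xsq;
run;
```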

The variables analyzed using proc reg must be numeric variables, all of which appear in a single SAS data set. If x, y, and z are three numeric variables in the data set stuff, the basic invocation, which regresses z on x and y, is

proc reg data=stuff;
	model z= x y;
run;
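
To try the basic invocation without an existing data set, something like the following sketch could be used. The data values are invented purely for illustration and carry no meaning; only the variable names x, y, and z and the data set name stuff come from the text above.

```sas
/* Hypothetical data: these values are made up for illustration only */
data stuff;
	input x y z;
	datalines;
1 2 4.1
2 1 4.9
3 4 9.2
4 3 10.1
5 6 14.8
6 5 16.0
;
run;

/* Regress z on x and y, with an intercept by default */
proc reg data=stuff;
	model z = x y;
run;
```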

There are many options available in the model statement. As in proc glm, the options are listed after a slash (/) on the model statement line. One example is

proc reg data=stuff;
	model z= x y /
	noint 
	selection=stepwise
	sle=.05
	sls=.05;
run;

The noint option specifies that the fitted model is to have NO intercept (constant) term.

The selection= option specifies how variables are to be introduced into the model. The default (if selection= is not used) is equivalent to selection=none, in which all the variables in the model statement are used. Setting selection=stepwise introduces a variable into the model provided it is significant at the sle level and deletes a variable from the model if it is NOT significant at the sls level. Setting selection=rsquare does not fit a single model; instead it lists, for each number of independent variables, the subsets of variables which give the largest values of the square of R.
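
A short sketch of an R-square based search, again assuming the data set stuff with variables x, y, and z, might look as follows. The best= option shown here limits how many subsets of each size are listed; the value 1 is an arbitrary choice for illustration.

```sas
/* List the best subset of each size, ranked by R-square */
proc reg data=stuff;
	model z = x y /
	selection=rsquare
	best=1;
run;
```

With only two independent variables the listing is short, but the same statement applies unchanged when the model statement names many candidate variables.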

A final important feature of proc reg is the output statement. This statement, which must follow the model statement, creates a SAS data set containing the variables in the original data set together with new variables as specified in the output statement. An illustration of some of the common options is

output 
	out=results
	predicted=pred
	residual=resid
	L95M=lowmean
	U95M=highmean
	L95=lowpred
	U95=highpred;

The out= option gives the name of the new SAS data set.

The predicted= option gives the name of the variable in the out= data set which contains the predicted value of the dependent variable. By adding records to the original data set which specify values of the INDEPENDENT variables in the model but set the corresponding value of the DEPENDENT variable to missing, one can obtain predictions given by the model for unobserved settings of the independent variables. Such records are not used in fitting the model, because their dependent variable is missing.
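
The technique of adding records with a missing dependent variable can be sketched as follows, assuming the data set stuff with variables x, y, and z. The data set names newpoints and combined, and the settings of x and y, are hypothetical and chosen only for illustration.

```sas
/* New settings of the independent variables; z is set to missing (.) */
data newpoints;
	input x y;
	z = .;
	datalines;
1.5 2.0
3.0 4.5
;
run;

/* Append the new records to the original data */
data combined;
	set stuff newpoints;
run;

/* Fit on the observed records; predictions are produced for all records */
proc reg data=combined;
	model z = x y;
	output
		out=results
		predicted=pred;
run;

/* Show only the predictions for the unobserved settings */
proc print data=results;
	where z = .;
run;
```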

The residual= option gives the name of the variable in the out= data set which contains the value of the residual.

The L95M= and U95M= options give the names of the variables in the out= data set which contain the lower and upper endpoints of a 95% confidence interval for the mean of the dependent variable at the given values of the independent variables.

The L95= and U95= options give the names of the variables in the out= data set which contain the lower and upper endpoints of a 95% prediction interval, that is, an interval for an individual new observation of the dependent variable rather than for its mean.
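
Putting these pieces together, a complete run might look like the sketch below, which assumes the data set stuff with variables x, y, and z and uses the same output variable names as the illustration above. The proc print step is added here only to inspect the result.

```sas
proc reg data=stuff;
	model z = x y;
	output
		out=results
		predicted=pred
		residual=resid
		L95M=lowmean
		U95M=highmean
		L95=lowpred
		U95=highpred;
run;

/* Inspect the new variables next to the original ones */
proc print data=results;
	var x y z pred resid lowmean highmean lowpred highpred;
run;
```

For each record, the interval from lowpred to highpred is wider than the interval from lowmean to highmean, since an individual observation varies about the mean.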

For further information see the SAS/STAT User's Guide, volume 2.



Copyright © 1997 by Jerry Alan Veeh. All rights reserved.