Proc reg is used to perform regression analysis.
Proc glm can also be used to do this analysis by leaving
the quantitative variables out of the class statement. In some
ways, proc glm is superior to proc reg because proc glm allows
manipulations in the model statement (such as x*x to obtain
quadratic terms) which are not allowed in proc reg. However,
proc reg allows certain automatic model selection features and
a crude plotting feature not available in proc glm.
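As a minimal sketch of the proc glm alternative, the following fits a model with a quadratic term in x. The data set name stuff and the variable names x, y, and z are placeholders chosen for illustration. Because no class statement is given, x and y are treated as quantitative variables.

```sas
/* Sketch only: stuff, x, y, and z are assumed names.           */
/* No class statement, so x and y are treated as quantitative.  */
proc glm data=stuff;
   model z = x x*x y;   /* x*x adds the quadratic term in x */
run;
```

To fit the same quadratic model with proc reg, the squared variable would first have to be created in a data step (for example, xsq = x*x;) and then listed in the model statement.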
The variables analyzed using proc reg must be numeric variables,
all of which appear in a single SAS data set. If x,
y, and z are three numeric variables in the data set stuff, the basic invocation to regress z on x and y is
proc reg data=stuff;
   model z = x y;
run;
There are many options available in the model statement. As in proc glm, the options are listed after a slash (/) on the model statement line. One example is
proc reg data=stuff;
   model z = x y /
      noint
      selection=stepwise
      sle=.05
      sls=.05;
run;
The noint option specifies that the fitted model is to have NO
intercept (constant) term.
The selection= option specifies how variables are to be introduced
into the model. The default (if selection= is not used) is equivalent to
selection=none, in which all the variables in the model statement
are used. Setting selection=stepwise adds variables one at a time: a
variable enters the model provided it is significant at the sle
(significance level for entry) level, and a variable already in the
model is deleted if it is NOT significant at the sls (significance
level for staying) level.
Setting selection=rsquare does not pick a single model. Instead it
lists, for each number of independent variables, the subsets of
variables giving the largest values of the square of R, so that
the candidate models can be compared.
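As a sketch, selection=rsquare might be invoked as follows. The best= option is not discussed above; it limits how many models of each size are reported. The value 2 is an arbitrary choice for illustration.

```sas
/* Sketch only: report at most the 2 best subsets of each size, */
/* ranked by R-square. The value of best= is an arbitrary       */
/* choice for illustration.                                     */
proc reg data=stuff;
   model z = x y / selection=rsquare best=2;
run;
```

With only two independent variables there are just three possible subsets, so the option matters little here. It becomes useful when the model statement lists many candidate variables.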
A final important feature of proc reg is the output statement.
This statement, which must follow the model statement, creates a
SAS data set containing the variables in the original data set
together with new variables as specified in the output
statement. An illustration of some of the common options is
output
   out=results
   predicted=pred
   residual=resid
   L95M=lowmean
   U95M=highmean
   L95=lowpred
   U95=highpred;
The out= option gives the name of the new SAS data set.
The predicted= option gives the name of the variable in the
out= data set which contains the predicted value of the
dependent variable. By adding records to the original data set
which specify values of the INDEPENDENT
variables in the model but set the corresponding value of the
DEPENDENT variable to missing, one can obtain
predictions given by the model for unobserved settings of the
independent variables.
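The technique just described can be sketched as follows. The data set names newx and both, and the values of x and y in the added records, are made up for illustration. The dependent variable z is set to missing, so these records do not affect the fit but still receive predicted values.

```sas
/* Sketch only: newx, both, and the data values are hypothetical. */

/* New settings of the independent variables; z is set to missing. */
data newx;
   input x y;
   z = .;
   datalines;
1.5 2.0
3.0 4.5
;
run;

/* Append the new records to the original data. */
data both;
   set stuff newx;
run;

/* Records with missing z are not used in the fit, */
/* but predicted values are computed for them.     */
proc reg data=both;
   model z = x y;
   output out=results predicted=pred;
run;

/* Show only the predictions for the added records. */
proc print data=results;
   where z = .;
run;
```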
The residual= option gives the name of the variable in the
out= data set which contains the value of the residual.
The L95M= and U95M= options give the names of the variables in
the out= data set which contain the lower and upper endpoints of
a 95% confidence interval for the mean of the dependent variable
at the given values of the independent variables.
The L95= and U95= options give the names of the variables in
the out= data set which contain the lower and upper endpoints of
a 95% prediction interval for an individual value of the dependent
variable at the given values of the independent variables.
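Putting the pieces together, a complete run using the output statement might look like the sketch below. It uses the data set and variable names from the examples above. The proc print step is an addition for illustration, used only to display the new variables.

```sas
/* Sketch only: fit the model and save predictions, residuals, */
/* and both kinds of 95% interval in the data set results.     */
proc reg data=stuff;
   model z = x y;
   output out=results
          predicted=pred
          residual=resid
          l95m=lowmean  u95m=highmean
          l95=lowpred   u95=highpred;
run;

/* Display the original variables alongside the new ones. */
proc print data=results;
   var x y z pred resid lowmean highmean lowpred highpred;
run;
```

In the printed results, the interval from lowpred to highpred for an individual value should contain the interval from lowmean to highmean for the mean, since predicting a single observation carries the extra variability of the error term.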
For further information see the SAS/STAT User's Guide, volume 2.
Copyright © 1997 by Jerry Alan Veeh. All rights reserved.