The proc reg procedure is used to perform regression analysis.

Proc glm can also be used to do this analysis by leaving
the quantitative variables out of the `class` statement. In some
ways, proc glm is superior to proc reg because proc glm allows
manipulations in the model statement (such as x*x to obtain
quadratic factors) which are not allowed in proc reg. However,
proc reg allows certain automatic model selection features and
a crude plotting feature not available in proc glm.
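
As a minimal sketch of this difference, assume a data set `stuff` containing numeric variables `x`, `y`, and `z`. The data set name `stuff2` and the variable name `xsq` are hypothetical names introduced only for this illustration. A quadratic term can be written directly in the proc glm model statement, whereas for proc reg it would first have to be built in a data step:

```sas
/* proc glm: no class statement, so x and y are treated as quantitative */
proc glm data=stuff;
model z = x y x*x;
run;

/* proc reg: build the squared term first (xsq is a hypothetical name) */
data stuff2;
set stuff;
xsq = x*x;
run;

proc reg data=stuff2;
model z = x y xsq;
run;
```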

The variables analyzed using proc reg must be numeric variables,
all of which appear in a SAS data set. If `x`, `y`, and `z`
are 3 numeric variables, the basic invocation is

```
proc reg data=stuff;
model z= x y;
run;
```
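
To try this invocation, a data set named `stuff` has to exist first. The following is a small sketch with made-up values, included purely for illustration; only the names `stuff`, `x`, `y`, and `z` come from the text above.

```sas
/* Hypothetical data: six observations of x, y, and z */
data stuff;
input x y z;
datalines;
1 2 4.1
2 1 4.9
3 4 9.2
4 3 9.8
5 6 14.1
6 5 15.0
;
run;

/* Regress z on x and y, with an intercept */
proc reg data=stuff;
model z = x y;
run;
```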

There are many options available in the model statement. As in proc glm, the options are listed after a slash (`/`) on the model statement line. One example is

```
proc reg data=stuff;
model z= x y /
noint
selection=stepwise
sle=.05
sls=.05;
run;
```

The `noint` option specifies that the fitted model is to have NO
intercept (constant) term.

The `selection=` option specifies how variables are to be introduced
into the model. The default (if `selection=` is not used) is equivalent to
`selection=none`, in which all the variables in the model statement
are used. Setting `selection=stepwise` introduces a variable into the
model provided it is significant at the `sle` (entry) level and deletes a
variable from the model if it is NOT significant at the `sls` (stay) level.
Setting `selection=rsquare` reports, for each subset size, the subsets
of the independent variables which have the largest values of the
square of R.
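
As a sketch of how `selection=rsquare` might be used, assume again a data set `stuff` with numeric variables `x`, `y`, and `z`. The `best=` option, which is not discussed in this text, limits how many subsets of each size are reported; the value 2 is an arbitrary choice for illustration.

```sas
/* Report, for each subset size, at most 2 subsets ranked by R-square */
proc reg data=stuff;
model z = x y / selection=rsquare best=2;
run;
```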

A final important feature of proc reg is the `output` statement.
This statement, which must follow the `model` statement, creates a
SAS data set containing the variables in the original data set
together with new variables as specified in the output
statement. An illustration of some of the common options is

```
output
out=results
predicted=pred
residual=resid
L95M=lowmean
U95M=highmean
L95=lowpred
U95=highpred;
```

The `out=` option gives the name of the new SAS data set.

The `predicted=` option gives the name of the variable in the
`out=` data set which contains the predicted value of the
dependent variable. By adding records to the original data set
which specify values of the INDEPENDENT
variables in the model but set the corresponding value of the
DEPENDENT variable to missing, one can obtain
predictions given by the model for unobserved settings of the
independent variables.
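
The technique just described can be sketched as follows, assuming a data set `stuff` with numeric variables `x`, `y`, and `z`. The data set names `newobs` and `both` and the values of `x` and `y` are hypothetical.

```sas
/* New settings of the independent variables; z is set to missing (.) */
data newobs;
input x y;
z = .;
datalines;
7 7
8 6
;
run;

/* Append the new records to the original data */
data both;
set stuff newobs;
run;

/* Records with missing z are not used in fitting the model,
   but they still receive predicted values in the out= data set */
proc reg data=both;
model z = x y;
output out=results predicted=pred;
run;

proc print data=results;
run;
```

In the printed `results` data set, the appended records should show a missing `z` alongside a non-missing `pred`.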

The `residual=` option gives the name of the variable in the
`out=` data set which contains the value of the residual.

The `L95M=` and `U95M=` options give the names of the variables in
the `out=` data set which contain the lower and upper endpoints of
a 95% confidence interval for the mean of the dependent variable.

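The `L95=` and `U95=` options give the names of the variables in
the `out=` data set which contain the lower and upper endpoints of
a 95% confidence interval for an individual prediction (often called
a prediction interval). This interval is wider than the interval for
the mean because it also includes the variance of the error term.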

For further information see the SAS/STAT User's Guide, volume 2.

Copyright © 1997 by Jerry Alan Veeh. All rights reserved.