Proc discrim is used to conduct discriminant analysis.
The purpose of discriminant analysis is to classify an experimental unit as being from one of two (or more) populations on the basis of observations obtained on that unit. For example, a bank might attempt to classify a new loan applicant as a 'good payer' or a 'deadbeat' on the basis of the answers given on a loan application.
The common syntax is:
proc discrim data=bank method=normal pool=yes slpool=0.001 posterr out=results;
  class type;
  var hist balance income;
run;
The data= option specifies the data set used in the analysis.
The method=normal option selects discrimination based on the assumption that the data are from multivariate normal distributions.
The pool= option can be set to yes, no, or test. This controls whether the covariance matrices are assumed equal in the analysis (yes), assumed to be unequal (no), or tested for equality (test), with subsequent analysis performed according to the outcome of this test.
The slpool= option sets the significance level of the test for equality of covariance matrices when pool=test is used.
The posterr option prints estimated error probabilities for the computed discrimination rule.
The out= option creates a new data set that contains the variables in the original one together with a new variable, _into_, of the same type as the class variable. The _into_ variable gives the class to which the observation is assigned by the discrimination rule.
The class statement, which is required, specifies the variable for which classification is to occur.
The var statement specifies the variables to be used for discrimination. If it is omitted, all numeric variables in the data set that are not named in other statements (such as the class statement) are used. It is best to specify the variables explicitly to avoid the unintended use of extraneous variables in the analysis.
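As a hedged sketch of how these options might be combined, the step below lets the procedure test the covariance matrices for equality instead of assuming it, and then cross-tabulates the actual class against the assigned class in the out= data set. The data set and variable names (bank, type, hist, balance, income, results) are those of the example above; the 0.05 significance level and the proc freq step are illustrative choices, not part of the original example.

```sas
/* Test for equal covariance matrices (pool=test) at an illustrative */
/* 0.05 level, and save the classifications in the data set results. */
proc discrim data=bank method=normal pool=test slpool=0.05 posterr out=results;
  class type;
  var hist balance income;
run;

/* Cross-tabulate the actual class (type) against the assigned class (_into_). */
proc freq data=results;
  tables type*_into_ / norow nocol nopercent;
run;
```

Because the same observations are used both to build and to evaluate the rule, a table produced this way tends to understate the true misclassification rate.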
Sometimes one set of data is used to construct the discrimination rule and a second set is used to test the rule. This can be accomplished as follows.
proc discrim data=construc testdata=tryout method=normal pool=yes slpool=0.001 posterr;
  class type;
  var hist balance income;
run;
The testdata= option specifies a SAS data set that is used to try out the discrimination rule found by analyzing the data= data set. It is assumed that the testdata= data set contains the same independent variables and class variable as the data= data set.
Diagnostics of the discrimination rule on the testdata= data set are included in the output.
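If the individual classifications of the test observations are wanted, proc discrim also accepts a testout= option, which writes the testdata= observations together with their assigned class to a new data set. A minimal sketch, assuming the construc and tryout data sets from the example above; the data set name tryres is hypothetical.

```sas
/* Build the rule on construc, classify tryout, and save the result. */
/* tryres is a hypothetical name for the testout= data set.          */
proc discrim data=construc testdata=tryout testout=tryres method=normal pool=yes posterr;
  class type;
  var hist balance income;
run;

/* List each test observation with its actual and assigned class. */
proc print data=tryres;
  var type hist balance income _into_;
run;
```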
To construct a discrimination rule from one data set using method=normal and save it for later use, use the outstat= option:
proc discrim data=first outstat=calib method=normal .....
The special data set calib then contains the discrimination rule constructed from the data set first. To apply this rule to the data set second, use:
proc discrim data=calib testdata=second method=normal .....
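Written out in full, the two steps might look like the following. This is a sketch only: the class and var statements are borrowed from the bank example earlier in this section as an assumption, and the testout= data set name scored is hypothetical.

```sas
/* Step 1: build the rule from data set first and store it in calib. */
proc discrim data=first outstat=calib method=normal pool=yes;
  class type;
  var hist balance income;
run;

/* Step 2: apply the stored rule to data set second.          */
/* scored (hypothetical name) receives the assigned classes.  */
proc discrim data=calib testdata=second testout=scored method=normal;
  class type;
  var hist balance income;
run;
```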
For further information see the SAS/STAT User's Guide, volume 1.