Proc discrim is used to conduct discriminant analysis.
The purpose of discriminant analysis is to classify an experimental unit as coming from one of two (or more) populations on the basis of observations made on that unit. For example, a bank might attempt to classify a new loan applicant as a "good payer" or a "deadbeat" on the basis of the answers given on the loan application.
A typical invocation is:
proc discrim
data=bank
method=normal
pool=yes
slpool=0.001
posterr
out=results;
class type;
var hist balance income;
run;
The data= option names the SAS data set to be analyzed.
The method=normal option bases the discrimination rule on the
assumption that the data come from multivariate normal
populations.
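Under method=normal, with equal prior probabilities for the classes (the procedure's default), the rule can be expressed through a generalized squared distance from an observation $\mathbf{x}$ to each class $t$. The sketch below uses the standard normal-theory form; $\bar{\mathbf{x}}_t$ denotes the sample mean vector of class $t$, $\mathbf{S}_p$ the pooled within-class covariance matrix, and $\mathbf{S}_t$ the covariance matrix of class $t$. Which matrix is used depends on the pool option described next.

$$
D_t^2(\mathbf{x}) = (\mathbf{x} - \bar{\mathbf{x}}_t)^{\top} \mathbf{V}_t^{-1} (\mathbf{x} - \bar{\mathbf{x}}_t) + g(t),
\qquad
\bigl(\mathbf{V}_t,\ g(t)\bigr) =
\begin{cases}
(\mathbf{S}_p,\ 0) & \text{pooled covariance (pool=yes)},\\
(\mathbf{S}_t,\ \ln\lvert \mathbf{S}_t \rvert) & \text{separate covariances (pool=no)},
\end{cases}
$$

$$
\Pr(t \mid \mathbf{x}) = \frac{\exp\!\bigl(-\tfrac12 D_t^2(\mathbf{x})\bigr)}{\sum_u \exp\!\bigl(-\tfrac12 D_u^2(\mathbf{x})\bigr)}.
$$

The observation is assigned to the class with the smallest $D_t^2$, which is also the class with the largest posterior probability. Because the term $\mathbf{x}^{\top}\mathbf{S}_p^{-1}\mathbf{x}$ is common to every class, the pooled rule is linear in $\mathbf{x}$; the separate-covariance rule is quadratic.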
The pool option can be set to yes, no, or test. It controls
whether the within-class covariance matrices are assumed equal
(yes), assumed unequal (no), or tested for equality (test), with
the subsequent analysis performed according to the outcome of
the test. With equal (pooled) covariance matrices the rule is a
linear discriminant function; with unequal matrices it is a
quadratic one.
The slpool= option sets the significance level of the test for
equality of covariance matrices when pool=test is used (the
default is 0.10). It is otherwise disregarded.
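The linear/quadratic distinction can be illustrated with a small pure-Python sketch of the normal-theory rule. It assumes equal priors and is not SAS's implementation; the function names (fit_normal, classify) and the bank records are made up for illustration. Here pool=True mimics pool=yes and pool=False mimics pool=no; there is no analogue of pool=test.

```python
import math

def inverse_logdet(a):
    """Invert a small nonsingular matrix by Gauss-Jordan elimination; also return ln|det|."""
    n = len(a)
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    logdet = 0.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        logdet += math.log(abs(pv))  # |det| is the product of the pivots
        m[c] = [v / pv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m], logdet

def fit_normal(data, pool=True):
    """data: list of (vector, label). Returns {label: (mean, inverse covariance, g)}."""
    groups = {}
    for x, label in data:
        groups.setdefault(label, []).append(x)
    p = len(data[0][0])
    means, scatter = {}, {}
    for g, xs in groups.items():
        mu = [sum(x[j] for x in xs) / len(xs) for j in range(p)]
        means[g] = mu
        scatter[g] = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in xs) for j in range(p)]
                      for i in range(p)]
    rule = {}
    if pool:  # like pool=yes: one pooled matrix, divisor N - (number of classes)
        df = len(data) - len(groups)
        inv, _ = inverse_logdet([[sum(scatter[g][i][j] for g in groups) / df for j in range(p)]
                                 for i in range(p)])
        for g in groups:
            rule[g] = (means[g], inv, 0.0)  # g(t) = 0 with a pooled matrix
    else:     # like pool=no: each class keeps its own covariance matrix S_t
        for g, xs in groups.items():
            inv, logdet = inverse_logdet([[v / (len(xs) - 1) for v in row] for row in scatter[g]])
            rule[g] = (means[g], inv, logdet)  # g(t) = ln|S_t|
    return rule

def classify(x, rule):
    """Return the assigned class and the posterior probabilities (equal priors)."""
    d2 = {}
    for g, (mu, inv, extra) in rule.items():
        d = [xi - mi for xi, mi in zip(x, mu)]
        d2[g] = sum(d[i] * inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d))) + extra
    into = min(d2, key=d2.get)  # smallest generalized squared distance
    # exp(-D^2/2), shifted by the minimum to avoid underflow, then normalized
    w = {g: math.exp(-0.5 * (v - d2[into])) for g, v in d2.items()}
    total = sum(w.values())
    return into, {g: v / total for g, v in w.items()}

# Made-up (hist, balance, income) vectors labelled by type.
bank = [((5, 10, 50), "good"), ((6, 11, 52), "good"), ((4, 9, 49), "good"),
        ((5, 12, 47), "good"), ((6, 9, 51), "good"),
        ((1, 2, 20), "bad"), ((2, 3, 22), "bad"), ((0, 1, 19), "bad"),
        ((1, 4, 17), "bad"), ((2, 1, 21), "bad")]

linear = fit_normal(bank, pool=True)      # linear rule
quadratic = fit_normal(bank, pool=False)  # quadratic rule
print(classify((5, 10, 50), linear)[0], classify((1, 2, 20), quadratic)[0])  # prints: good bad
```

The two made-up classes were given identical spread, so both rules agree here; with classes of different spread, the separate matrices and the log-determinant term of the quadratic rule can change the assignment.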
The posterr option prints posterior probability estimates of the
error rates of the computed discrimination rule.
The out= option creates a new data set which contains the
variables in the original one together with the posterior
probability of membership in each class and a new variable
called _into_, of the same type as the class variable. The
_into_ variable gives the class to which the observation is
assigned when the discrimination rule is applied back to the
data used to construct it.
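A rough Python analogue of out=, reusing a pooled (pool=yes) normal-theory rule: each record is copied and its assigned class is added under the key _into_. The helper names (fit_pooled, assign) and the data are hypothetical, equal priors are assumed, and the posterior probabilities are omitted to keep the sketch short.

```python
def inverse(a):
    """Gauss-Jordan inverse of a small nonsingular matrix (partial pivoting)."""
    n = len(a)
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def fit_pooled(rows, var, cls):
    """Normal-theory rule with a pooled within-class covariance matrix (pool=yes)."""
    groups = {}
    for r in rows:
        groups.setdefault(r[cls], []).append([r[v] for v in var])
    p = len(var)
    means = {g: [sum(x[j] for x in xs) / len(xs) for j in range(p)] for g, xs in groups.items()}
    df = len(rows) - len(groups)
    pooled = [[sum((x[i] - means[g][i]) * (x[j] - means[g][j])
                   for g, xs in groups.items() for x in xs) / df
               for j in range(p)] for i in range(p)]
    return {"var": list(var), "means": means, "inv": inverse(pooled)}

def assign(rule, record):
    """Class whose mean is nearest in squared Mahalanobis distance (equal priors)."""
    x = [record[v] for v in rule["var"]]
    inv = rule["inv"]
    def d2(mu):
        d = [a - b for a, b in zip(x, mu)]
        return sum(d[i] * inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return min(rule["means"], key=lambda g: d2(rule["means"][g]))

# Made-up records mirroring "class type; var hist balance income;".
FIELDS = ("hist", "balance", "income", "type")
bank = [dict(zip(FIELDS, t)) for t in [
    (5, 10, 50, "good"), (6, 11, 52, "good"), (4, 9, 49, "good"), (5, 12, 47, "good"),
    (6, 9, 51, "good"), (1, 2, 20, "bad"), (2, 3, 22, "bad"), (0, 1, 19, "bad"),
    (1, 4, 17, "bad"), (2, 1, 21, "bad")]]

rule = fit_pooled(bank, var=["hist", "balance", "income"], cls="type")
# Like out=results: every original variable plus _into_, by resubstitution.
results = [{**r, "_into_": assign(rule, r)} for r in bank]
print(sum(r["_into_"] != r["type"] for r in results), "misclassified")  # prints: 0 misclassified
```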
The class statement, which MUST BE USED, names the variable whose
values identify the populations into which observations are
classified.
The var statement specifies the variables to be used for
discrimination. If it is omitted, all numeric variables not named
in other statements (such as the class statement) are used. It
is best to specify the variables explicitly to avoid the
unintended use of extraneous variables in the analysis.
Sometimes one set of data is used to construct the discrimination rule and a second set is used to test the rule. This can be accomplished as follows.
proc discrim
data=construc
testdata=tryout
method=normal
pool=yes
slpool=0.001
posterr;
class type;
var hist balance income;
run;
The testdata= option specifies a SAS data set which is used to
try out the discrimination rule found by analyzing the data= data
set. The testdata= data set is assumed to contain the same
discriminating variables and class variable as the data= data set.
Diagnostics of the discrimination rule on the testdata= data set
are printed.
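The same idea in Python, as a sketch only: a pooled rule is fitted on construc and applied to tryout, and the misclassification rate is counted within each class. The data (including one deliberately mislabelled tryout record), the helper names, and the choice of per-class error rates as the diagnostic are illustrative assumptions; SAS prints its own diagnostics.

```python
def inverse(a):
    """Gauss-Jordan inverse of a small nonsingular matrix (partial pivoting)."""
    n = len(a)
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def fit_pooled(rows, var, cls):
    """Normal-theory rule with a pooled within-class covariance matrix (pool=yes)."""
    groups = {}
    for r in rows:
        groups.setdefault(r[cls], []).append([r[v] for v in var])
    p = len(var)
    means = {g: [sum(x[j] for x in xs) / len(xs) for j in range(p)] for g, xs in groups.items()}
    df = len(rows) - len(groups)
    pooled = [[sum((x[i] - means[g][i]) * (x[j] - means[g][j])
                   for g, xs in groups.items() for x in xs) / df
               for j in range(p)] for i in range(p)]
    return {"var": list(var), "means": means, "inv": inverse(pooled)}

def assign(rule, record):
    """Class whose mean is nearest in squared Mahalanobis distance (equal priors)."""
    x = [record[v] for v in rule["var"]]
    inv = rule["inv"]
    def d2(mu):
        d = [a - b for a, b in zip(x, mu)]
        return sum(d[i] * inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return min(rule["means"], key=lambda g: d2(rule["means"][g]))

FIELDS = ("hist", "balance", "income", "type")
construc = [dict(zip(FIELDS, t)) for t in [
    (5, 10, 50, "good"), (6, 11, 52, "good"), (4, 9, 49, "good"), (5, 12, 47, "good"),
    (6, 9, 51, "good"), (1, 2, 20, "bad"), (2, 3, 22, "bad"), (0, 1, 19, "bad"),
    (1, 4, 17, "bad"), (2, 1, 21, "bad")]]
tryout = [dict(zip(FIELDS, t)) for t in [
    (5.5, 10.5, 51, "good"), (4.5, 10.5, 48, "good"),
    (1, 2, 20.5, "good"),  # deliberately mislabelled: lies among the "bad" records
    (1.5, 2.5, 21, "bad"), (0.5, 2.5, 18, "bad")]]

rule = fit_pooled(construc, var=["hist", "balance", "income"], cls="type")
errors, counts = {}, {}
for r in tryout:
    counts[r["type"]] = counts.get(r["type"], 0) + 1
    errors[r["type"]] = errors.get(r["type"], 0) + (assign(rule, r) != r["type"])
rates = {g: errors[g] / counts[g] for g in counts}  # per-class misclassification rate
print(rates)
```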
To construct a discrimination rule from one data set using
method=normal and save it for later use, use the outstat= option:
proc discrim
data=first
outstat=calib
method=normal
.....
The special data set calib then contains the discrimination rule
constructed from the data set first. To apply this rule to the
data set second:
proc discrim
data=calib
testdata=second
method=normal
.....
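A loose Python analogue of the outstat= workflow: the fitted means and inverse pooled covariance matrix are written to a JSON file and later reloaded to classify a second data set. The file name calib.json, the JSON layout, and the helper names are hypothetical; the real outstat= data set is a SAS data set with its own structure.

```python
import json
import tempfile
from pathlib import Path

def inverse(a):
    """Gauss-Jordan inverse of a small nonsingular matrix (partial pivoting)."""
    n = len(a)
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [v / pv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def fit_pooled(rows, var, cls):
    """Normal-theory rule with a pooled within-class covariance matrix (pool=yes)."""
    groups = {}
    for r in rows:
        groups.setdefault(r[cls], []).append([r[v] for v in var])
    p = len(var)
    means = {g: [sum(x[j] for x in xs) / len(xs) for j in range(p)] for g, xs in groups.items()}
    df = len(rows) - len(groups)
    pooled = [[sum((x[i] - means[g][i]) * (x[j] - means[g][j])
                   for g, xs in groups.items() for x in xs) / df
               for j in range(p)] for i in range(p)]
    return {"var": list(var), "means": means, "inv": inverse(pooled)}

def assign(rule, record):
    """Class whose mean is nearest in squared Mahalanobis distance (equal priors)."""
    x = [record[v] for v in rule["var"]]
    inv = rule["inv"]
    def d2(mu):
        d = [a - b for a, b in zip(x, mu)]
        return sum(d[i] * inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return min(rule["means"], key=lambda g: d2(rule["means"][g]))

FIELDS = ("hist", "balance", "income", "type")
first = [dict(zip(FIELDS, t)) for t in [
    (5, 10, 50, "good"), (6, 11, 52, "good"), (4, 9, 49, "good"), (5, 12, 47, "good"),
    (6, 9, 51, "good"), (1, 2, 20, "bad"), (2, 3, 22, "bad"), (0, 1, 19, "bad"),
    (1, 4, 17, "bad"), (2, 1, 21, "bad")]]

# Step 1 (like outstat=calib): fit once and save the rule.
calib = fit_pooled(first, var=["hist", "balance", "income"], cls="type")
path = Path(tempfile.gettempdir()) / "calib.json"  # hypothetical file name
path.write_text(json.dumps(calib))

# Step 2 (like data=calib testdata=second): reload the rule and classify new records.
loaded = json.loads(path.read_text())
second = [{"hist": 5, "balance": 10, "income": 50}, {"hist": 1, "balance": 2, "income": 20}]
print([assign(loaded, r) for r in second])  # prints: ['good', 'bad']
```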
For further information see the SAS/STAT User's Guide, volume 1.
Copyright © 1997 by Jerry Alan Veeh. All rights reserved.