Proc Discrim



Proc discrim is used to conduct discriminant analysis.

The purpose of discriminant analysis is to classify an experimental unit as belonging to one of two (or more) populations on the basis of observations obtained on that unit. For example, a bank might attempt to classify a new loan applicant as a "good payer" or a "deadbeat" on the basis of the answers given on a loan application.

The common syntax is:

proc discrim 
	data=bank
	method=normal
	pool=yes
	slpool=0.001
	posterr
	out=results;
class type;
var hist balance income;
run;

The data= line specifies the data set used in the analysis.

The method=normal option selects discrimination based on the assumption that the data come from multivariate normal populations.

The pool option can be set to either yes, no, or test. This controls whether the covariance matrices are assumed equal in the analysis (yes), assumed to be UNequal (no), or tested for equality (test) with subsequent analysis performed according to the outcome of this test.
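
As a sketch of what the pool option changes, the normal-theory rule is usually written in terms of a generalized squared distance from an observation x to each class t. The notation here (class mean \bar{x}_t, within-class covariance S_t, pooled covariance S_p, prior probability q_t) is introduced for illustration and does not appear in the procedure's syntax.

```latex
D_t^2(x) = (x - \bar{x}_t)^{\top} V_t^{-1} (x - \bar{x}_t) + g_1(t) + g_2(t)

% pool=yes:  V_t = S_p,  g_1(t) = 0              (linear rule)
% pool=no:   V_t = S_t,  g_1(t) = \ln |S_t|      (quadratic rule)
% g_2(t) = -2 \ln q_t if the prior probabilities differ, and 0 otherwise

p(t \mid x) = \frac{\exp\left(-\tfrac{1}{2} D_t^2(x)\right)}
                   {\sum_u \exp\left(-\tfrac{1}{2} D_u^2(x)\right)}
```

An observation is assigned to the class with the smallest D_t^2(x), which is the class with the largest posterior probability p(t | x).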

The slpool= option sets the significance level of the test for equality of covariance matrices when pool=test is used. It is ignored otherwise, as in the example above, where pool=yes.
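
A variant of the example in which slpool= actually takes effect might look like the following sketch. It reuses the data set and variable names from the example above; the value 0.05 is chosen only for illustration.

```sas
/* Sketch: test the covariance matrices for equality, then      */
/* pool them or not according to the outcome of that test.      */
proc discrim
	data=bank
	method=normal
	pool=test
	slpool=0.05   /* illustrative significance level only */
	posterr;
class type;
var hist balance income;
run;
```

If the test rejects equality at the stated level, the within-class covariance matrices are used; otherwise the pooled matrix is used.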

The posterr option prints out estimated error probabilities for the computed discrimination rule.

The out= option creates a new data set which contains the variables in the original one together with the posterior probabilities and a new variable called _into_ of the same type as the class variable. The _into_ variable gives the class into which the observation is assigned by the discrimination rule.
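
One way to inspect the out= data set is sketched below. It assumes the first example has already been run, so that the data set results exists with the class variable type.

```sas
/* Cross-tabulate the actual class against the assigned class. */
proc freq data=results;
	tables type*_into_ / nocol nopercent;
run;

/* List the first few observations with their assigned class. */
proc print data=results(obs=10);
	var type _into_ hist balance income;
run;
```

Off-diagonal cells of the table count observations that the rule assigns to a class other than their own.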

The class statement, which MUST BE USED, specifies the variable for which classification is to occur.

The var statement specifies the variables to be used for discrimination. If omitted, all numeric variables in the data set that are not named in another statement are used. It is best to specify the variables explicitly to avoid the unintended use of extraneous variables in the analysis.

Sometimes one set of data is used to construct the discrimination rule and a second set is used to test the rule. This can be accomplished as follows.

proc discrim
	data=construc
	testdata=tryout
	method=normal
	pool=yes
	slpool=0.001
	posterr;
class type;
var hist balance income;
run;

The testdata= option specifies a SAS data set which is used to try out the discrimination rule found by analyzing the data= data set. It is assumed that the testdata= data set contains the same independent variables and class variable as the data= data set. Diagnostics of the discrimination rule on the testdata= data set are printed.
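
To keep the classifications of the test observations rather than only print diagnostics, the testout= option can be added, as in this sketch. The data set name scored is a hypothetical choice.

```sas
/* Sketch: save the classified test observations for later use. */
proc discrim
	data=construc
	testdata=tryout
	testout=scored   /* hypothetical output data set name */
	method=normal
	pool=yes
	posterr;
class type;
var hist balance income;
run;
```

The scored data set then holds the variables from tryout together with the posterior probabilities and the _into_ variable, in the same way that out= does for the data= data set.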

To construct a discrimination rule from one data set with method=normal and save it for later use, use the outstat= option:

proc discrim
	data=first
	outstat=calib
	method=normal 
	.....

The special data set calib then contains the discrimination rule constructed from the data set first. To apply this rule to the data set second:

proc discrim
	data=calib
	testdata=second
	method=normal
	.....
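
Written out in full, the two steps might look like the following sketch. The class and var statements are borrowed from the earlier examples and are assumptions here, since the fragments above leave the remaining statements unspecified.

```sas
/* Step 1: construct the rule from first and store it in calib. */
proc discrim
	data=first
	outstat=calib
	method=normal
	pool=yes;
class type;
var hist balance income;
run;

/* Step 2: apply the stored rule to the observations in second. */
proc discrim
	data=calib
	testdata=second
	method=normal
	pool=yes;
class type;
var hist balance income;
run;
```

The same method= and pool= settings are repeated in the second step so that the stored statistics are used in the way they were computed.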

For further information see the SAS/STAT User's Guide, volume 1.



Copyright © 1997 by Jerry Alan Veeh. All rights reserved.