Proc anova can also be used to do the analysis of variance when the design is balanced. Its only advantage over proc glm is that it uses fewer computational resources; for unbalanced data its results may not be valid. Its use is recommended only for large projects in which experts have ensured that the design is balanced.
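For a balanced layout the proc anova syntax mirrors the proc glm syntax. A minimal sketch, assuming a hypothetical balanced data set named bal with a classification variable trt and a response y (these names are illustrative and do not come from these notes):

```sas
/* Hypothetical balanced data set BAL: every level of TRT has the */
/* same number of observations of the response Y.                 */
proc anova data=bal;
  class trt;          /* qualitative factor, as in proc glm   */
  model y = trt;      /* one-way analysis of variance         */
run;
quit;                 /* proc anova is interactive; QUIT ends it */
```

The pig data discussed next have unequal litter sizes, so they should be analyzed with proc glm rather than proc anova.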
The file pig.dat contains data on the birth weights of Poland China pigs in 8 litters (from Scheffe). The layout of the file is the litter number followed by the birth weights for that litter. Note that the litter sizes are unequal, so the design is unbalanced. The underlying model is that each litter has its own mean birth weight; the hypothesis to be tested is that these means are the same for all of the litters.
A data set called pig is created which contains 2 variables, litter and weight. The variable litter contains the litter number; each corresponding value of weight contains the birth weight of a piglet. The model is that weight depends on litter; the question of interest is whether this dependence really exists. Note that litter is a qualitative variable here which serves only to distinguish between (potentially) different populations.
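One way to build pig from a file in which each litter has a different number of weights is to hold the input line with a trailing @ and keep reading weights until the line runs out. A sketch, assuming that pig.dat is in the current directory, that each litter occupies exactly one line, and that values are separated by blanks:

```sas
/* Assumes pig.dat is in the working directory and that each line    */
/* holds one litter: the litter number, then that litter's birth     */
/* weights, separated by blanks. Adjust the path as needed.          */
data pig;
  infile 'pig.dat' missover;   /* MISSOVER: never continue onto the next line */
  input litter weight @;       /* litter number and first weight; hold line   */
  do while (not missing(weight));
    output;                    /* one observation per piglet                  */
    input weight @;            /* next weight; missing once the line runs out */
  end;
run;
```

The MISSOVER option stops INPUT from moving to the next line when a litter's weights are exhausted, so weight becomes missing and the loop ends; the held line is released at the end of the iteration and the next litter is read.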
A basic analysis using proc glm proceeds as follows.
proc glm data=pig;
  class litter;
  model weight = litter;
run;
The class statement specifies that litter is a QUALITATIVE variable. All qualitative independent variables in the model should be listed in the class statement. Independent variables which appear in the model but not in the class statement are treated as QUANTITATIVE variables. This allows proc glm to fit regression and analysis of covariance models as well.
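To see why litter must appear in the class statement, compare the basic analysis with the same model fitted without it. In this illustrative sketch, proc glm treats litter as a quantitative regressor and fits a straight line in the litter number, so the litter term has 1 degree of freedom instead of 8 - 1 = 7. That model is not meaningful here, because the litter numbers are only labels.

```sas
/* Deliberately omits the CLASS statement: LITTER is now treated   */
/* as a QUANTITATIVE variable, i.e. a simple linear regression of  */
/* WEIGHT on the litter number (1 df rather than 7 df).            */
proc glm data=pig;
  model weight = litter;
run;
quit;
```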
The model statement specifies the dependent variable (weight) and the independent variable (litter).
The output of this procedure is a standard analysis of variance table. Two types of sums of squares, Type I and Type III, are produced; for this one-way model they are identical. In virtually all cases, it is the Type III sums of squares and their associated tests that are of interest.
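When only the Type III results are wanted, the ss3 option of the model statement suppresses the Type I table. A short sketch using the same pig data set:

```sas
proc glm data=pig;
  class litter;
  model weight = litter / ss3;   /* print Type III sums of squares only */
run;
quit;
```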
A more in-depth analysis is obtained as follows.
proc glm data=pig;
  class litter;
  model weight = litter / solution e;
  means litter / bon tukey scheffe alpha=0.10;
run;
The solution option of the model statement prints a solution to the normal equations, i.e., an estimate of the parameter vector in the general linear model.
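For the pig data the one-way model and its normal equations can be written as below, where y_ij is the weight of piglet j in litter i and X is the design matrix with an intercept column and one indicator column per litter. Because the eight indicator columns add up to the intercept column, X'X is singular and the normal equations have infinitely many solutions. Proc glm reports one of them, obtained with a generalized inverse, and marks the parameter estimates with a B to flag that they are not unique; differences such as alpha_1 - alpha_2 are nevertheless the same for every solution.

```latex
\begin{aligned}
y_{ij} &= \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, \dots, 8, \\
X^\top X \, \mathbf{b} &= X^\top \mathbf{y}, \\
\mathbf{b} &= (X^\top X)^{-} X^\top \mathbf{y},
  \qquad (X^\top X)^{-} \text{ a generalized inverse of } X^\top X .
\end{aligned}
```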
The e option of the model statement prints the general form of the estimable functions for the model. This is useful for more advanced analyses using the contrast and estimate statements.
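As an illustration of those statements, the sketch below estimates the difference between the first two litter means and tests whether the average of the first four litter means equals the average of the last four. Both comparisons are hypothetical, chosen only to show the syntax; the coefficients are matched to the levels of litter in sorted order.

```sas
proc glm data=pig;
  class litter;
  model weight = litter;
  /* Coefficients follow the sorted levels of LITTER.              */
  /* Difference between the means of the first two litters.        */
  estimate 'first - second litter' litter 1 -1 0 0 0 0 0 0;
  /* Hypothetical comparison: first four litters vs last four.     */
  contrast 'first four vs last four' litter 1 1 1 1 -1 -1 -1 -1;
  /* DIVISOR=4 turns the sums into a difference of averages.       */
  estimate 'first four vs last four' litter 1 1 1 1 -1 -1 -1 -1 / divisor=4;
run;
quit;
```

Each set of coefficients sums to zero and does not involve the intercept, so both functions are estimable even though the parameter estimates themselves are not unique.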
The means statement produces pairwise comparisons of the means by the method(s) specified in the options that follow the slash. Note that comparisons are done only for main effects.
The bon option of the means statement produces Bonferroni groupings of the means.
The tukey option produces groupings of the means using Tukey's method of multiple comparisons.
The scheffe option produces groupings of the means using Scheffe's method of multiple comparisons.
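For a pairwise difference the three methods differ only in the critical value c that multiplies the standard error. With k = 8 litters, litter sizes n_i, N piglets in total, error degrees of freedom nu = N - k, and mean squared error MSE, the simultaneous intervals have the form below. For unequal litter sizes the Tukey method in proc glm is the Tukey-Kramer version.

```latex
\begin{aligned}
&\bar y_{i\cdot} - \bar y_{j\cdot} \;\pm\; c\,\sqrt{\mathrm{MSE}\left(\tfrac{1}{n_i} + \tfrac{1}{n_j}\right)}, \qquad \nu = N - k, \\
&\text{Bonferroni: } c = t_{1-\alpha/(2m),\,\nu}, \qquad m = \binom{k}{2} = \binom{8}{2} = 28, \\
&\text{Tukey-Kramer: } c = q_{1-\alpha;\,k,\,\nu} \big/ \sqrt{2}, \\
&\text{Scheff\'e: } c = \sqrt{(k-1)\,F_{1-\alpha;\,k-1,\,\nu}} = \sqrt{7\,F_{1-\alpha;\,7,\,\nu}} .
\end{aligned}
```

With alpha=0.10, for example, the Bonferroni method tests each of the 28 pairs at level 0.10/28 ≈ 0.0036, i.e. it uses the upper 0.10/56 ≈ 0.0018 point of the t distribution.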
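Actual confidence intervals (rather than just groupings of similar means) can be obtained with further options of the means statement: the clm option gives simultaneous confidence intervals for the individual means (for methods such as bon and scheffe), and the cldiff option gives confidence intervals for the pairwise differences between means.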
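A sketch requesting both kinds of intervals for the pig data at the 90% level. Tukey's method applies to pairwise differences, so it is listed only with cldiff.

```sas
proc glm data=pig;
  class litter;
  model weight = litter;
  /* Simultaneous confidence intervals for all pairwise differences. */
  means litter / bon tukey scheffe cldiff alpha=0.10;
  /* Simultaneous confidence intervals for each litter mean.         */
  means litter / bon scheffe clm alpha=0.10;
run;
quit;
```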
The alpha=0.10 option specifies that the multiple comparisons are to be done at the 90% confidence level. If it is not specified, alpha defaults to 0.05 (the 95% level).
Further options can be found in the online help under SAS SYSTEM HELP--MODELING & ANALYSIS TOOLS--DATA ANALYSIS--(ANALYSIS OF VARIANCE) GLM or in the SAS/STAT User's Guide volume 2.