Proc GLM for MANOVA



A multivariate analysis of variance (MANOVA) can be carried out using Proc GLM. The method of doing such an analysis is described here. The special case of repeated measures is also considered.

The basic MANOVA model can be written in the form Y=XB+E, where Y is an nxd matrix with the d-dimensional observation vectors as rows, X is an nxk matrix of known constants, B is a kxd matrix of unknown parameters, and E is an nxd matrix of random errors.

As an example, consider again the Fisher Iris data of Project 2, found in iris.dat. Let's consider only the sepal length and width measurements for each species. A simple model would be that the mean vector of these measurements depends only on species. Suppose SAS data set sepal contains the variables sepallen, sepalwid, and species. To test the hypothesis of equal mean vectors for the 3 species use:

proc glm data=sepal;
	class species;
	model sepallen sepalwid=species;
	manova h=_all_;
run;

The model statement specifies the model for the 2 dependent variables, sepallen and sepalwid, which in this case depend only on species.

The manova h=_all_ statement carries out a MANOVA using this model and tests the hypothesis LB=0 for every effect listed in the model statement. This form is usually the most useful for the first part of an analysis.
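As a sketch of a common variation, a single effect can be named in h= and the matrices behind the tests can be requested with the printe and printh options of the manova statement. The example assumes the data set sepal described above.

```sas
proc glm data=sepal;
   class species;
   model sepallen sepalwid = species;
   /* printe: display the error SSCP matrix E and partial correlations */
   /* printh: display the hypothesis SSCP matrix H for each effect in h= */
   manova h=species / printe printh;
run;
```

The multivariate test statistics in the output (Wilks' lambda, Pillai's trace, the Hotelling-Lawley trace, and Roy's greatest root) are all computed from these two matrices.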

Variables not listed in the class statement are treated as quantitative variables. Multivariate linear regression models can therefore be analyzed within proc glm. The independent variables of the regression appear on the right side of the model statement, as above, but should not be listed in a class statement.
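A minimal sketch of such a regression follows. It assumes a hypothetical data set iris that also contains the petal measurements in variables named petallen and petalwid; these names are not taken from the text above.

```sas
/* Assumption: data set iris holds sepallen, sepalwid, petallen, petalwid. */
proc glm data=iris;
   /* No class statement: petallen and petalwid are quantitative regressors. */
   model sepallen sepalwid = petallen petalwid;
   /* Multivariate test of each regressor's row of coefficients in B. */
   manova h=_all_;
run;
```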

Hypotheses of the form LBM=0 can be tested by specifying the transpose of the matrix M in the manova statement. In the above analysis this could be used to test whether the difference between sepal length and sepal width has the same mean for every species, as follows:

proc glm data=sepal;
class species;
model sepallen sepalwid=species;
manova h=_all_
	m=(1 -1) prefix=diff;
run;

Note that the m= option specifies the TRANSPOSE of the matrix M in the hypothesis LBM=0.

The prefix= option specifies a family of names to be used for the new variables created by the M matrix transformation. With prefix=diff the single transformed variable here is labeled diff1.
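To see what the M matrix does, the same transformation can be carried out by hand. The sketch below is only an illustration; the data set name sepaldiff is made up here. Because M has a single column there is only one transformed variable, so the multivariate test for species should agree with the ordinary F test in a univariate analysis of the difference.

```sas
/* Build the transformed variable explicitly: (1 -1) applied to (sepallen, sepalwid). */
data sepaldiff;            /* hypothetical data set name */
   set sepal;
   diff1 = sepallen - sepalwid;
run;

/* Univariate one-way analysis of the difference. */
proc glm data=sepaldiff;
   class species;
   model diff1 = species;
run;
```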

Contrast statements can be used to obtain customized hypothesis tests here, as discussed in proc glm (advanced). Contrast statements must appear before the manova statement.
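As a sketch, the following compares the mean vectors of the first two species levels. The contrast coefficients follow the sorted order of the levels of species, so the label and the coefficients shown are assumptions that must be matched to the actual coding of species in the data.

```sas
proc glm data=sepal;
   class species;
   model sepallen sepalwid = species;
   /* One coefficient per level of species, in sorted level order. */
   contrast 'level 1 vs level 2' species 1 -1 0;
   /* The contrast precedes manova, so it is tested multivariately as well. */
   manova h=species;
run;
```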

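The means statement can also be used, after the model statement, to display the mean of each dependent variable for each level of a class variable.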
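For instance, a minimal sketch using the sepal data set from above:

```sas
proc glm data=sepal;
   class species;
   model sepallen sepalwid = species;
   /* Means of sepallen and sepalwid for each level of species. */
   means species;
   manova h=species;
run;
```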

A repeated measures design can also be analyzed. Suppose cholesterol readings are taken on each subject at 3 different times. In a purely multivariate model, the 3-dimensional observation vectors for the subjects could be assumed to form a sample from a multivariate normal distribution with unknown covariance matrix. The analysis would then proceed as above. If the covariance matrix is assumed to satisfy a sphericity condition, a repeated measures analysis can be performed instead. Assume that the data set chol has one record per patient, with variables c1-c3 containing that patient's cholesterol readings. The program

proc glm data=chol;
model c1-c3=;
repeated time;
run;

will analyze the data under the repeated measures structure and label the repeated measures factor as time in the output.
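Since the univariate repeated measures tests rest on the sphericity assumption, it is usually worth checking. A hedged variation of the program is sketched below: the nouni option of the model statement suppresses the separate univariate analyses of c1, c2, and c3, and the printe option of the repeated statement requests the error matrices along with sphericity tests.

```sas
proc glm data=chol;
   /* nouni: skip the separate univariate analysis of each of c1-c3 */
   model c1-c3 = / nouni;
   /* 3 levels of the within-subject factor time; printe adds sphericity tests */
   repeated time 3 / printe;
run;
```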

For further information see the SAS/STAT User's Guide, volume 2.



Copyright © 1997 by Jerry Alan Veeh. All rights reserved.