Showing posts with label correlated data models. Show all posts
Showing posts with label correlated data models. Show all posts
Monday, May 17, 2010
Example 7.37: calculation of Hotelling's T^2
[フレーム]
Hotelling's T^2 is a multivariate statistic used to compare two groups, where multiple outcomes are observed for each subject. Here we demonstrate how to calculate Hotelling's T^2 using R and SAS, and test the code using a simulation study then apply it to data from the HELP study.
R
We utilize an approach suggested by Peter Mandeville in a posting to the R-help mailing list.
hotelling = function(y1, y2) {
# check the appropriate dimension
k = ncol(y1)
k2 = ncol(y2)
if (k2!=k)
stop("input matrices must have the same number of columns: y1 has ",
k, " while y2 has ", k2)
# calculate sample size and observed means
n1 = nrow(y1)
n2 = nrow(y2)
ybar1= apply(y1, 2, mean); ybar2= apply(y2, 2, mean)
diffbar = ybar1-ybar2
# calculate the variance of the difference in means
v = ((n1-1)*var(y1)+ (n2-1)*var(y2)) /(n1+n2-2)
# calculate the test statistic and associated quantities
t2 = n1*n2*diffbar%*%solve(v)%*%diffbar/(n1+n2)
f = (n1+n2-k-1)*t2/((n1+n2-2)*k)
pvalue = 1-pf(f, k, n1+n2-k-1)
# return the list of results
return(list(pvalue=pvalue, f=f, t2=t2, diff=diffbar))
}
We begin by ensuring that the two input matrices have the same number of columns, then calculate quantities needed for the calculation. The function returns a list with a number of named objects.
To test the function, we ran a simulation study to ensure that under the null hypothesis the function rejected approximately 5% of the time.
# do a simple simulation study to ensure Type I error rate is preserved
library(MASS)
numoutcomes = 5
mu1 = rep(0, numoutcomes); mu2 = rep(0, numoutcomes)
rho = .4
# compound symmetry covariance matrix
Sigma = rho*matrix(rep(1, numoutcomes^2), numoutcomes) +
diag(rep(1-rho, numoutcomes))
numsim = 10000
reject = numeric(numsim)
for (i in 1:numsim) {
grp1 = mvrnorm(100, mu1, Sigma)
grp2 = mvrnorm(150, mu2, Sigma)
reject[i] = hotelling(grp1, grp2)$pvalue<0.05
}
This yields appropriate Type I error rate (just under 5%).
> table(reject)
reject
0 1
9537 463
We can also undertake a comparison of non-homeless and homeless subjects in the HELP dataset, comparing individual observations on the PCS (physical component score) and MCS (mental component scores) of the SF-36 and the CESD measure of depressive symptoms.
ds = read.csv("http://www.math.smith.edu/sasr/datasets/help.csv")
smallds = with(ds, cbind(pcs, mcs, cesd))
grp1 = subset(smallds, ds$homeless==0)
grp2 = subset(smallds, ds$homeless==1)
res = hotelling(grp1, grp2)
This saves the results in the object res, which we can display.
> names(res)
[1] "pvalue" "f" "t2" "diff"
> res
$pvalue
[,1]
[1,] 0.1082217
$f
[,1]
[1,] 2.035024
$t2
[,1]
[1,] 6.132267
$diff
pcs mcs cesd
2.064049 1.755975 -2.183760
While there are differences in the mean PCS, MCS, and CESD scores between
the homeless and non-homeless subjects, we do not have sufficient evidence to reject the null hypothesis that they have equivalent means back in the populations (p=0.11).
SAS
In SAS, we can easily make the two-group multivariate comparison using the MANOVA statement in proc glm. This hides the actual value of the statistic, but generates the desired p-value. As usual, we use the ODS system to reduce SAS output.
ods select multstat;
proc glm data="c:\book\help";
class homeless;
model pcs mcs cesd = homeless;
manova h=homeless;
run;
quit;
SAS prints four valid tests, which result in equal values in this case.
MANOVA Test Criteria and Exact F Statistics for
the Hypothesis of No Overall HOMELESS Effect
H = Type III SSCP Matrix for HOMELESS
E = Error SSCP Matrix
S=1 M=0.5 N=223.5
Statistic Value F Value Num DF Den DF Pr> F
Wilks' Lambda 0.98658536 2.04 3 449 0.1082
Pillai's Trace 0.01341464 2.04 3 449 0.1082
Hotelling-Lawley Trace 0.01359704 2.04 3 449 0.1082
Roy's Greatest Root 0.01359704 2.04 3 449 0.1082
Here we also demonstrate using SAS proc iml (section 1.9) to calculate the statistic and perform the test from scratch. This roughly parallels the R development above, though we omit the test for equal number of variables.
*get the help data;
data help;
set "c:\book\help";
run;
proc iml;
* build a function;
start hotelling;
n1 = nrow(h1);
n0 = nrow(h0);
onesum = j(n1,1,1); *make an n1*1 matrix of 1s;
zerosum = j(n0,1,1);
meanh1 = h1`*onesum/n1;
meanh0 = h0`*zerosum/n0;
i1 = i(n1); * make identity matrix;
i0 = i(n0);
vc1 = h1`*(i1-onesum*onesum`/n1)*h1/(n1-1.0);
* make variance-covariance matrix;
vc0 = h0`*(i0-zerosum*zerosum`/n0)*h0/(n0-1.0);
pooledvc = ((n1-1.0)*vc1+(n0-1.0)*vc0)/(n1+n0-2.0);
meandiff = meanh1 - meanh0;
t2 = meandiff`* inv(pooledvc) *meandiff * (n1 * n0)/(n1 + n0);
nvars = ncol(h1);
f=((n1+n0-nvars-1)/(nvars*(n1+n0-2)))*t2;
df2=n1+n0-nvars-1;
p=1-probf(f,nvars,df2);
fcrit = finv(.95, nvars, df2);
print meandiff pooledvc t2 f nvars df2 p fcrit;
finish;
*use the function with the HELP data;
use help;
read all var {cesd pcs mcs} where (homeless = 1) into h1;
read all var {cesd pcs mcs} where (homeless = 0) into h0;
run hotelling;
quit;
Here's the output:
meandiff pooledvc t2 f nvars
2.1837595 155.76862 -38.46634 -108.8555 6.1322669 2.0350243 3
-2.064049 -38.46634 115.50213 14.423897
-1.755975 -108.8555 14.423897 164.44444
df2 p fcrit
449 0.1082217 2.6247689
Saturday, July 18, 2009
Book excerpts now posted
[フレーム]
We've posted excerpts from the book on the book website. The excerpts include Chapter 3 (regression and ANOVA) in its entirety. This demonstrates how the entries (the generic descriptions of software functions) and the worked examples reinforce each other. We've also posted selected entries from each of the other chapters, describing probability distributions and random variables, two sample comparisons (T-tests, Wilcoxon rank-sum test, Kolmogorov-Smirnov test, median test), linear mixed models for correlated data (including general correlation, random intercepts and slopes, and more complex structures), graphics (scatterplots, histograms, saving plots as PDF), and power calculations (analytic and Monte Carlo).
Subscribe to:
Comments (Atom)