Showing posts with label matrices. Show all posts
Showing posts with label matrices. Show all posts

Monday, May 17, 2010

Example 7.37: calculation of Hotelling's T^2

Hotelling's T^2 is a multivariate statistic used to compare two groups, where multiple outcomes are observed for each subject.

Here we demonstrate how to calculate Hotelling's T^2 using R and SAS, and test the code using a simulation study then apply it to data from the HELP study.


R

We utilize an approach suggested by Peter Mandeville in a posting to the R-help mailing list.



hotelling = function(y1, y2) {
# check the appropriate dimension
k = ncol(y1)
k2 = ncol(y2)
if (k2!=k)
stop("input matrices must have the same number of columns: y1 has ",
k, " while y2 has ", k2)

# calculate sample size and observed means
n1 = nrow(y1)
n2 = nrow(y2)
ybar1= apply(y1, 2, mean); ybar2= apply(y2, 2, mean)
diffbar = ybar1-ybar2

# calculate the variance of the difference in means
v = ((n1-1)*var(y1)+ (n2-1)*var(y2)) /(n1+n2-2)

# calculate the test statistic and associated quantities
t2 = n1*n2*diffbar%*%solve(v)%*%diffbar/(n1+n2)
f = (n1+n2-k-1)*t2/((n1+n2-2)*k)
pvalue = 1-pf(f, k, n1+n2-k-1)

# return the list of results
return(list(pvalue=pvalue, f=f, t2=t2, diff=diffbar))
}


We begin by ensuring that the two input matrices have the same number of columns, then calculate quantities needed for the calculation. The function returns a list with a number of named objects.

To test the function, we ran a simulation study to ensure that under the null hypothesis the function rejected approximately 5% of the time.


# do a simple simulation study to ensure Type I error rate is preserved
library(MASS)
numoutcomes = 5
mu1 = rep(0, numoutcomes); mu2 = rep(0, numoutcomes)
rho = .4
# compound symmetry covariance matrix
Sigma = rho*matrix(rep(1, numoutcomes^2), numoutcomes) +
diag(rep(1-rho, numoutcomes))
numsim = 10000
reject = numeric(numsim)
for (i in 1:numsim) {
grp1 = mvrnorm(100, mu1, Sigma)
grp2 = mvrnorm(150, mu2, Sigma)
reject[i] = hotelling(grp1, grp2)$pvalue<0.05
}


This yields appropriate Type I error rate (just under 5%).


> table(reject)
reject
0 1
9537 463


We can also undertake a comparison of non-homeless and homeless subjects in the HELP dataset, comparing individual observations on the PCS (physical component score) and MCS (mental component scores) of the SF-36 and the CESD measure of depressive symptoms.


ds = read.csv("http://www.math.smith.edu/sasr/datasets/help.csv")
smallds = with(ds, cbind(pcs, mcs, cesd))
grp1 = subset(smallds, ds$homeless==0)
grp2 = subset(smallds, ds$homeless==1)
res = hotelling(grp1, grp2)


This saves the results in the object res, which we can display.


> names(res)
[1] "pvalue" "f" "t2" "diff"
> res
$pvalue
[,1]
[1,] 0.1082217

$f
[,1]
[1,] 2.035024

$t2
[,1]
[1,] 6.132267

$diff
pcs mcs cesd
2.064049 1.755975 -2.183760


While there are differences in the mean PCS, MCS, and CESD scores between
the homeless and non-homeless subjects, we do not have sufficient evidence to reject the null hypothesis that they have equivalent means back in the populations (p=0.11).

SAS

In SAS, we can easily make the two-group multivariate comparison using the MANOVA statement in proc glm. This hides the actual value of the statistic, but generates the desired p-value. As usual, we use the ODS system to reduce SAS output.


ods select multstat;
proc glm data="c:\book\help";
class homeless;
model pcs mcs cesd = homeless;
manova h=homeless;
run;
quit;


SAS prints four valid tests, which result in equal values in this case.


MANOVA Test Criteria and Exact F Statistics for
the Hypothesis of No Overall HOMELESS Effect
H = Type III SSCP Matrix for HOMELESS
E = Error SSCP Matrix

S=1 M=0.5 N=223.5

Statistic Value F Value Num DF Den DF Pr> F

Wilks' Lambda 0.98658536 2.04 3 449 0.1082
Pillai's Trace 0.01341464 2.04 3 449 0.1082
Hotelling-Lawley Trace 0.01359704 2.04 3 449 0.1082
Roy's Greatest Root 0.01359704 2.04 3 449 0.1082


Here we also demonstrate using SAS proc iml (section 1.9) to calculate the statistic and perform the test from scratch. This roughly parallels the R development above, though we omit the test for equal number of variables.


*get the help data;
data help;
set "c:\book\help";
run;

proc iml;
* build a function;
start hotelling;
n1 = nrow(h1);
n0 = nrow(h0);
onesum = j(n1,1,1); *make an n1*1 matrix of 1s;
zerosum = j(n0,1,1);
meanh1 = h1`*onesum/n1;
meanh0 = h0`*zerosum/n0;
i1 = i(n1); * make identity matrix;
i0 = i(n0);
vc1 = h1`*(i1-onesum*onesum`/n1)*h1/(n1-1.0);
* make variance-covariance matrix;
vc0 = h0`*(i0-zerosum*zerosum`/n0)*h0/(n0-1.0);
pooledvc = ((n1-1.0)*vc1+(n0-1.0)*vc0)/(n1+n0-2.0);
meandiff = meanh1 - meanh0;
t2 = meandiff`* inv(pooledvc) *meandiff * (n1 * n0)/(n1 + n0);
nvars = ncol(h1);
f=((n1+n0-nvars-1)/(nvars*(n1+n0-2)))*t2;
df2=n1+n0-nvars-1;
p=1-probf(f,nvars,df2);
fcrit = finv(.95, nvars, df2);
print meandiff pooledvc t2 f nvars df2 p fcrit;
finish;

*use the function with the HELP data;
use help;
read all var {cesd pcs mcs} where (homeless = 1) into h1;
read all var {cesd pcs mcs} where (homeless = 0) into h0;
run hotelling;
quit;


Here's the output:


meandiff pooledvc t2 f nvars

2.1837595 155.76862 -38.46634 -108.8555 6.1322669 2.0350243 3
-2.064049 -38.46634 115.50213 14.423897
-1.755975 -108.8555 14.423897 164.44444


df2 p fcrit

449 0.1082217 2.6247689

Saturday, July 25, 2009

Example 7.7: Tabulate binomial probabilities

Suppose we wanted to assess the probability P(X=x) for a binomial random variate with n = 10 and with p = .81, .84, ..., .99. This could be helpful, for example, in various game settings.

In SAS, we find the probability that X=x using differences in the CDF calculated via the cdf function (1.10.1). We loop through the various binomial probabilities and values of x using the do ... end structure (section 1.11.1). Finally, we store the probabilities in implicitly named variables via an array statement (section 1.11.5).

SAS

data table (drop=j);
array probs [11] prob0 prob1 - prob10;
do p = .81 to .99 by .03;
do j = 0 to 10;
if j eq 1 then probs[j+1] = cdf("BINOMIAL", 0, p, 10);
else probs[j+1] = cdf("BINOMIAL", j, p, 10)
- cdf("BINOMIAL", j-1, p, 10);
end;
output;
end;
run;


The results are printed with two decimal places using the format statement (section 1.2.4). The options statement (section A.4) uses the ls option to narrow the output width to be compatible with Blogger.


options ls=64;
proc print data=table noobs;
var p prob0 prob1 - prob10;
format _numeric_ 3.2;
run;


And the results are:

p
p p p p p p p p p p r
r r r r r r r r r r o
o o o o o o o o o o b
b b b b b b b b b b 1
p 0 1 2 3 4 5 6 7 8 9 0
.81 .00 .00 .00 .00 .00 .02 .08 .19 .30 .29 .12
.84 .00 .00 .00 .00 .00 .01 .05 .15 .29 .33 .17
.87 .00 .00 .00 .00 .00 .00 .03 .10 .25 .37 .25
.90 .00 .00 .00 .00 .00 .00 .01 .06 .19 .39 .35
.93 .00 .00 .00 .00 .00 .00 .00 .02 .12 .36 .48
.96 .00 .00 .00 .00 .00 .00 .00 .01 .05 .28 .66
.99 .00 .00 .00 .00 .00 .00 .00 .00 .00 .09 .90


R
In R we start by making a vector of the binomial probabilities, using the : operator (section 1.11.3) to generate a sequence of integers. After creating a matrix (section B.4.4) to hold the table results, we loop (section 1.11.1) through the binomial probabilities, calling the dbinom() function (section 1.1) to find the probability that X=x. This calculation is nested within the round() function (section 1.2.5) to reduce the digits displayed. Finally, we glue the vector of binomial probabilities to the results.


p <- .78 + (3 * 1:7)/100
allprobs <- matrix(nrow=length(p), ncol=11)
for (i in 1:length(p)) {
allprobs[i,] <- round(dbinom(0:10, 10, p[i]),2)
}
table <- cbind(p, allprobs)
table


With the result:

p
[1,] 0.81 0 0 0 0 0 0.02 0.08 0.19 0.30 0.29 0.12
[2,] 0.84 0 0 0 0 0 0.01 0.05 0.15 0.29 0.33 0.17
[3,] 0.87 0 0 0 0 0 0.00 0.03 0.10 0.25 0.37 0.25
[4,] 0.90 0 0 0 0 0 0.00 0.01 0.06 0.19 0.39 0.35
[5,] 0.93 0 0 0 0 0 0.00 0.00 0.02 0.12 0.36 0.48
[6,] 0.96 0 0 0 0 0 0.00 0.00 0.01 0.05 0.28 0.66
[7,] 0.99 0 0 0 0 0 0.00 0.00 0.00 0.00 0.09 0.90


As with the example of jittered scatterplots for dichotomous y by continuous x, (Example 7.3, Example 7.4, and Example 7.5) we will revisit this example for improvement in later entries.
Subscribe to: Comments (Atom)

AltStyle によって変換されたページ (->オリジナル) /