SAS and R: lattice library

Catalogs of posts

Showing posts with label lattice library. Show all posts

Monday, July 16, 2012

Example 9.38: dynamite plots, revisited

Dynamite plots are a somewhat pejorative term for a graphical display where the height of a bar indicates the mean, and the vertical line on top of it represents the standard deviation (or standard error). These displays are commonly found in many scientific disciplines, as a way of communicating group differences in means.

Many find these displays troubling. One post entitled them unmitigated evil.
The Vanderbilt University Department of Biostatistics has a formal policy discouraing use of these plots, stating that:

Dynamite plots often hide important information. This is particularly true of small or skewed data sets. Researchers are highly discouraged from using them, and department members have the option to decline participation in papers in which the lead author requires the use of these plots.

Despite the limitations of the display, we believe that there may be times when the display is helpful as a way to compare groups, assuming distributions that are approximately normal. Samuel Brown also described creation of these figures, as a way to encourage computing in R. We previously demonstrated how to create them in SAS and R, and today discuss code created by Randall Pruim to demonstrate how such graphics can be created using lattice graphics within the mosaic package.

R


library(mosaic)
dynamitePlot <- function(height, error, names = as.character(1:length(height)), 
 significance = NA, ylim = c(0, maxLim), ...) {
 if (missing(error)) { error = 0 }
 maxLim <- 1.2* max(mapply(sum, height, error))
 mError <- min(c(error, na.rm=TRUE))
 barchart(height ~ names, ylim=ylim, panel=function(x,y,...) {
 panel.barchart(x,y,...)
 grid.polyline(c(x,x), c(y, y+error), id=rep(x,2), default.units='native',
 arrow=arrow(angle=45, length=unit(mError, 'native'))) 
 grid.polyline(c(x,x), c(y, y-error), id=rep(x,2), default.units='native',
 arrow=arrow(angle=45, length=unit(mError, 'native'))) 
 grid.text(x=x, y=y + error + .05*maxLim, label=significance, 
 default.units='native')
 }, ...)
}

Much of the code involves setting up the appropriate axis limits, then drawing the lines and adding the text. We can call this new function with an artificial example with 4 groups:


Values <- c(1,2,5,4)
Errors <- c(0.25, 0.5, 0.33, 0.12)
Names <- paste("Trial", 1:4)
Sig <- c("a", "a", "b", "b")
dynamitePlot(Values, Errors, names=Names, significance=Sig)

We still don't recommend frequent use of these plots (as other displays may be better (e.g. dotplots for very small sample sizes or violin plots), but having the capability to generate dynamite plots within the lattice framework can be handy.

An unrelated note about aggregators:We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.

Posted by Nick Horton

Email This BlogThis! Share to X Share to Facebook Share to Pinterest

Labels: barchart() function, confidence intervals, graphics, grid.polyline() function, grid.text() function, lattice library, mapply() function, mosaic package 4 comments

Thursday, October 13, 2011

Example 9.9: Simplifying R using the mosaic package (part 1)

[フレーム]

While both SAS and R are powerful systems for statistical analysis, they can be frustrating to new users or those learning statistics for the first time.

R
The mosaic package is designed to help simplify the interface for such new users, while allowing them to undertake sophisticated analyses.

As an example of how the package simplifies life for the novice user, consider calculating summary statistics and displaying a densityplot for the CESD (measure of depressive symptom) scores by substance abuse group in the HELP dataset. Doing this in R without the package would require mastering a package such as plyr to replicate results by substance or a typing-intensive use of syntax to select rows corresponding to each substance.

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
library(mosaic)
options(digits=3)

After loading the data and the package, and setting the number of digits to a more reasonable default, we can call the mean() function to easily calculate this statistic (denoted by S in the result) for each of the three substance abuse groups alcohol, cocaine or heroin.

> mean(cesd ~ substance, data=ds)
 substance S N Missing
1 alcohol 34.4 177 0
2 cocaine 29.4 152 0
3 heroin 34.9 124 0

Similar results are seen when we calculate the standard deviations per group:

> sd(cesd ~ substance, data=ds)
 substance S N Missing
1 alcohol 12.1 177 0
2 cocaine 13.4 152 0
3 heroin 11.2 124 0

Another function can calculate a raft of summary statistics for each group that are nicely formatted.

UPDATE FROM 12/14/2013: favstats(cesd ~ substance, data=ds) is an even better way to do this...

> summary(cesd ~ substance, data=ds, fun=favstats)
cesd N=453
+---------+-------+---+----+---+-------+---+----+-----+----+---+--------+
| | |N |min |Q1 |median |Q3 |max |mean |sd |n |missing |
+---------+-------+---+----+---+-------+---+----+-----+----+---+--------+
|substance|alcohol|177|4 |26 |36 |42 |58 |34.4 |12.1|177|0 |
| |cocaine|152|1 |19 |30 |39 |60 |29.4 |13.4|152|0 |
| |heroin |124|4 |28 |35 |43 |56 |34.9 |11.2|124|0 |
+---------+-------+---+----+---+-------+---+----+-----+----+---+--------+
|Overall | |453|1 |25 |34 |41 |60 |32.8 |12.5|453|0 |
+---------+-------+---+----+---+-------+---+----+-----+----+---+--------+

These commands allow quick review of the data to ensure, for example, that assumptions of equal variance are justified, or that coding errors or missing values haven't crept in.

A graphical depiction using a set of densityplots (shown above) can be created using the command:

densityplot(~ cesd, group=substance, data=ds, auto.key=TRUE)

SAS
We're unaware of any similar program that attempts to simplify SAS syntax for educational use. To replicate the above results, we would use the means and sgpanel procedures.

data ds;
set "C:\book\help.sas7bdat";
run;
options ls=80;
proc means data=ds fw=4
 min q1 median q3 max mean std nmiss n;
 class substance;
 var cesd;
run;
 Analysis Variable : CESD
 N Lower Upper Std
 SUBSTANCE Obs Min Quartile Median Quartile Max Mean Dev
 ------------------------------------------------------------------
 alcohol 177 4.00 26.0 36.0 42.0 58.0 34.4 12.1
 cocaine 152 1.00 19.0 30.0 39.0 60.0 29.4 13.4
 heroin 124 4.00 28.0 35.0 43.0 56.0 34.9 11.2
 ------------------------------------------------------------------
 N N
 SUBSTANCE Obs Miss N
 ---------------------------
 alcohol 177 0 177
 cocaine 152 0 152
 heroin 124 0 124
 ---------------------------

After reading the data in, the meansprocedure can produce any of the desired statistics (plus may others) directly. To replicate the mosaic package in printing a single statistic, list only that statistic in the proc means statement. Note that the overall statistic in the R table is not included. To replicate that row, you would re-run the above code, omitting the class statement.

To the best of our knowledge, there still does not exist an easy way to plot multiple densities in a single SAS plot. In example 2.6.4 we show how it can be done using proc kde, saving the density estimates, and plotting separately. (Code for this is included at the book web site.) But in the interest of simple code, we show a simpler method using proc sgpanel. The result, show below, is less useful than the R plot from the the mosaic package, but still gets the point across.

proc sgpanel data=ds;
 panelby substance / columns=1;
 density cesd / type=kernel;
run;

Posted by Nick Horton

Email This BlogThis! Share to X Share to Facebook Share to Pinterest

Labels: densityplot(), favstats(), lattice library, mean(), mosaic package, sd(), statistical education, summary() 5 comments

Monday, June 13, 2011

Example 8.40: Side-by-side histograms

[フレーム]

It's often useful to compare histograms for some key variable, stratified by levels of some other variable. There are several ways to display something like this. The simplest may be to plot the two histograms in separate panels.

SAS
In SAS, the most direct and generalizable approach is through the sgpanel procedure.


proc sgpanel data = 'c:\book\help.sas7bdat';
 panelby female;
 histogram cesd;
run;

The results are shown above.

R
In R, the lattice package provides a similarly direct approach.


ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
ds$gender = ifelse(ds$female==1, "female", "male")
library(lattice)
histogram(~ cesd | gender, data=ds)

The results are shown below.

Posted by Ken Kleinman

Email This BlogThis! Share to X Share to Facebook Share to Pinterest

Labels: histogram, ifelse(), lattice library, panelby statement, proc sgpanel, side by side histograms 3 comments

Saturday, August 8, 2009

Example 7.9: Get data from SAS into R

[フレーム]

Some people use both SAS and R in their daily work. They might be more familiar with SAS as a tool for manipulating data and R preferable for plotting purposes. While our goal in the book is to enable people to avoid having to switch back and forth, the following example shows how to move data from SAS into R. Our use of Stata format as an interchange mechanism is perhaps unorthodox, but eminently workable. Other file formats (see section 1.2.2, creating files for use by other packages) can also be specified.

Suppose we wanted to plot CESD over time for each individual. While we show how to do this sort of thing in SAS (see section 5.6.2), it’s hard to do without SAS version 9.2. Instead, we recall it’s easy using the lattice library in R (see section 5.2.2). But we need a ”long” data set with a row for each time point. This is the sort of data management one might prefer to do in SAS.

SAS

First, we read the data from a data set stored in SAS format. Then we use proc transpose (section 1.5.3) to get the data into the required shape. Finally, we save the ”long” data set in Stata format (1.2.2).


libname k "c:\book";

proc transpose data=k.help out=ds1;
by id;
var cesd1 cesd2 cesd3 cesd4;
run;

proc print data = ds1 (obs = 5); run;

 Obs ID _NAME_ _LABEL_ COL1

 1 1 CESD1 1 cesd 7 
 2 1 CESD2 2 cesd . 
 3 1 CESD3 3 cesd 8 
 4 1 CESD4 4 cesd 5 
 5 2 CESD1 1 cesd 11 


proc export data=ds1 (rename=(_label_=timec1))
 outfile = "c:\book\helpfromsas.dta" dbms=dta;
run;

R
In R, we first read the file from Stata format (1.1.5), then attach() it (section 1.3.1) for ease of typing. Then we check the first few lines of data using the head() function (section 1.13.1). Noting that the time variable we brought from SAS is a character string, we convert it to a numeric variable using as.numeric() (section 1.4.2) and substr() (section 1.4.3). After loading the lattice library, we display the series for each subject. In this, we use the syntax for subsetting observations (1.5.1) to keep only the first 20 observations and the as.factor() function (1.4.10) to improve the labels in the output.


> library(foreign)
> xsas <- read.dta("C:\\book\\helpfromsas.dta")
> head(xsas)

 id _name_ timec1 col1
1 1 CESD1 1 cesd 7
2 1 CESD2 2 cesd NA
3 1 CESD3 3 cesd 8
4 1 CESD4 4 cesd 5
5 2 CESD1 1 cesd 11
6 2 CESD2 2 cesd NA

> attach(xsas)
> time <- as.numeric(substr(timec1, 1, 1))
> library(lattice)
> xyplot(col1[id < 21] ~ time[id < 21]|as.factor(id[id < 21]))

In the next entry, we'll reverse this process to manipulate data in R and produce the plot in SAS.

Posted by Ken Kleinman

Email This BlogThis! Share to X Share to Facebook Share to Pinterest

Labels: as.factor, convert SAS to R, lattice library, read data in Stata format, save data in Stata format, substr, transpose 5 comments

Subscribe to SAS and R!

RSS: Or: Get SAS and R by Email

Search the SAS and R Blog

The book (second edition, 2014)

Reviews (from the first edition)

"By placing the R and SAS solutions together and by covering a vast array of tasks in one book, Kleinman and Horton have added surprising value and searchability to the information in their book. … a home run, and it is a book I am grateful to have sitting, dust-free, on my shelf."
—Robert Alan Greevy, Jr, Teaching of Statistics in the Health Sciences

"I use SAS and R on a daily basis. Each has strengths and weaknesses, and using both of them gives the advantage of being able to do almost anything when it comes to data manipulation, analysis, and graphics. If you use both SAS and R on a regular basis, get this book. If you know one of the packages and are learning the other, you may need more than this book, but get this book, too. "

Charles Heckler, University of Rochester, Technometrics

"Excellent cross-referencing to other topics and end-of-chapter worked examples on the ‘Health evaluation and linkage to primary care’ data set are given with each topic. … users who are proficient in either of the software packages but with the need to use the other will find this book useful."
—Frances Denny, Journal of the Royal Statistical Society, Series A

Buy a book

SAS and R: Data Management, Statistical Analysis, and Graphics, Second Edition

Using SAS for Data Management, Statistical Analysis, and Graphics

Using R for Data Management, Statistical Analysis, and Graphics

About the authors

Nicholas Horton is a Professor of Statistics at Amherst College. He is a biostatistician with expertise in missing data methods, longitudinal regression, statistical computing and statistical education. Nick's home page; Nick's Google Scholar author page

Ken Kleinman is an Associate Professor with the Department of Biostatistics and Epidemiology at the University of Massachusetts, Amherst. He is a consulting biostatistician with expertise in group-randomized trials and disease surveillance; he also offers R training courses. Ken's home page; Ken's Google Scholar author page.

Sidebar list of all entries.

SAS and R

Catalogs of posts

Monday, July 16, 2012

Example 9.38: dynamite plots, revisited

Thursday, October 13, 2011

Example 9.9: Simplifying R using the mosaic package (part 1)

Monday, June 13, 2011

Example 8.40: Side-by-side histograms

Saturday, August 8, 2009

Example 7.9: Get data from SAS into R

About SAS and R

Topics discussed