SAS and R: mtext()

Catalogs of posts

Showing posts with label mtext(). Show all posts

Tuesday, October 25, 2011

Example 9.11: Employment plot

A facebook friend posted the picture reproduced above-- it makes the case that President Obama has been a successful creator of jobs, and also paints GW Bush as a president who lost jobs. Another friend pointed out that to be fair, all of Bush's presidency ought to be included. Let's make a fair plot of job growth and loss. Data can be retrieved from the Bureau of Labor Statistics, where Nick will be spending his next sabbatical. The extract we use below is also available from the book website. This particular table reports the cumulative change over the past three months, adjusting for seasonal trends. This tends to smooth out the line.

SAS

The first job is to get the data into SAS. Here we demonstrate reading it directly from a URL, as outlined in section 1.1.6.


filename myurl
 url "http://www.math.smith.edu/sasr/datasets/bls.csv";
 
data q_change;
 infile myurl delimiter=',';
input Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Annual;
run;

The raw data are in a pretty inconvenient format for plotting. To make a long, narrow data set with a row for each month, we'll use proc transpose (section 1.5.3) to flip each year on its side. Then, to attach a date to each measure, we'll use the compress function. First we add "01" (the first of the month) to the month name, which is in a variable created by proc transpose with the default name "_name_". Then we tack on the year variable, and input the string in the date format. The resulting variable is a SAS date (number of days since 12/31/1959, see section 1.6.1).


proc transpose data=q_change out=q2;
 by year;
run;

data q3;
set q2;
 date1 = input(compress("01"||_name_||year),date11.);
run;

Now the data are ready to plot. It would probably be possible to use proc sgplot but proc gplot is more flexible and allows better control for presentation graphics.


title "3-month change in private-sector jobs, seasonally adjusted";
axis1 minor = none label = (h=2 angle = 90 "Thousands of jobs")
 value = (h = 2);
axis2 minor = none value = (h=2)label = none 
 offset = (1cm, -5cm)
 reflabel = (h=1.5 "Truman" "Eisenhower" "Kennedy/Johnson" 
 "Nixon/Ford" "Carter" "Reagan" "GHW Bush" "Clinton" "GW Bush" "Obama" );
symbol1 i=j v=none w=3;

proc gplot data=q3;
plot col1 * date1 / vaxis=axis1 haxis=axis2 vref=0 
 href = '12apr1945'd '21jan1953'd '20jan1961'd '20jan1969'd 
 '20jan1977'd '21jan1981'd '20jan1989'd '20jan1993'd 
 '20jan2001'd '20jan2009'd;
format date1 monyy6.;
run;
quit;

Much of the syntax above has been demonstrated in our book examples and blog entries. What may be unfamiliar is the use of the href option in the plot statement and the reflabel option in the axis statement. The former draws reference lines at the listed values in the plot, while the latter adds titles to these lines. The resulting plot is shown here.

Looking fairly across postwar presidencies, only the Kennedy/Johnson and Clinton years were mostly unmarred by periods with large losses in jobs. The Carter years were also times jobs were consistently added. While the graphic shared on facebook overstates the case against GW Bush, it fairly shows Obama as a job creator thus far, to the extent a president can be credited with jobs created on his watch.

R

The main trick in R is loading the data and getting it into the correct format.
here we use cbind() to grab the appropriate columns, then transpose that matrix and turn it into a vector which serves as input for making a time series object with the ts() command (as in section 4.2.8). Once this is created, the default plot for a time series object is close to what we have in mind.


ds = read.csv("http://www.math.smith.edu/sasr/datasets/bls.csv", 
 header=FALSE)
jobs = with(ds, cbind(V2, V3, V4, V5, V6, V7, V8, V9, V10,
 V11, V12, V13))
jobsts = ts(as.vector(t(jobs)), start=c(1945, 1), 
 frequency=12)
plot(jobsts, plot.type="single", col=4,
 ylab="number of jobs (in thousands)")

All that remains is to add the reference lines for 0 jobs and the presidencies. The lines are most easily added with the abline() function (section 5.2.1). Easier than adding labels for the lines within the plot function will be to use the mtext() function to place the labels in the margins. We'll write a little function to save a few keystrokes by plotting the line and adding the label together.


abline(h=0)
presline = function(date,line,name){
 mtext(at=date,text= name, line=line)
 abline(v = date)
}
presline(1946,1,"Truman")
presline(1953,2,"Eisenhower")
presline(1961,1,"Kennedy/Johnson")
presline(1969,2,"Nixon/Ford")
presline(1977,1,"Carter")
presline(1981,2,"Reagan")
presline(1989,1,"GHW Bush")
presline(1993,2,"Clinton")
presline(2001,1,"GW Bush")
presline(2009,2,"Obama")

It might be worthwhile to standardize the number of jobs to the population size, since the dramatic loss of jobs due to demobilization after the Second World War during a single month in 1945 (2.4 million) represented 1.7% of the population, while the recent loss of 2.3 million jobs in 2009 represented only 0.8% of the population.

Posted by Ken Kleinman

Email This BlogThis! Share to X Share to Facebook Share to Pinterest

Labels: abline(), axis statement, Bureau of Labor Statistics, col option, date formats, href option, job creation, mtext(), offset option, plot.ts(), proc gplot, reflabel option, time series, ts() 4 comments

Monday, June 20, 2011

Example 8.41: Scatterplot with marginal histograms

[フレーム]

The scatterplot is one of the most ubiquitous, and useful graphics. It's also very basic. One of its shortcomings is that it can hide important aspects of the marginal distributions of the two variables. To address this weakness, you can add a histogram of each margin to the plot. We demonstrate using the SF-36 MCS and PCS subscales in the HELP data set.

SAS
SAS provides code to perform this using proc template and proc sgrender. These procedures are not intended for casual or typical SAS users. Its syntax is, to our eyes, awkward. This is roughly analogous to R functions that simply call C routines. Nonetheless, it's possible to adapt code that works. The code linked above was edited to set the transparency to 0 and to change the plotted symbol size to 5 from 11px. These options appear in the scatterplot statement about midway through the code.

Once the edited code is submitted, the following lines produce the plot shown above.


proc sgrender data="C:\book\help.sas7bdat" template=scatterhist; 
 dynamic YVAR="mcs" XVAR="pcs" 
 TITLE="MCS-PCS Relationship";
run;

R
For R, we adapted some code found in an old R-help post to generate the following function. The mtext() function puts text in the margins and is used here to label the axes. The at option in that function centers the label within the scatterplot data using some algebra.


scatterhist = function(x, y, xlab="", ylab=""){
 zones=matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
 layout(zones, widths=c(4/5,1/5), heights=c(1/5,4/5))
 xhist = hist(x, plot=FALSE)
 yhist = hist(y, plot=FALSE)
 top = max(c(xhist$counts, yhist$counts))
 par(mar=c(3,3,1,1))
 plot(x,y)
 par(mar=c(0,3,1,1))
 barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0)
 par(mar=c(3,0,1,1))
 barplot(yhist$counts, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)
 par(oma=c(3,3,0,0))
 mtext(xlab, side=1, line=1, outer=TRUE, adj=0, 
 at=.8 * (mean(x) - min(x))/(max(x)-min(x)))
 mtext(ylab, side=2, line=1, outer=TRUE, adj=0, 
 at=(.8 * (mean(y) - min(y))/(max(y) - min(y))))
}

The results of the following code are shown below.


ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
with(ds, scatterhist(mcs, pcs, xlab="MCS", ylab="PCS"))

Posted by Ken Kleinman

Email This BlogThis! Share to X Share to Facebook Share to Pinterest

Labels: barplot(), histogram, layout(), mtext(), par(), proc sgrender, proc template, scatterplot, with() 9 comments

Subscribe to SAS and R!

RSS: Or: Get SAS and R by Email

Search the SAS and R Blog

The book (second edition, 2014)

Reviews (from the first edition)

"By placing the R and SAS solutions together and by covering a vast array of tasks in one book, Kleinman and Horton have added surprising value and searchability to the information in their book. … a home run, and it is a book I am grateful to have sitting, dust-free, on my shelf."
—Robert Alan Greevy, Jr, Teaching of Statistics in the Health Sciences

"I use SAS and R on a daily basis. Each has strengths and weaknesses, and using both of them gives the advantage of being able to do almost anything when it comes to data manipulation, analysis, and graphics. If you use both SAS and R on a regular basis, get this book. If you know one of the packages and are learning the other, you may need more than this book, but get this book, too. "

Charles Heckler, University of Rochester, Technometrics

"Excellent cross-referencing to other topics and end-of-chapter worked examples on the ‘Health evaluation and linkage to primary care’ data set are given with each topic. … users who are proficient in either of the software packages but with the need to use the other will find this book useful."
—Frances Denny, Journal of the Royal Statistical Society, Series A

Buy a book

SAS and R: Data Management, Statistical Analysis, and Graphics, Second Edition

Using SAS for Data Management, Statistical Analysis, and Graphics

Using R for Data Management, Statistical Analysis, and Graphics

About the authors

Nicholas Horton is a Professor of Statistics at Amherst College. He is a biostatistician with expertise in missing data methods, longitudinal regression, statistical computing and statistical education. Nick's home page; Nick's Google Scholar author page

Ken Kleinman is an Associate Professor with the Department of Biostatistics and Epidemiology at the University of Massachusetts, Amherst. He is a consulting biostatistician with expertise in group-randomized trials and disease surveillance; he also offers R training courses. Ken's home page; Ken's Google Scholar author page.

Sidebar list of all entries.

SAS and R

Catalogs of posts

Tuesday, October 25, 2011

Example 9.11: Employment plot

Monday, June 20, 2011

Example 8.41: Scatterplot with marginal histograms

About SAS and R

Topics discussed