Showing posts with label mtext(). Show all posts

Tuesday, October 25, 2011

Example 9.11: Employment plot


A Facebook friend posted the picture reproduced above. It makes the case that President Obama has been a successful creator of jobs, and also paints GW Bush as a president who lost jobs. Another friend pointed out that, to be fair, all of Bush's presidency ought to be included. Let's make a fair plot of job growth and loss. Data can be retrieved from the Bureau of Labor Statistics, where Nick will be spending his next sabbatical. The extract we use below is also available from the book website. This particular table reports the cumulative change over the past three months, adjusted for seasonal trends, which tends to smooth out the line.

SAS

The first job is to get the data into SAS. Here we demonstrate reading it directly from a URL, as outlined in section 1.1.6.

filename myurl
url "http://www.math.smith.edu/sasr/datasets/bls.csv";

data q_change;
infile myurl delimiter=',';
input Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Annual;
run;

The raw data are in a pretty inconvenient format for plotting. To make a long, narrow data set with a row for each month, we'll use proc transpose (section 1.5.3) to flip each year on its side. Then, to attach a date to each measure, we'll build a date string and use the compress function to strip the blanks from it. First we add "01" (the first of the month) to the month name, which is in a variable created by proc transpose with the default name "_name_". Then we tack on the year variable, and input the string in the date format. The resulting variable is a SAS date (number of days since January 1, 1960, see section 1.6.1).

proc transpose data=q_change out=q2;
by year;
run;

data q3;
set q2;
date1 = input(compress("01"||_name_||year),date11.);
run;

Now the data are ready to plot. It would probably be possible to use proc sgplot, but proc gplot is more flexible and allows better control for presentation graphics.

title "3-month change in private-sector jobs, seasonally adjusted";
axis1 minor = none label = (h=2 angle = 90 "Thousands of jobs")
value = (h = 2);
axis2 minor = none value = (h=2)label = none
offset = (1cm, -5cm)
reflabel = (h=1.5 "Truman" "Eisenhower" "Kennedy/Johnson"
"Nixon/Ford" "Carter" "Reagan" "GHW Bush" "Clinton" "GW Bush" "Obama" );
symbol1 i=j v=none w=3;

proc gplot data=q3;
plot col1 * date1 / vaxis=axis1 haxis=axis2 vref=0
href = '12apr1945'd '21jan1953'd '20jan1961'd '20jan1969'd
'20jan1977'd '21jan1981'd '20jan1989'd '20jan1993'd
'20jan2001'd '20jan2009'd;
format date1 monyy6.;
run;
quit;

Much of the syntax above has been demonstrated in our book examples and blog entries. What may be unfamiliar is the use of the href option in the plot statement and the reflabel option in the axis statement. The former draws reference lines at the listed values in the plot, while the latter adds titles to these lines. The resulting plot is shown here.

Looking fairly across postwar presidencies, only the Kennedy/Johnson and Clinton years were mostly unmarred by periods with large losses in jobs. The Carter years were also times when jobs were consistently added. While the graphic shared on Facebook overstates the case against GW Bush, it fairly shows Obama as a job creator thus far, to the extent a president can be credited with jobs created on his watch.


R

The main trick in R is loading the data and getting it into the correct format. Here we use cbind() to grab the appropriate columns, then transpose that matrix and turn it into a vector, which serves as input for making a time series object with the ts() command (as in section 4.2.8). Once this is created, the default plot for a time series object is close to what we have in mind.

ds = read.csv("http://www.math.smith.edu/sasr/datasets/bls.csv",
header=FALSE)
jobs = with(ds, cbind(V2, V3, V4, V5, V6, V7, V8, V9, V10,
V11, V12, V13))
jobsts = ts(as.vector(t(jobs)), start=c(1945, 1),
frequency=12)
plot(jobsts, plot.type="single", col=4,
ylab="number of jobs (in thousands)")
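
To see why the transpose is needed, here is a small sketch using a made-up 2-by-12 matrix in place of the BLS extract (the values and the names toy, wrong, right, and toyts are ours, for illustration only). Because as.vector() reads a matrix column by column, flattening the year-by-month table directly would interleave the years.

```r
# Toy stand-in for the BLS extract: 2 years x 12 months (values are made up)
toy = matrix(1:24, nrow=2, byrow=TRUE,
  dimnames=list(c("1945", "1946"), month.abb))

# as.vector() reads column by column, so the years get interleaved
wrong = as.vector(toy)       # 1, 13, 2, 14, ...

# transposing first puts the values in calendar order
right = as.vector(t(toy))    # 1, 2, 3, ..., 24

# same ts() call as above, applied to the toy data
toyts = ts(right, start=c(1945, 1), frequency=12)
window(toyts, start=c(1945, 11), end=c(1946, 2))   # Nov 1945 through Feb 1946
```

The window() call is just a convenient check that the months line up across the year boundary.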

All that remains is to add the reference lines for 0 jobs and the presidencies. The lines are most easily added with the abline() function (section 5.2.1). Rather than adding labels for the lines inside the plot, it is easier to use the mtext() function to place the labels in the margins. We'll write a little function to save a few keystrokes by plotting the line and adding the label together.

abline(h=0)
presline = function(date,line,name){
  mtext(at=date,text= name, line=line)
  abline(v = date)
}
presline(1946,1,"Truman")
presline(1953,2,"Eisenhower")
presline(1961,1,"Kennedy/Johnson")
presline(1969,2,"Nixon/Ford")
presline(1977,1,"Carter")
presline(1981,2,"Reagan")
presline(1989,1,"GHW Bush")
presline(1993,2,"Clinton")
presline(2001,1,"GW Bush")
presline(2009,2,"Obama")
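
The calls above place each line at the start of a calendar year. The x-axis of a time series plot is in decimal years, so if you want the lines at the same dates used in the SAS reference lines, one option is a small converter like the sketch below. The name decyear is ours (hypothetical), and this is just one way to do the conversion, not part of the original example.

```r
# Hypothetical helper: convert a calendar date to the decimal-year
# scale used on the x-axis of a plotted ts object
decyear = function(d){
  d = as.Date(d)
  yr = as.numeric(format(d, "%Y"))
  start = as.Date(paste0(yr, "-01-01"))      # first day of that year
  end = as.Date(paste0(yr + 1, "-01-01"))    # first day of the next year
  # fraction of the year elapsed, allowing for leap years
  yr + as.numeric(d - start) / as.numeric(end - start)
}

decyear("1945-04-12")   # a bit more than a quarter of the way into 1945
decyear("2009-01-20")   # just after the start of 2009
```

With this in hand, a call such as presline(decyear("2009-01-20"), 2, "Obama") would put the line at the inauguration date rather than at January 1.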

It might be worthwhile to standardize the number of jobs to the population size, since the dramatic loss of jobs due to demobilization after the Second World War during a single month in 1945 (2.4 million) represented 1.7% of the population, while the recent loss of 2.3 million jobs in 2009 represented only 0.8% of the population.


Monday, June 20, 2011

Example 8.41: Scatterplot with marginal histograms


The scatterplot is one of the most ubiquitous and useful graphics. It's also very basic. One of its shortcomings is that it can hide important aspects of the marginal distributions of the two variables. To address this weakness, you can add a histogram of each margin to the plot. We demonstrate using the SF-36 MCS and PCS subscales in the HELP data set.

SAS
SAS provides code to perform this using proc template and proc sgrender. These procedures are not intended for casual or typical SAS users. Their syntax is, to our eyes, awkward. This is roughly analogous to R functions that simply call C routines. Nonetheless, it's possible to adapt code that works. The code linked above was edited to set the transparency to 0 and to change the plotted symbol size to 5 from 11px. These options appear in the scatterplot statement about midway through the code.

Once the edited code is submitted, the following lines produce the plot shown above.

proc sgrender data="C:\book\help.sas7bdat" template=scatterhist;
dynamic YVAR="mcs" XVAR="pcs"
TITLE="MCS-PCS Relationship";
run;


R
For R, we adapted some code found in an old R-help post to generate the following function. The mtext() function puts text in the margins and is used here to label the axes. The at option in that function centers the label within the scatterplot data using some algebra.

scatterhist = function(x, y, xlab="", ylab=""){
  zones=matrix(c(2,0,1,3), ncol=2, byrow=TRUE)
  layout(zones, widths=c(4/5,1/5), heights=c(1/5,4/5))
  xhist = hist(x, plot=FALSE)
  yhist = hist(y, plot=FALSE)
  top = max(c(xhist$counts, yhist$counts))
  par(mar=c(3,3,1,1))
  plot(x,y)
  par(mar=c(0,3,1,1))
  barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0)
  par(mar=c(3,0,1,1))
  barplot(yhist$counts, axes=FALSE, xlim=c(0, top), space=0, horiz=TRUE)
  par(oma=c(3,3,0,0))
  mtext(xlab, side=1, line=1, outer=TRUE, adj=0,
    at=.8 * (mean(x) - min(x))/(max(x)-min(x)))
  mtext(ylab, side=2, line=1, outer=TRUE, adj=0,
    at=(.8 * (mean(y) - min(y))/(max(y) - min(y))))
}

The results of the following code are shown below.

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
with(ds, scatterhist(mcs, pcs, xlab="MCS", ylab="PCS"))
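
The algebra behind the at option in the function above is easier to follow with numbers. The sketch below uses made-up data (the values and the names frac and at_pos are ours) and works through the same expression by hand. As we read it, the first part locates the mean as a fraction of the range of the data, and the multiplier of .8 reflects the 4/5 of the layout given to the scatterplot panel.

```r
# Made-up data standing in for a plotted variable
x = c(10, 20, 30, 60)

# Where the mean falls within the range of x, as a fraction from 0 to 1
frac = (mean(x) - min(x)) / (max(x) - min(x))   # (30 - 10) / (60 - 10)

# Scale by the share of the layout taken up by the scatterplot panel
at_pos = 0.8 * frac
at_pos
```

Since the mean of these values sits 40% of the way along their range, the label would be placed at 0.32 on a scale where 0 and 1 are the two ends of the outer margin.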
