SAS and R: SAS formats

Catalogs of posts


Monday, April 14, 2014

Example 2014.4: Hilbert Matrix

Rick Wicklin showed how to make a Hilbert matrix in SAS/IML. Rick has a nice discussion of these matrices and why they might be interesting; the value of H_{r,c} is 1/(r+c-1). We show how to make this matrix in the data step and in R. We also show that Rick's method for displaying fractions in SAS/IML works in PROC PRINT, and how they can be displayed in R.

SAS
In the SAS data step, we'll use an array to make the columns of the matrix and do loops to put values into each cell in a row, outputting the row into the data set before incrementing the row value. This is a little awkward, but does at least preserve the simple 1/(r+c-1) function of the cell values. We store the dimension in a global macro variable to make the approach more general.

%let n = 5;
data h;
array val [&n] v1 - v&n;
 do r = 1 to &n;
 do c = 1 to &n;
 val[c] = 1/(r+c-1);
 end;
 output;
 end;
run;

To print the resulting matrix, we use the fract format, as Rick demonstrated. Pretty nice! The noobs option in the proc print statement suppresses the row number that would otherwise be shown.

proc print data = h noobs; 
var v1 - v&n;
format _all_ fract.; 
run;

 v1     v2     v3     v4     v5
  1    1/2    1/3    1/4    1/5
1/2    1/3    1/4    1/5    1/6
1/3    1/4    1/5    1/6    1/7
1/4    1/5    1/6    1/7    1/8
1/5    1/6    1/7    1/8    1/9

R
As is so often the case in R, a solution can be generated in one line using nesting. Also as usual, though, it's a bit unnatural, and we don't deconstruct it here.

n = 5
1/sapply(1:n,function(x) x:(x+n-1))
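For readers who do want to see the pieces, here is a brief sketch of what the one-liner is doing. The names denoms and h are ours, introduced only for illustration.

```r
n <- 5

# sapply() builds one column per value of x; column x holds the
# integers x, x+1, ..., x+n-1, so entry [r, c] equals r + c - 1
denoms <- sapply(1:n, function(x) x:(x + n - 1))

# taking reciprocals element by element gives the Hilbert matrix
h <- 1 / denoms
h
```

The inner function generates the denominators for a single column, and sapply() binds those columns into a matrix.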

A more straightforward approach follows. It's certainly less efficient, though efficiency would seem a non-issue in any application of this code. It's also the kind of code that R aficionados would scoff at. It does have the attractiveness of perfect clarity, however. We begin by defining an empty matrix, then simply loop through the cells of the matrix, assigning values one by one.

n=5
h1 = matrix(nrow=n,ncol=n)
for (r in 1:n) {
 for (c in 1:n)
 h1[r,c] = 1/(r+c-1)
}
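A middle ground between the one-liner and the explicit loop, offered only as a sketch, is the outer() function, which evaluates a function over every (row, column) pair and so keeps the 1/(r+c-1) formula visible. The name h2 is ours.

```r
n <- 5

# outer() passes all combinations of row index r and column index c
# to the function, which must be vectorized (this one is)
h2 <- outer(1:n, 1:n, function(r, c) 1 / (r + c - 1))
h2
```

This should match the matrix produced by the loop, without needing to allocate an empty matrix first.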

To display the fractions, we use the fractions() function in the MASS package, which is distributed with R.

> library(MASS)
> fractions(h1)
     [,1] [,2] [,3] [,4] [,5]
[1,]   1  1/2  1/3  1/4  1/5
[2,] 1/2  1/3  1/4  1/5  1/6
[3,] 1/3  1/4  1/5  1/6  1/7
[4,] 1/4  1/5  1/6  1/7  1/8
[5,] 1/5  1/6  1/7  1/8  1/9

An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.

Posted by Ken Kleinman


Labels: array statement, do loop, fractions, MASS package, SAS formats 4 comments

Monday, June 18, 2012

Example 9.35: Discrete randomization and formatted output


A colleague asked for help with randomly choosing a kid within a family. This is for a trial in which families are recruited at well-child visits, but in each family only one of the children having a well-child visit that day can be in the study. The idea is that after recruiting the family, the research assistant needs to choose one child, but if they make that choice themselves, the children are unlikely to be representative. Instead, we'll allow them to make a random decision through an easily used slip that can be put into sealed envelopes. The envisioned process is that the RA will recruit the family, determine the number of eligible children, then open the envelope to find out which child was randomly selected.

One thought here would be to generate separate stacks of envelopes for each given family size, and have the research assistant open an envelope from the appropriate stack. However, this could be logistically challenging, especially since the RAs will spend weeks away from the home office. Instead, we'll include all plausible family sizes on each slip of paper. It seems unlikely that more than 5 children in a family will have well-child visits on the same day.

SAS
We'll use the SAS example to demonstrate how SAS macros can write SAS code, to show a plausible use for SAS formats (section 1.4.12), and to make use of proc print.


/* the following macro will write out equal probabilities for selecting 
each integer between 1 and the argument, in the format needed for the 
rand function. E.g., if the argument is 3, 
it will write out
1/3,1/3,1/3
*/

%macro tbls(n);
%do i = 1 %to &n;
1/&n %if &i < &n %then ,
%end;
%mend tbls;

/* then we can use the %tbls macro to create the randomization
via rand("TABLE") (section 1.10.4). */ 
data kids;
do family = 1 to 100000;
 nkids = 2; chosen = rand("TABLE",%tbls(2)); output;
 nkids = 3; chosen = rand("TABLE",%tbls(3)); output;
 nkids = 4; chosen = rand("TABLE",%tbls(4)); output;
 nkids = 5; chosen = rand("TABLE",%tbls(5)); output;
end;
run;

/* check randomization */
proc freq data = kids;
table nkids * chosen / nocol nopercent;
run; 

 nkids     chosen

 Frequency|
 Row Pct  |       1|       2|       3|       4|       5|  Total
 ---------+--------+--------+--------+--------+--------+
        2 |  50256 |  49744 |      0 |      0 |      0 | 100000
          |  50.26 |  49.74 |   0.00 |   0.00 |   0.00 |
 ---------+--------+--------+--------+--------+--------+
        3 |  33429 |  33292 |  33279 |      0 |      0 | 100000
          |  33.43 |  33.29 |  33.28 |   0.00 |   0.00 |
 ---------+--------+--------+--------+--------+--------+
        4 |  25039 |  24839 |  25245 |  24877 |      0 | 100000
          |  25.04 |  24.84 |  25.25 |  24.88 |   0.00 |
 ---------+--------+--------+--------+--------+--------+
        5 |  19930 |  20074 |  20188 |  20036 |  19772 | 100000
          |  19.93 |  20.07 |  20.19 |  20.04 |  19.77 |
 ---------+--------+--------+--------+--------+--------+
 Total      128654   127949    78712    44913    19772   400000

Looks pretty good. Now we need to make the output usable to the research assistants by formatting the results into English. We'll use the same format for each number of kids. This saves some keystrokes now, but may cause the RAs some confusion: it means that we might refer to the "4th oldest" of 4 children, rather than the "youngest". We could fix this using a different format for each number of children, analogous to the R version below.


proc format;
value chosen
1 = "oldest"
2 = '2nd oldest'
3 = '3rd oldest'
4 = '4th oldest'
5 = '5th oldest';
run;
 
/* now, make a text variable that concatenates (section 1.4.5) the variables 
and some explanatory text */
data k2;
set kids;
if nkids eq 2 then
 t1 = "If there are " || strip(nkids) ||" children then choose the " ||
 strip(put(chosen,chosen.)) || " child.";
else
 t1 = " " || strip(nkids) ||" ________________________ " ||
 strip(put(chosen,chosen.));
run;

/* then we print. Notice the options to print in plain text, shorten the 
page length and width, and remove the date and page number from the SAS output, as
well as in the proc print statement to remove the observation number and
show the line number, with a few other tricks */
options nonumber nodate ps = 60 ls = 68;
OPTIONS FORMCHAR="|----|+|---+=|-/\*";
proc print data = k2 (obs = 12) noobs label sumlabel;
by family;
var t1;
label t1 = '00'x family = "Envelope";
run;

---------------------------- Envelope=1 ----------------------------



 If there are 2 children then choose the 2nd oldest child.
 3 ________________________ 3rd oldest
 4 ________________________ 4th oldest
 5 ________________________ 5th oldest


---------------------------- Envelope=2 ----------------------------



 If there are 2 children then choose the 2nd oldest child.
 3 ________________________ oldest
 4 ________________________ oldest
 5 ________________________ 3rd oldest


---------------------------- Envelope=3 ----------------------------



 If there are 2 children then choose the 2nd oldest child.
 3 ________________________ 2nd oldest
 4 ________________________ 3rd oldest
 5 ________________________ 2nd oldest

R
For R, we leave some trial code in place, to demonstrate how one might discover, test, and build R code in this setting. Most results have been omitted.

sample(5, size = 1)
# choose a (discrete uniform) random integer between 1 and 5

apply(matrix(2:5),1,sample,size=1)
# choose a random integer between 1 and 2, then between 1 and 3, etc.,
# using apply() to repeat the call to sample() with different maximum number
# apply() needs a matrix or array input
# result of this is the raw data needed for one family

replicate(3,apply(matrix(2:5),1,sample,size=1))
# replicate() is in the apply() family and just repeats the
# function n times

     [,1] [,2] [,3]
[1,]    2    1    2
[2,]    2    1    2
[3,]    2    2    2
[4,]    3    5    4
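As an aside, since apply() needs a matrix or array input, the same draw can be sketched with sapply(), which works directly on the vector 2:5. This is an alternative spelling rather than the approach used in the rest of this entry; the name draws and the seed value are ours.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# sapply() calls sample(k, size = 1) for k = 2, 3, 4, 5 in turn;
# sample() with a single integer k draws from 1:k
draws <- sapply(2:5, sample, size = 1)
draws

# the replicate() wrapper works the same way as before
replicate(3, sapply(2:5, sample, size = 1))
```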

Now we have the raw data for the envelopes. Before formatting it for printing, let's check it to make sure it works correctly.

test=replicate(100000, apply(matrix(2:5), 1, sample, size=1))
apply(test, 1, summary)
        [,1] [,2]  [,3]  [,4]
Min.     1.0    1 1.000 1.000
1st Qu.  1.0    1 1.000 2.000
Median   1.0    2 2.000 3.000
Mean     1.5    2 2.492 3.003
3rd Qu.  2.0    3 3.000 4.000
Max.     2.0    3 4.000 5.000
# this is not so helpful-- need the count or percent for each number
# this would be the default if the data were factors, but they aren't
# check to see if we can trick summary() into treating these integers
# as if they were factors
methods(summary)
# yes, there's a summary() method for factors-- let's apply it
# there's also apply(test,1,table) which might be better, if you remember it
apply(test, 1, summary.factor)
[[1]]
1 2
50025 49975

[[2]]
1 2 3
33329 33366 33305

[[3]]
1 2 3 4
25231 25134 24849 24786

[[4]]
1 2 3 4 5
19836 20068 20065 20022 20009
# apply(test,1,table) will give similar results, if you remember it
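If a more formal check than eyeballing the counts is wanted, one option (a sketch, not part of the original workflow) is a chi-square goodness-of-fit test against equal probabilities. Here we simply retype the five-child counts printed above; the names counts and res are ours.

```r
# counts for the five-child case, copied from the output above
counts <- c(19836, 20068, 20065, 20022, 20009)

# with a single vector and no other arguments, chisq.test() tests
# against equal probabilities for every category
res <- chisq.test(counts)
res$statistic   # about 1.82 on 4 degrees of freedom
res$p.value
```

A large p-value here is consistent with each child being equally likely to be chosen.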

Well, that's not too pretty, but it's clear that the randomization is working. Now it's time to work on formatting the output.

mylist=replicate(5, apply(matrix(2:5), 1, sample, size=1))
# brief example data set

# We'll need to use some formatted values (section 1.4.12), as in SAS.
# Here, we'll make new value labels for each number of children,
# which will make the output easier to read. We add in an envelope
# number and wrap it all into a data frame.
df = data.frame(envelope = 1:5,
twokids=factor(mylist[1,],1:2,labels=c("youngest","oldest")),
threekids=factor(mylist[2,],1:3,labels=c("youngest", "middle", "oldest")),
fourkids=factor(mylist[3,],1:4,labels=c("youngest", "second youngest",
"second oldest", "oldest")),
fivekids=factor(mylist[4,],1:5,labels=c("youngest", "second youngest",
"middle", "second oldest", "oldest"))
)

# now we need a function to take a row of the data frame and make a single slip
# the paste() function (section 1.4.5) puts together the fixed and variable
# content of each row, while the cat() function will print it without quotes
slip = function(kidvec) {
cat(paste("------------- Envelope", kidvec[1], "------------------"))
cat(paste("\nIf there are", 2:5, " children, select the", kidvec[2:5],"child"))
cat("\n \n \n")
}

# test it on one row
slip(df[1,])

# looks good-- now we can apply() it to each row of the data frame
apply(df, 1, slip)

------------- Envelope 1 ------------------
If there are 2 children, select the youngest child
If there are 3 children, select the youngest child
If there are 4 children, select the second youngest child
If there are 5 children, select the youngest child

------------- Envelope 2 ------------------
If there are 2 children, select the youngest child
If there are 3 children, select the youngest child
If there are 4 children, select the second oldest child
If there are 5 children, select the middle child

------------- Envelope 3 ------------------
If there are 2 children, select the youngest child
If there are 3 children, select the youngest child
If there are 4 children, select the youngest child
If there are 5 children, select the second youngest child

# and so forth

# finally, we can save the result in a file with
# capture.output()
capture.output(apply(df,1,slip), file="testslip.txt")
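It may help to see why apply() prints the factor labels rather than their integer codes: apply() first coerces the data frame to a matrix, and a data frame containing factors becomes a character matrix. The sketch below uses a small two-envelope data frame of our own (small_df and m are hypothetical names) to show the coercion, and wraps the call in invisible() to suppress the NULL that apply() would otherwise return and print.

```r
small_df <- data.frame(
  envelope = 1:2,
  twokids  = factor(c(1, 2), 1:2, labels = c("youngest", "oldest"))
)

# this is the coercion apply() performs internally
m <- as.matrix(small_df)
m   # a character matrix: labels, not integer codes

# each row reaches the function as a named character vector
invisible(apply(small_df, 1, function(row)
  cat("Envelope", row["envelope"], ":", row["twokids"], "\n")))
```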


Posted by Ken Kleinman

Labels: apply(), capture.output(), matrix(), methods(), proc freq, rand function, replicate(), sample(), SAS formats, SAS macro 0 comments

Tuesday, November 15, 2011

Example 9.14: confidence intervals for logistic regression models

Recently a student asked about the difference between the confint() and confint.default() functions, both of which are available once the MASS library is loaded and both of which calculate confidence intervals from logistic regression models. The following example demonstrates that they yield different results.

R

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
library(MASS)
glmmod = glm(homeless ~ age + female, binomial, data=ds)

> summary(glmmod)
Call:
glm(formula = homeless ~ age + female, family = binomial, data = ds)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3600  -1.1231  -0.9185   1.2020   1.5466

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.89262    0.45366  -1.968   0.0491 *
age          0.02386    0.01242   1.921   0.0548 .
female      -0.49198    0.22822  -2.156   0.0311 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 625.28 on 452 degrees of freedom
Residual deviance: 617.19 on 450 degrees of freedom
AIC: 623.19

Number of Fisher Scoring iterations: 4

> exp(confint(glmmod))
Waiting for profiling to be done...
                2.5 %    97.5 %
(Intercept) 0.1669932 0.9920023
age         0.9996431 1.0496390
female      0.3885283 0.9522567
> library(MASS)
> exp(confint.default(glmmod))
                2.5 %    97.5 %
(Intercept) 0.1683396 0.9965331
age         0.9995114 1.0493877
female      0.3909104 0.9563045

Why are they different? Which one is correct?

SAS

Fortunately the detailed documentation in SAS can help resolve this. The logistic procedure (section 4.1.1) offers the clodds option to the model statement. Setting this option to both produces two sets of CL, based on the Wald test and on the profile-likelihood approach. (Venzon, D. J. and Moolgavkar, S. H. (1988), “A Method for Computing Profile-Likelihood Based Confidence Intervals,” Applied Statistics, 37, 87–94.)

ods output cloddswald = waldcl cloddspl = plcl;
proc logistic data = "c:\book\help.sas7bdat" plots=none;
class female (param=ref ref='0');
model homeless(event='1') = age female / clodds = both;
run;

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals

Effect              Unit    Estimate    95% Confidence Limits

AGE               1.0000       1.024       1.000       1.050
FEMALE 1 vs 0     1.0000       0.611       0.389       0.952

Odds Ratio Estimates and Wald Confidence Intervals

Effect              Unit    Estimate    95% Confidence Limits

AGE               1.0000       1.024       1.000       1.049
FEMALE 1 vs 0     1.0000       0.611       0.391       0.956

Unfortunately, the default precision of the printout isn't quite sufficient to identify whether this distinction aligns with the differences seen in the two R methods. We get around this by using the ODS system to save the output as data sets (section A.7.1). Then we can print the data sets, removing the default rounding formats to find all of the available precision.

title "Wald CL";
proc print data=waldcl; format _all_; run;
title "PL CL";
proc print data=plcl; format _all_; run;

Wald CL
                             Odds
Obs    Effect           Unit    RatioEst    LowerCL    UpperCL

  1    AGE                 1     1.02415    0.99951    1.04939
  2    FEMALE 1 vs 0       1     0.61143    0.39092    0.95633

PL CL
                             Odds
Obs    Effect           Unit    RatioEst    LowerCL    UpperCL

  1    AGE                 1     1.02415    0.99964    1.04964
  2    FEMALE 1 vs 0       1     0.61143    0.38853    0.95226

With this added precision, we can see that the confint.default() function generates the Wald confidence limits, while the confint() function produces the profile-likelihood limits. This also explains the confint() message "Waiting for profiling to be done..." Thus neither CI is incorrect, though the profile-likelihood method is thought to be superior, especially for small sample sizes. Little practical difference is seen here.
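To make the Wald calculation concrete, the sketch below rebuilds the limits by hand as estimate plus or minus a normal quantile times the standard error, and compares them with confint.default(). To keep it self-contained we fit a model to the built-in mtcars data as a stand-in, not the HELP data set used above; the names fit, est, se, z, and wald are ours.

```r
# stand-in logistic regression on a built-in data set (an assumption,
# chosen only so the sketch runs without downloading anything)
fit <- glm(am ~ wt, family = binomial, data = mtcars)

est <- coef(fit)               # coefficient estimates (log odds scale)
se  <- sqrt(diag(vcov(fit)))   # their standard errors
z   <- qnorm(0.975)            # normal quantile for a 95% interval

# Wald limits: estimate -/+ z * standard error
wald <- cbind(est - z * se, est + z * se)

# compare with confint.default(), ignoring dimnames
all.equal(wald, confint.default(fit), check.attributes = FALSE)

# exponentiate to get limits on the odds ratio scale
exp(wald)
```

The profile-likelihood limits have no such closed form, which is why confint() has to do the additional profiling work.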

Posted by Nick Horton

Labels: clodds statement, confidence intervals, confint(), logistic regression, MASS library, ods system, proc logistic, profile likelihood, SAS formats 2 comments


The book (second edition, 2014)

Reviews (from the first edition)

"By placing the R and SAS solutions together and by covering a vast array of tasks in one book, Kleinman and Horton have added surprising value and searchability to the information in their book. … a home run, and it is a book I am grateful to have sitting, dust-free, on my shelf."
—Robert Alan Greevy, Jr, Teaching of Statistics in the Health Sciences

"I use SAS and R on a daily basis. Each has strengths and weaknesses, and using both of them gives the advantage of being able to do almost anything when it comes to data manipulation, analysis, and graphics. If you use both SAS and R on a regular basis, get this book. If you know one of the packages and are learning the other, you may need more than this book, but get this book, too. "

Charles Heckler, University of Rochester, Technometrics

"Excellent cross-referencing to other topics and end-of-chapter worked examples on the ‘Health evaluation and linkage to primary care’ data set are given with each topic. … users who are proficient in either of the software packages but with the need to use the other will find this book useful."
—Frances Denny, Journal of the Royal Statistical Society, Series A

Buy a book

SAS and R: Data Management, Statistical Analysis, and Graphics, Second Edition

Using SAS for Data Management, Statistical Analysis, and Graphics

Using R for Data Management, Statistical Analysis, and Graphics


About the authors

Nicholas Horton is a Professor of Statistics at Amherst College. He is a biostatistician with expertise in missing data methods, longitudinal regression, statistical computing and statistical education. Nick's home page; Nick's Google Scholar author page

Ken Kleinman is an Associate Professor with the Department of Biostatistics and Epidemiology at the University of Massachusetts, Amherst. He is a consulting biostatistician with expertise in group-randomized trials and disease surveillance; he also offers R training courses. Ken's home page; Ken's Google Scholar author page.


About SAS and R

This blog is where we post additional examples for our books about SAS and R (Amazon: SAS and R.)

Please also visit the web site for the book, where code from the book and additional information are available.

Another resource is the StatSoftEquivs wiki. You can add to it!

Blogger only allows one author per post, but we collaborate actively on most entries and share credit and responsibility.

Creative Commons License
SAS and R blog by Ken Kleinman and Nicholas Horton is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
