Statistical Computing and Programming
Last update: 21 Apr 2025 21:17
First version:
By this I do not just mean R, but R is a big part of being a
working academic statistician these days...
R, for the record, is a free, open-source interpreted programming language
(and interactive environment) for statistical computing. It descends from a
language developed at Bell Labs (of blessed memory) called S. There is a
commercial descendant of S called S-plus, but I know of no reason to use it
rather than R. For that matter, I know of no reason to use any of the
commercial statistical environments (Stata, SPSS, Minitab, ...) rather than R,
except for personal and organizational inertia. (Which is not to be slighted,
of course.) The only real alternative, from my point of view, is hand-written
code in something like C/C++ or Fortran --- which can of course be integrated
with R. It would be a bit unfair to say that seeing a new method
without an R implementation is cause for suspicion, but not wildly
unfair.
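To make the point about integrating compiled code with R concrete, here is a minimal sketch of a C routine written to the calling convention of R's .C() interface, under which every argument arrives as a pointer and the function returns void. The routine itself (a trailing moving average), the name moving_average, and the file name used in the comments are illustrative assumptions, not taken from any package.

```c
/* moving_average.c (hypothetical file name)
 *
 * A sketch of a C routine callable from R through .C().
 * Under .C(), all arguments are pointers and the return type is void;
 * results are written into storage supplied by the caller.
 *
 * Assumed usage from R, after building with `R CMD SHLIB moving_average.c`:
 *   dyn.load("moving_average.so")   # file extension is platform-dependent
 *   .C("moving_average", as.double(x), as.integer(length(x)),
 *      as.integer(2), out = double(length(x)))$out
 */

/* Trailing moving average of x with window width *k (assumed >= 1).
 * out must have room for *n doubles. Early positions, where fewer than
 * *k observations are available, are averaged over what is there. */
void moving_average(double *x, int *n, int *k, double *out)
{
    double sum = 0.0;
    for (int i = 0; i < *n; i++) {
        sum += x[i];
        if (i >= *k) {
            sum -= x[i - *k];   /* drop the value that left the window */
        }
        int count = (i + 1 < *k) ? (i + 1) : *k;
        out[i] = sum / count;
    }
}
```

The explicit as.double() and as.integer() coercions on the R side matter, because .C() passes R vectors through without type checking. For anything beyond simple numeric vectors, the .Call() interface or a wrapper package is usually the more comfortable route.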
(And, of course, people who use Excel to do statistics are perhaps to be
pitied, but not to be taken seriously.)
I am drawing a somewhat arbitrary terminological divide between
"statistical computing", meaning computing environments for statistical data
analysis, and "computational
statistics", meaning computational methods of special relevance to
statistical problems, or tricky or interesting computational problems arising
from statistical problems. (One might even call it "numerical methods for
statistics", except that some of the most relevant algorithms aren't very
numerical.) When I teach statistical computing, some of it is
computational statistics, and some of it is just plain programming, but lots of
it is stuff like data manipulation, and reproducibility of the analysis...
See also:
Statistics;
Teaching Statistics;
Programming;
Recommended, big picture:
- John M. Chambers, Software for Data Analysis: Programming
with R
Recommended, gentle introductions:
- W. John Braun and Duncan J. Murdoch, A First Course in
Statistical Programming with R [They're not kidding about being
a first course --- experienced programmers may find it irritatingly
slow-paced --- but they do a rather good job for total novices.]
- Norman Matloff, The Art of R Programming: A Tour of Statistical Software Design
Recommended, tool sources:
- The R Project for Statistical
Computing
- Journal of Statistical Software
Recommended, close-ups of particular tools (very inadequate):
- Winston Chang, R Graphics Cookbook
- Julian J. Faraway, Extending the Linear Model with R:
Generalized Linear, Mixed Effects and Nonparametric Regression Models
- Tristen Hayfield and Jeffrey S. Racine, "Nonparametric Econometrics: The np Package", Journal of Statistical Software 27 (2008): 5 [An extremely useful little R package]
- Michael Kane, John W. Emerson, Stephen Weston, "Scalable Strategies for Computing with Massive Data", Journal of Statistical Software 55 (2013): 14
- Phil Spector, Data Manipulation with R
- Paul Teetor, R Cookbook
- Hadley Wickham, "The Split-Apply-Combine Strategy for Data Analysis", Journal of Statistical Software 40
(2011): 1
To read:
- Joseph Adler, R in a Nutshell [Glowing review in J. Stat. Soft.]
- Adrian W. Bowman and Adelchi Azzalini, Applied Smoothing
Techniques for Data Analysis: The Kernel Approach with S-Plus
Illustrations
- Richard Cotton, Learning R
- Garrett Grolemund, Hands-On Programming with R: Write Your Own Functions and Simulations
- Owen Jones, Robert Maillardet and Andrew Robinson, Introduction to Scientific Programming and Simulation Using R
- Ben Klemens, Modeling with Data: Tools and Techniques for
Scientific Computing [JSTOR; author's book site]
- Matthias Kohl and Peter Ruckdeschel, "R Package distrMod: S4 Classes and Methods for Probability Models", Journal of Statistical Software 35 (2010): 10 [Use this for re-writing the power law code?]
- John Maindonald, Data Analysis and Graphics Using R
- Quinn E. McCallum, Parallel R
- Wes McKinney, Python for Data Analysis
- Maria L. Rizzo, Statistical Computing with R
- Hadley Wickham
  - Advanced R
  - ggplot2
  - R Packages