Sufficient Statistics
Last update: 21 Apr 2025 21:17
First version:
In statistical theory, a "statistic" is a well-behaved (i.e., "measurable")
function of the data, which is what's actually used in calculations or
inferences, rather than the full data set. E.g., the sample mean, the sample
median, the sample variance, etc. A statistic is sufficient if it is
just as informative as the full data. The concept was introduced by
R. A. Fisher in the 1920s, and refined by Jerzy Neyman in the 1930s.
Parametric sufficiency means that the statistic contains just as much
information about (some) parameter of the model as the full data. More
precisely: the actual data has a certain probability distribution conditional
on the value of the statistic, which in general will also involve the parameter. The statistic
is sufficient if this conditional distribution is the same for all
parameter values. (That's actually clearer in algebra but I don't feel up to
writing it in HTML now.) Once we've controlled for the sufficient statistic,
nothing else --- not even the original data --- can tell us anything more about
the parameter. Predictive sufficiency is similar: given the predictively
sufficient statistic, future observations can be predicted as well as if the
whole past was available. Predictive sufficiency can be expressed concisely in
terms of mutual information.
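Here is a concrete check of the parametric definition. This is a small sketch, and the model, sample size and parameter values are illustrative assumptions, not from the text. For n independent coin flips with success probability p, the statistic T(x) = sum(x) should be sufficient for p. That is, Pr(X = x | T(X) = t; p) should be the same for every p, namely 1/C(n, t). The code enumerates all binary sequences of length n and computes this conditional distribution exactly. It uses rational arithmetic, so "the same for all parameter values" can be tested by exact equality. The function name conditional_given_sum is made up for this sketch.

```python
from fractions import Fraction
from itertools import product

def conditional_given_sum(n, p):
    """Pr(X = x | sum(X) = t; p) for every binary sequence x of length n,
    computed exactly by enumeration for i.i.d. Bernoulli(p) data."""
    seq_prob = {x: p ** sum(x) * (1 - p) ** (n - sum(x))
                for x in product((0, 1), repeat=n)}
    # Marginal probability of each value t of the statistic T(x) = sum(x).
    t_prob = {}
    for x, px in seq_prob.items():
        t_prob[sum(x)] = t_prob.get(sum(x), 0) + px
    # Divide each sequence's probability by the probability of its T-value.
    return {x: px / t_prob[sum(x)] for x, px in seq_prob.items()}

n = 4
tables = [conditional_given_sum(n, Fraction(k, 10)) for k in (1, 3, 5, 9)]
# Sufficiency: the conditional distribution is identical for every p tried ...
print(all(table == tables[0] for table in tables))  # True
# ... and it is uniform over the C(n, t) sequences sharing the same sum.
print(tables[0][(1, 1, 0, 0)], tables[0][(0, 0, 0, 0)])  # 1/6 1
```

By contrast, a statistic such as the first flip alone would not pass this test. Conditional on x[0], the distribution of the remaining flips still depends on p.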
A necessary statistic is one which can be computed from any
sufficient statistic, without reference to the original data. (It's
"necessary" in the sense that any optimal inference implicitly involves knowing
the necessary statistic.) Under pretty general conditions, maximum likelihood
estimates are necessary statistics, though they are not always sufficient. A
minimal sufficient statistic is one which is both necessary and
sufficient --- i.e., it's just as informative as the original data, but it can
be computed from any other sufficient statistic; no further compression of the
data is possible without losing some information.
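This sketch stays with the coin-flipping model and shows how a minimal sufficient statistic can be found in practice. It relies on a standard criterion that is not stated in the text above. Two samples x and y fall in the same cell of the minimal sufficient partition exactly when the likelihood ratio L(p; x)/L(p; y) does not depend on p. The code groups sequences by their likelihood on a small grid of p values, rescaled by its value at the first grid point. The helper names likelihood_shape and minimal_partition are hypothetical, and so is the grid. Checking on a finite grid is an assumption of the sketch. It can merge cells that ought to be separate, but it never splits a genuine cell. For this model, three grid points already separate every value of sum(x).

```python
from fractions import Fraction
from itertools import product

# Illustrative grid of parameter values (an assumption of this sketch).
P_GRID = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

def likelihood(x, p):
    """Bernoulli likelihood of the binary sequence x at success probability p."""
    t = sum(x)
    return p ** t * (1 - p) ** (len(x) - t)

def likelihood_shape(x):
    """Likelihood on the grid, rescaled by its value at the first grid point.
    Two samples get the same shape exactly when their likelihood ratio is
    constant across the grid."""
    base = likelihood(x, P_GRID[0])
    return tuple(likelihood(x, p) / base for p in P_GRID)

def minimal_partition(n):
    """Group all binary sequences of length n by likelihood shape."""
    cells = {}
    for x in product((0, 1), repeat=n):
        cells.setdefault(likelihood_shape(x), []).append(x)
    return list(cells.values())

n = 4
cells = minimal_partition(n)
print(len(cells))  # 5, i.e. n + 1 cells
# Each cell holds exactly one value of sum(x), so the cells are its level sets.
print(all(len({sum(x) for x in cell}) == 1 for cell in cells))  # True
# The maximum likelihood estimate sum(x)/n is constant on every cell, so it can
# be read off the minimal statistic without going back to the data.
print(all(len({Fraction(sum(x), n) for x in cell}) == 1 for cell in cells))  # True
```

The full sequence, the sorted sequence, and sum(x) are all sufficient here. Only sum(x), or any one-to-one function of it, labels the minimal partition.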
A lot of my work has involved describing and finding predictively sufficient
statistics for time series and spatio-temporal processes. It turns out that
the statistical sufficiency property gives rise to a Markov property for the
statistics. (Basically, computational
mechanics turns out to be about constructive predictively sufficient
statistics.) So I'm very interested in sufficiency in general, and especially
how it relates to Markovian representations of non-Markovian processes.
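This is a small illustration of predictive sufficiency in its mutual-information form. A statistic S of the past is predictively sufficient when I[past; future] = I[S; future]. Equivalently, I[past; future | S] = 0, so conditioning on S leaves nothing more to learn from the past. For a Markov chain, the current state should be predictively sufficient. The sketch makes several simplifying assumptions. It uses a two-state stationary chain with an arbitrarily chosen transition matrix. It takes the "past" to be just the two most recent states (X0, X1) and the "future" to be X2. It then computes the mutual informations exactly from the joint distribution.

```python
from itertools import product
from math import log2

# Transition matrix of a two-state Markov chain (illustrative numbers).
P = [[0.9, 0.1],
     [0.2, 0.8]]
# Stationary distribution: solves pi = pi P; here pi = (2/3, 1/3).
pi = [2 / 3, 1 / 3]

def joint_three():
    """Joint distribution of (X0, X1, X2) for the stationary chain."""
    return {(a, b, c): pi[a] * P[a][b] * P[b][c]
            for a, b, c in product(range(2), repeat=3)}

def mutual_information(joint, f, g):
    """I[f(X); g(X)] in bits, where joint maps outcomes to probabilities."""
    pfg, pf, pg = {}, {}, {}
    for x, p in joint.items():
        u, v = f(x), g(x)
        pfg[(u, v)] = pfg.get((u, v), 0.0) + p
        pf[u] = pf.get(u, 0.0) + p
        pg[v] = pg.get(v, 0.0) + p
    return sum(p * log2(p / (pf[u] * pg[v]))
               for (u, v), p in pfg.items() if p > 0)

joint = joint_three()
future = lambda x: x[2]
I_past = mutual_information(joint, lambda x: (x[0], x[1]), future)  # (truncated) past
I_last = mutual_information(joint, lambda x: x[1], future)          # current state only
I_first = mutual_information(joint, lambda x: x[0], future)         # older state only
print(f"{I_past:.6f} {I_last:.6f} {I_first:.6f}")
```

The first two numbers agree up to floating-point rounding, so the current state X1 is predictively sufficient. The older state X0 alone carries strictly less information about X2 and is not sufficient. For a non-Markovian process, no finite window of the past need pass this test. That is where constructing predictively sufficient statistics, and the Markovian representations mentioned above, becomes interesting.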
Topics of particular interest: Necessary and sufficient conditions for the
existence of non-trivial sufficient statistics; dimensionality of sufficient
statistics; geometric and probabilistic characterizations; decision-theoretic
properties; necessary statistics; minimal sufficient statistics
for transducers; connections
to causal inference; relationship
between sufficiency and ergodic theory;
characterization of different classes of stochastic processes in terms
of their sufficient statistics; exponential families.
Recommended, big picture:
- Sufficiency is a very important topic in statistical inference, and
any good book on theoretical statistics will cover it in depth. I like
Mark Schervish's Theory of Statistics, but really any one will do.
- Persi Diaconis, "Sufficiency as Statistical Symmetry", Proceedings of the AMS Centennial Symposium 15--26 [1988; PDF]
- E. B. Dynkin, "Sufficient statistics and extreme
points", Annals of Probability 6 (1978): 705--730
["The connection between
ergodic decompositions and sufficient
statistics is explored in an elegant paper by DYNKIN" ---
Kallenberg, Foundations of Modern Probability, p. 577.]
Recommended, close ups:
- R. R. Bahadur, "Sufficiency and statistical decision functions,"
Annals of Mathematical Statistics 25 (1954):
423--462
- M. S. Bartlett
- "Statistical Information and Properties of Sufficiency",
Proceedings of the Royal Society of London A 154
(1936): 124--137 [JSTOR]
- "Properties of Sufficiency and Statistical
Tests", Proceedings of the Royal Society of London A 160
(1937): 268--282 [JSTOR]
- David Blackwell and M. A. Girshick, Theory of Games and
Statistical Decisions [Blackwell was a pioneer in exploring the
decision-theoretic properties of sufficiency, and this excellent old book
contains many deep theorems in this area]
- Ronald W. Butler, "Predictive Likelihood Inference with
Applications", Journal of the Royal Statistical Society
B 48 (1986): 1--38 ["in the predictive setting, all parameters
are nuisance
parameters". JSTOR]
- John W. Fisher III, Alexander T. Ihler and Paul A. Viola,
"Learning Informative Statistics: A Nonparametric Approach", pp. 900--906 in
NIPS 12 (1999) [PDF
reprint. I'd call this more of a semi-parametric approach than a fully
non-parametric one; they assume a parametric form for the dependence structure,
but are agnostic about the distributions of innovations, and so try to maximize
non-parametrically estimated mutual informations. In the limit, this will give
them sufficient statistics.]
- R. A. Fisher
- "A Mathematical Examination of the Methods of Determining the Accuracy of an Observation by the Mean Error, and by the Mean Square Error",
Monthly Notices of the Royal Astronomical
Society 80 (1920): 758--770 [Apparently the first time
the sufficiency property was noted, though Fisher does not use that term
here. PDF]
- "On the Mathematical Foundations of Theoretical Statistics",
Philosophical Transactions of the Royal
Society A 222 (1922): 309--368 [Formal introduction of
the concept, and the name, of sufficiency, along with much else that has proved
fundamental to statistics, such as the likelihood function and the method of
maximum likelihood. PDF in two
parts, 1, 2]
- "Theory of Statistical Estimation", Proceedings of
the Cambridge Philosophical Society 22 (1925): 700--725
[Often, but mistakenly, cited in place of the 1922 paper; admittedly, clearer.
PDF]
- Solomon Kullback, Information Theory and Statistics
- Solomon Kullback and R. A. Leibler, "On Information and
Sufficiency",
Annals of Mathematical Statistics 22 (1951): 79--86
- Rudolf Kulhavy, Recursive Nonlinear Estimation: A Geometric
Approach
- Steffen L. Lauritzen
- Extremal Families and Systems of Sufficient
Statistics [Mini-review.]
- "Extreme Point Models in Statistics",
Scandinavian Journal of Statistics 11 (1984):
65--91 [Highlights of the book, without proofs but with decent typography.
With useful discussion and a
reply. JSTOR]
- "Sufficiency, Prediction and Extreme Models",
Scandinavian Journal of Statistics 1 (1974):
128--134 [JSTOR]
- "On the Interrelationships among Sufficiency,
Total Sufficiency, and Some Related Concepts", Preprint 8, Institute of Mathematical Statistics, University of Copenhagen (July 1974) [PDF scan
via Prof. Lauritzen]
- Benoit Mandelbrot, "The Role of Sufficiency and of Estimation in
Thermodynamics", Annals
of Mathematical Statistics 33 (1962): 1021--1038
[Extensive thermodynamic variables as sufficient statistics for the conjugate
intensive variables; Gibbs canonical form arising from natural requirements on
finite-dimensional sufficient statistics, which can only be achieved for
exponential families of probability distributions. Very clever, and IMHO a
real contribution to the foundations of
statistical mechanics and thermodynamics.]
- Giorgio Picci, "Some Connections Between the Theory of Sufficient Statistics and the Identifiability Problem", SIAM Journal on Applied
Mathematics 33 (1977): 383--398 [Introduces the idea of
a "maximal identifiable statistic" --- the coarsest partition of hypothesis
space where each equivalence class/cell of the partition gives rise to
a distinct distribution of observables. (I would prefer "parameter",
rather than "statistic", since it's a function of the distribution, not the
observables, but that's a quibble.) It might be interesting to try to
define emergence in these terms ---
perhaps as a restriction on the observable sigma-field such that the
equivalence classes of the maximal identifiable parameter become
infinite-dimensional, or something like
that. JSTOR. Thanks to
Rhiannon Weaver for the pointer.]
- David Pollard, "A note on insufficiency and the preservation of Fisher information", arxiv:1107.3797
- Ge Xu, Biao Chen, "The Sufficiency Principle for Decentralized Data Reduction", arxiv:1207.3265
To read:
- Nihat Ay, Jürgen Jost, Hông Vân Lê, Lorenz Schwachhüfer, "Information geometry and sufficient statistics", arxiv:1207.6736
- T. Bohlin, "Information pattern for linear discrete-time models
with stochastic coefficients," IEEE Transactions on Automatic
Control 15 (1970): 104--106 [On recursively-computable
sufficient statistics]
- R. Dennis Cook, Liliana Forzani, and Adam J. Rothman, "Estimating sufficient reductions of the predictors in abundant high-dimensional regressions", Annals of Statistics 40 (2012): 353--384
- E. B. Dynkin, "Necessary and sufficient statistics for a family of
probability distributions," Uspekhi matem. nauk
6 (1951): 68--90 [Apparently translated
in Select. Trans. Math. Statist. Prob. 1 (1951):
23--41. Zacks, below, is supposed to follow closely]
- David Hinkley, "Predictive Likelihood", Annals of Statistics 7 (1979): 718--728
- V. S. Huzurbazar, Sufficient Statistics: Selected
Contributions
- Anna Jencova and Denes Petz, "Sufficiency in quantum statistical
inference", math-ph/0412093
- Kuang-Yao Lee, Bing Li, and Francesca Chiaromonte, "A general theory for nonlinear sufficient dimension reduction: Formulation and estimation",
Annals of Statistics 41 (2013): 221--249, arxiv:1304.0580
- Yanyuan Ma and Liping Zhu, "Efficient estimation in sufficient dimension reduction", Annals of Statistics 41 (2013): 250--268
- W. J. Runggaldier and F. Spizzichino, "Sufficient conditions for
finite dimensionality of filters in discrete time: A Laplace transform-based
approach," Bernoulli 7 (2001): 211--221
- Morris Skibinsky, "Adequate Subfields and Sufficiency",
Annals of Mathematical Statistics 38 (1967): 155--161
- Taiji Suzuki and Masashi Sugiyama, "Sufficient Dimension Reduction via Squared-Loss Mutual Information Estimation", Neural Computation 25 (2013): 725--758
- Andrew Tausz, "Properties of Conditional Expectation Operators and Sufficient Subfields", arxiv:1011.5162
- Brendan van Rooyen, Robert C. Williamson, "Le Cam meets LeCun: Deficiency and Generic Feature Learning", arxiv:1402.4884
- Tao Wang, Xu Guo, Peirong Xu, Lixing Zhu, "Transformed sufficient dimension reduction", arxiv:1401.0267
- Makoto Yamada, Gang Niu, Jun Takagi, Masashi Sugiyama, "Sufficient Component Analysis for Supervised Dimension Reduction", arxiv:1103.4998
- S. Zacks, The Theory of Statistical Inference [For
material on necessary and sufficient statistics]