A practical solution to the pervasive problems of p values

Wagenmakers, Eric-Jan

doi:10.3758/BF03194105

Theoretical and Review Articles
Published: October 2007

Volume 14, pages 779–804, (2007)

Psychonomic Bulletin & Review

Eric-Jan Wagenmakers ¹

Abstract

In the field of psychology, the practice of p value null-hypothesis testing is as widespread as ever. Despite this popularity, or perhaps because of it, most psychologists are not aware of the statistical peculiarities of the p value procedure. In particular, p values are based on data that were never observed, and these hypothetical data are themselves influenced by subjective intentions. Moreover, p values do not quantify statistical evidence. This article reviews these p value problems and illustrates each problem with concrete examples. The three problems are familiar to statisticians but may be new to psychologists. A practical solution to these p value problems is to adopt a model selection perspective and use the Bayesian information criterion (BIC) for statistical inference (Raftery, 1995). The BIC provides an approximation to a Bayesian hypothesis test, does not require the specification of priors, and can be easily calculated from SPSS output.
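
As a rough illustration of the BIC-based test mentioned in the abstract, the sketch below turns the BIC values of a null and an alternative model into an approximate Bayes factor and, assuming equal prior odds, a posterior probability for the null. The relation BF01 ≈ exp(ΔBIC10 / 2) is the standard BIC approximation associated with Raftery (1995). The function names, the normal-error form of the BIC, and the sums of squares are assumptions made for this example; they are not values taken from the article.

```python
import math


def bic_from_sse(sse: float, n: int, k: int) -> float:
    """BIC of a normal-error linear model, up to an additive constant.

    sse: residual sum of squares, n: number of observations,
    k: number of free parameters. The dropped constant is the same for
    all models fit to the same data, so BIC differences are unaffected.
    """
    return n * math.log(sse / n) + k * math.log(n)


def bf01_from_bic(bic_h0: float, bic_h1: float) -> float:
    """Approximate Bayes factor in favor of H0: exp((BIC_H1 - BIC_H0) / 2)."""
    return math.exp((bic_h1 - bic_h0) / 2.0)


def posterior_h0(bic_h0: float, bic_h1: float) -> float:
    """Approximate posterior probability of H0, assuming equal prior odds."""
    bf01 = bf01_from_bic(bic_h0, bic_h1)
    return bf01 / (1.0 + bf01)


# Hypothetical sums of squares, e.g. read off an ANOVA or regression table.
n = 40
bic_h0 = bic_from_sse(sse=120.0, n=n, k=1)  # null model: intercept only
bic_h1 = bic_from_sse(sse=110.0, n=n, k=2)  # alternative: one extra parameter

bf01 = bf01_from_bic(bic_h0, bic_h1)
p_h0 = posterior_h0(bic_h0, bic_h1)
print(f"BF01 ~ {bf01:.2f}, approximate Pr(H0 | data) ~ {p_h0:.2f}")
```

With these made-up numbers the extra parameter reduces the error only slightly, so the approximate Bayes factor stays close to 1 and the data barely discriminate between the two models.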

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Anscombe, F. J. (1954). Fixed-sample-size analysis of sequential observations. Biometrics, 10, 89–100.
Anscombe, F. J. (1963). Sequential medical trials. Journal of the American Statistical Association, 58, 365–383.
Armitage, P. (1957). Restricted sequential procedures. Biometrika, 44, 9–26.
Armitage, P. (1960). Sequential medical trials. Springfield, IL: Thomas.
Armitage, P., McPherson, C. K., & Rowe, B. C. (1969). Repeated significance tests on accumulating data. Journal of the Royal Statistical Society: Series A, 132, 235–244.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
Barnard, G. A. (1947). The meaning of a significance level. Biometrika, 34, 179–182.
Basu, D. (1964). Recovery of ancillary information. Sankhya: Series A, 26, 3–16.
Bayarri, M.-J., & Berger, J. O. (2004). The interplay of Bayesian and frequentist analysis. Statistical Science, 19, 58–80.
Berger, J. O. (1985). Statistical decision theory and Bayesian analysis (2nd ed.). New York: Springer.
Berger, J. O. (2003). Could Fisher, Jeffreys and Neyman have agreed on testing? Statistical Science, 18, 1–32.
Berger, J. O., & Berry, D. A. (1988a). The relevance of stopping rules in statistical inference. In S. S. Gupta & J. O. Berger (Eds.), Statistical decision theory and related topics IV (Vol. 1, pp. 29–72). New York: Springer.
Berger, J. O., & Berry, D. A. (1988b). Statistical analysis and the illusion of objectivity. American Scientist, 76, 159–165.
Berger, J. O., Boukai, B., & Wang, Y. (1997). Unified frequentist and Bayesian testing of a precise hypothesis (with discussion). Statistical Science, 12, 133–160.
Berger, J. O., Brown, L., & Wolpert, R. (1994). A unified conditional frequentist and Bayesian test for fixed and sequential hypothesis testing. Annals of Statistics, 22, 1787–1807.
Berger, J. O., & Delampady, M. (1987). Testing precise hypotheses. Statistical Science, 2, 317–352.
Berger, J. O., & Mortera, J. (1999). Default Bayes factors for nonnested hypothesis testing. Journal of the American Statistical Association, 94, 542–554.
Berger, J. O., & Pericchi, L. R. (1996). The intrinsic Bayes factor for model selection and prediction. Journal of the American Statistical Association, 91, 109–122.
Berger, J. O., & Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of p values and evidence. Journal of the American Statistical Association, 82, 112–139.
Berger, J. O., & Wolpert, R. L. (1988). The likelihood principle (2nd ed.). Hayward, CA: Institute of Mathematical Statistics.
Bernardo, J. M., & Smith, A. F. M. (1994). Bayesian theory. Chichester, U.K.: Wiley.
Birnbaum, A. (1962). On the foundations of statistical inference (with discussion). Journal of the American Statistical Association, 53, 259–326.
Birnbaum, A. (1977). The Neyman–Pearson theory as decision theory, and as inference theory; with a criticism of the Lindley–Savage argument for Bayesian theory. Synthese, 36, 19–49.
Box, G. E. P., & Tiao, G. C. (1973). Bayesian inference in statistical analysis. Reading, MA: Addison-Wesley.
Browne, M. (2000). Cross-validation methods. Journal of Mathematical Psychology, 44, 108–132.
Burdette, W. J., & Gehan, E. A. (1970). Planning and analysis of clinical studies. Springfield, IL: Thomas.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). New York: Springer.
Busemeyer, J. R., & Stout, J. C. (2002). A contribution of cognitive decision models to clinical assessment: Decomposing performance on the Bechara gambling task. Psychological Assessment, 14, 253–262.
Christensen, R. (2005). Testing Fisher, Neyman, Pearson, and Bayes. American Statistician, 59, 121–126.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cornfield, J. (1966). Sequential trials, sequential analysis, and the likelihood principle. American Statistician, 20, 18–23.
Cornfield, J. (1969). The Bayesian outlook and its application. Biometrics, 25, 617–657.
Cortina, J. M., & Dunlap, W. P. (1997). On the logic and purpose of significance testing. Psychological Methods, 2, 161–172.
Cox, D. R. (1958). Some problems connected with statistical inference. Annals of Mathematical Statistics, 29, 357–372.
Cox, D. R. (1971). The choice between alternative ancillary statistics. Journal of the Royal Statistical Society: Series B, 33, 251–255.
Cox, R. T. (1946). Probability, frequency and reasonable expectation. American Journal of Physics, 14, 1–13.
Cumming, G. (2007). Replication and p values: p values predict the future vaguely, but confidence intervals do better. Manuscript submitted for publication.
D’Agostini, G. (1999). Teaching statistics in the physics curriculum: Unifying and clarifying role of subjective probability. American Journal of Physics, 67, 1260–1268.
Dawid, A. P. (1984). Statistical theory: The prequential approach. Journal of the Royal Statistical Society: Series A, 147, 278–292.
De Finetti, B. (1974). Theory of probability: A critical introductory treatment (Vols. 1 & 2; A. Machí & A. Smith, Trans.). London: Wiley.
Diamond, G. A., & Forrester, J. S. (1983). Clinical trials and statistical verdicts: Probable grounds for appeal. Annals of Internal Medicine, 98, 385–394.
Dickey, J. M. (1973). Scientific reporting and personal probabilities: Student’s hypothesis. Journal of the Royal Statistical Society: Series B, 35, 285–305.
Dickey, J. M. (1977). Is the tail area useful as an approximate Bayes factor? Journal of the American Statistical Association, 72, 138–142.
Dixon, P. (2003). The p value fallacy and how to avoid it. Canadian Journal of Experimental Psychology, 57, 189–202.
Djurić, P. M. (1998). Asymptotic MAP criteria for model selection. IEEE Transactions on Signal Processing, 46, 2726–2735.
Edwards, A. W. F. (1992). Likelihood. Baltimore: Johns Hopkins University Press.
Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193–242.
Efron, B. (2005). Bayesians, frequentists, and scientists. Journal of the American Statistical Association, 100, 1–5.
Efron, B., & Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 92, 548–560.
Feller, W. (1940). Statistical aspects of ESP. Journal of Parapsychology, 4, 271–298.
Feller, W. (1970). An introduction to probability theory and its applications: Vol. 1 (2nd ed.). New York: Wiley.
Fine, T. L. (1973). Theories of probability: An examination of foundations. New York: Academic Press.
Firth, D., & Kuha, J. (1999). Comments on "A critique of the Bayesian information criterion for model selection." Sociological Methods & Research, 27, 398–402.
Fisher, R. A. (1934). Statistical methods for research workers (5th ed.). London: Oliver & Boyd.
Fisher, R. A. (1935a). The design of experiments. Edinburgh: Oliver & Boyd.
Fisher, R. A. (1935b). The logic of inductive inference (with discussion). Journal of the Royal Statistical Society, 98, 39–82.
Fisher, R. A. (1958). Statistical methods for research workers (13th ed.). New York: Hafner.
Freireich, E. J., Gehan, E., Frei, E., III, Schroeder, L. R., Wolman, I. J., Anbari, R., et al. (1963). The effect of 6-mercaptopurine on the duration of steroid-induced remissions in acute leukemia: A model for evaluation of other potentially useful therapy. Blood, 21, 699–716.
Frick, R. W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1, 379–390.
Friedman, L. M., Furberg, C. D., & DeMets, D. L. (1998). Fundamentals of clinical trials (3rd ed.). New York: Springer.
Galavotti, M. C. (2005). A philosophical introduction to probability. Stanford: CSLI Publications.
Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association, 70, 320–328.
Gelman, A., & Rubin, D. B. (1999). Evaluating and using statistical methods in the social sciences. Sociological Methods & Research, 27, 403–410.
Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311–339). Hillsdale, NJ: Erlbaum.
Gigerenzer, G. (1998). We need statistical thinking, not statistical rituals. Behavioral & Brain Sciences, 21, 199–200.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.) (1996). Markov chain Monte Carlo in practice. Boca Raton, FL: Chapman & Hall/CRC.
Gill, J. (2002). Bayesian methods: A social and behavioral sciences approach. Boca Raton, FL: CRC Press.
Glover, S., & Dixon, P. (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11, 791–806.
Good, I. J. (1983). Good thinking: The foundations of probability and its applications. Minneapolis: University of Minnesota Press.
Good, I. J. (1985). Weight of evidence: A brief survey. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), Bayesian statistics 2: Proceedings of the Second Valencia International Meeting, September 6/10, 1983 (pp. 249–269). Amsterdam: North-Holland.
Goodman, S. N. (1993). p values, hypothesis tests, and likelihood: Implications for epidemiology of a neglected historical debate. American Journal of Epidemiology, 137, 485–496.
Grünwald, P. [D.] (2000). Model selection based on minimum description length. Journal of Mathematical Psychology, 44, 133–152.
Grünwald, P. D., Myung, I. J., & Pitt, M. A. (Eds.) (2005). Advances in minimum description length: Theory and applications. Cambridge, MA: MIT Press.
Hacking, I. (1965). Logic of statistical inference. Cambridge: Cambridge University Press.
Hagen, R. L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52, 15–24.
Haldane, J. B. S. (1945). On a method of estimating frequencies. Biometrika, 33, 222–225.
Hannan, E. J. (1980). The estimation of the order of an ARMA process. Annals of Statistics, 8, 1071–1081.
Helland, I. S. (1995). Simple counterexamples against the conditionality principle. American Statistician, 49, 351–356.
Hill, B. M. (1985). Some subjective Bayesian considerations in the selection of models. Econometric Reviews, 4, 191–246.
Howson, C., & Urbach, P. (2005). Scientific reasoning: The Bayesian approach (3rd ed.). Chicago: Open Court.
Hubbard, R., & Bayarri, M.-J. (2003). Confusion over measures of evidence (p’s) versus errors (α’s) in classical statistical testing. American Statistician, 57, 171–182.
Jaynes, E. T. (1968). Prior probabilities. IEEE Transactions on Systems Science & Cybernetics, 4, 227–241.
Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge: Cambridge University Press.
Jeffreys, H. (1961). Theory of probability. Oxford: Oxford University Press.
Jennison, C., & Turnbull, B. W. (1990). Statistical approaches to interim monitoring of medical trials: A review and commentary. Statistical Science, 5, 299–317.
Kadane, J. B., Schervish, M. J., & Seidenfeld, T. (1996). Reasoning to a foregone conclusion. Journal of the American Statistical Association, 91, 1228–1235.
Karabatsos, G. (2006). Bayesian nonparametric model selection and model testing. Journal of Mathematical Psychology, 50, 123–148.
Kass, R. E. (1993). Bayes factors in practice. Statistician, 42, 551–560.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 377–395.
Kass, R. E., & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90, 928–934.
Kass, R. E., & Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91, 1343–1370.
Killeen, P. R. (2005a). An alternative to null-hypothesis significance tests. Psychological Science, 16, 345–353.
Killeen, P. R. (2005b). Replicability, confidence, and priors. Psychological Science, 16, 1009–1012.
Killeen, P. R. (2006). Beyond statistical inference: A decision theory for science. Psychonomic Bulletin & Review, 13, 549–562.
Klugkist, I., Laudy, O., & Hoijtink, H. (2005). Inequality constrained analysis of variance: A Bayesian approach. Psychological Methods, 10, 477–493.
Lee, M. D. (2002). Generating additive clustering models with limited stochastic complexity. Journal of Classification, 19, 69–85.
Lee, M. D., & Pope, K. J. (2006). Model selection for the rate problem: A comparison of significance testing, Bayesian, and minimum description length statistical inference. Journal of Mathematical Psychology, 50, 193–202.
Lee, M. D., & Wagenmakers, E.-J. (2005). Bayesian statistical inference in psychology: Comment on Trafimow (2003). Psychological Review, 112, 662–668.
Lee, P. M. (1989). Bayesian statistics: An introduction. New York: Oxford University Press.
Lindley, D. V. (1957). A statistical paradox. Biometrika, 44, 187–192.
Lindley, D. V. (1972). Bayesian statistics: A review. Philadelphia: Society for Industrial & Applied Mathematics.
Lindley, D. V. (1977). The distinction between inference and decision. Synthese, 36, 51–58.
Lindley, D. V. (1982). Scoring rules and the inevitability of probability. International Statistical Review, 50, 1–26.
Lindley, D. V. (1993). The analysis of experimental data: The appreciation of tea and wine. Teaching Statistics, 15, 22–25.
Lindley, D. V. (2004). That wretched prior. Significance, 1, 85–87.
Lindley, D. V., & Phillips, L. D. (1976). Inference for a Bernoulli process (a Bayesian view). American Statistician, 30, 112–119.
Lindley, D. V., & Scott, W. F. (1984). New Cambridge elementary statistical tables. Cambridge: Cambridge University Press.
Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171.
Loftus, G. R. (2002). Analysis, interpretation, and visual presentation of experimental data. In H. Pashler (Ed. in Chief) & J. Wixted (Vol. Ed.), Stevens’ Handbook of experimental psychology: Vol. 4. Methodology in experimental psychology (3rd ed., pp. 339–390). New York: Wiley.
Ludbrook, J. (2003). Interim analyses of data as they accumulate in laboratory experimentation. BMC Medical Research Methodology, 3, 15.
McCarroll, D., Crays, N., & Dunlap, W. P. (1992). Sequential ANOVAs and Type I error rates. Educational & Psychological Measurement, 52, 387–393.
Myung, I. J. (2000). The importance of complexity in model selection. Journal of Mathematical Psychology, 44, 190–204.
Myung, I. J., Forster, M. R., & Browne, M. W. (Eds.) (2000). Model selection [Special issue]. Journal of Mathematical Psychology, 44(1).
Myung, I. J., Navarro, D. J., & Pitt, M. A. (2006). Model selection by normalized maximum likelihood. Journal of Mathematical Psychology, 50, 167–179.
Myung, I. J., & Pitt, M. A. (1997). Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review, 4, 79–95.
Nelson, N., Rosenthal, R., & Rosnow, R. L. (1986). Interpretation of significance levels and effect sizes by psychological researchers. American Psychologist, 41, 1299–1301.
Neyman, J. (1977). Frequentist probability and frequentist statistics. Synthese, 36, 97–131.
Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society: Series A, 231, 289–337.
Nickerson, R. S. (2000). Null hypothesis statistical testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.
O’Hagan, A. (1997). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society: Series B, 57, 99–138.
O’Hagan, A. (2004). Dicing with the unknown. Significance, 1, 132–133.
O’Hagan, A., & Forster, J. (2004). Kendall’s advanced theory of statistics: Vol. 2B. Bayesian inference (2nd ed.). London: Arnold.
Pauler, D. K. (1998). The Schwarz criterion and related methods for normal linear models. Biometrika, 85, 13–27.
Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D. R., Howard, S. V., et al. (1976). Design and analysis of randomized clinical trials requiring prolonged observation of each patient: I. Introduction and design. British Journal of Cancer, 34, 585–612.
Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.
Pocock, S. J. (1983). Clinical trials: A practical approach. New York: Wiley.
Pratt, J. W. (1961). [Review of Lehmann, E. L., Testing statistical hypotheses]. Journal of the American Statistical Association, 56, 163–167.
Pratt, J. W. (1962). On the foundations of statistical inference: Discussion. Journal of the American Statistical Association, 57, 314–315.
Raftery, A. E. (1993). Bayesian model selection in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 163–180). Newbury Park, CA: Sage.
Raftery, A. E. (1995). Bayesian model selection in social research. In P. V. Marsden (Ed.), Sociological methodology 1995 (pp. 111–196). Cambridge, MA: Blackwell.
Raftery, A. E. (1996). Hypothesis testing and model selection. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 163–187). Boca Raton, FL: Chapman & Hall/CRC.
Raftery, A. E. (1999). Bayes factors and BIC. Sociological Methods & Research, 27, 411–427.
Rissanen, J. (2001). Strong optimality of the normalized ML models as universal codes and information in data. IEEE Transactions on Information Theory, 47, 1712–1717.
Robert, C. P., & Casella, G. (1999). Monte Carlo statistical methods. New York: Springer.
Rosenthal, R., & Gaito, J. (1963). The interpretation of levels of significance by psychological researchers. Journal of Psychology, 55, 33–38.
Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12, 573–604.
Rouder, J. N., Lu, J., Speckman, P., Sun, D., & Jiang, Y. (2005). A hierarchical model for estimating response time distributions. Psychonomic Bulletin & Review, 12, 195–223.
Royall, R. M. (1997). Statistical evidence: A likelihood paradigm. London: Chapman & Hall.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley.
Schervish, M. J. (1996). P values: What they are and what they are not. American Statistician, 50, 203–206.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115–129.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Sellke, T., Bayarri, M.-J., & Berger, J. O. (2001). Calibration of p values for testing precise null hypotheses. American Statistician, 55, 62–71.
Shafer, G. (1982). Lindley’s paradox. Journal of the American Statistical Association, 77, 325–351.
Siegmund, D. (1985). Sequential analysis: Tests and confidence intervals. New York: Springer.
Smith, A. F. M., & Spiegelhalter, D. J. (1980). Bayes factors and choice criteria for linear models. Journal of the Royal Statistical Society: Series B, 42, 213–220.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions (with discussion). Journal of the Royal Statistical Society: Series B, 36, 111–147.
Strube, M. J. (2006). SNOOP: A program for demonstrating the consequences of premature and repeated null hypothesis testing. Behavior Research Methods, 38, 24–27.
Stuart, A., Ord, J. K., & Arnold, S. (1999). Kendall’s advanced theory of statistics: Vol. 2A. Classical inference and the linear model (6th ed.). London: Arnold.
Trafimow, D. (2003). Hypothesis testing and theory evaluation at the boundaries: Surprising insights from Bayes’s theorem. Psychological Review, 110, 526–535.
Vickers, D., Lee, M. D., Dry, M., & Hughes, P. (2003). The roles of the convex hull and the number of potential intersections in performance on visually presented traveling salesperson problems. Memory & Cognition, 31, 1094–1104.
Wagenmakers, E.-J. (2003). How many parameters does it take to fit an elephant? [Book review]. Journal of Mathematical Psychology, 47, 580–586.
Wagenmakers, E.-J., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11, 192–196.
Wagenmakers, E.-J., & Grünwald, P. (2006). A Bayesian perspective on hypothesis testing: A comment on Killeen (2005). Psychological Science, 17, 641–642.
Wagenmakers, E.-J., Grünwald, P., & Steyvers, M. (2006). Accumulative prediction error and the selection of time series models. Journal of Mathematical Psychology, 50, 149–166.
Wagenmakers, E.-J., Ratcliff, R., Gomez, P., & Iverson, G. J. (2004). Assessing model mimicry using the parametric bootstrap. Journal of Mathematical Psychology, 48, 28–50.
Wagenmakers, E.-J., & Waldorp, L. (Eds.) (2006). Model selection: Theoretical developments and applications [Special issue]. Journal of Mathematical Psychology, 50(2).
Wainer, H. (1999). One cheer for null hypothesis significance testing. Psychological Methods, 4, 212–213.
Wallace, C. S., & Dowe, D. L. (1999). Refinements of MDL and MML coding. Computer Journal, 42, 330–337.
Ware, J. H. (1989). Investigating therapies of potentially great benefit: ECMO. Statistical Science, 4, 298–340.
Wasserman, L. (2000). Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44, 92–107.
Wasserman, L. (2004). All of statistics: A concise course in statistical inference. New York: Springer.
Weakliem, D. L. (1999). A critique of the Bayesian information criterion for model selection. Sociological Methods & Research, 27, 359–397.
Wilkinson, L., & the Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Winship, C. (1999). Editor’s introduction to the special issue on the Bayesian information criterion. Sociological Methods & Research, 27, 355–358.
Xie, Y. (1999). The tension between generality and accuracy. Sociological Methods & Research, 27, 428–435.

Author information

Authors and Affiliations

Department of Psychology, Methodology Unit, University of Amsterdam, Roetersstraat 15, 1018 WB, Amsterdam, The Netherlands
Eric-Jan Wagenmakers

Corresponding author

Correspondence to Eric-Jan Wagenmakers.

Additional information

This research was supported by a Veni Grant from the Dutch Organization for Scientific Research (NWO). I thank Scott Brown, Peter Dixon, Simon Farrell, Raoul Grasman, Geoff Iverson, Michael Lee, Martijn Meeter, Jay Myung, Jeroen Raaijmakers, Jeff Rouder, and Rich Shiffrin for helpful comments on earlier drafts of this article. Mark Steyvers convinced me that this article would be seriously incomplete without a consideration of practical alternatives to the p value methodology.

About this article

Cite this article

Wagenmakers, EJ. A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review 14, 779–804 (2007). https://doi.org/10.3758/BF03194105

Received: 28 June 2006
Accepted: 13 February 2007
Issue date: October 2007
DOI: https://doi.org/10.3758/BF03194105
