1 Department of Electrical Engineering and Computer Science, University of California, Berkeley
2 Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology
3 Department of Statistics, Stanford University
4 Stanford Graduate School of Business
Abstract
We introduce a framework for calibrating machine learning models to satisfy finite-sample statistical guarantees. Our calibration algorithms work with any model and (unknown) data-generating distribution and do not require retraining. The algorithms address, among other examples, false discovery rate control in multilabel classification, intersection-over-union control in instance segmentation, and simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe risk control as a multiple hypothesis testing problem, enabling mathematical arguments different from those used in prior approaches. We demonstrate our algorithms with detailed worked examples in computer vision and tabular medical data. The computer vision experiments show the utility of our approach in calibrating state-of-the-art, widely deployed predictive architectures, such as object detection systems.
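As a concrete illustration of the calibration recipe described above, here is a minimal sketch, assuming losses bounded in [0, 1], a Hoeffding-style p-value, and a simple Bonferroni correction over the grid of candidate parameters. The function names and the toy usage are hypothetical, not taken from the paper, which develops tighter p-values and more powerful multiple testing procedures.

```python
# Minimal Learn-then-Test-style calibration sketch (illustrative, not the
# paper's exact procedure). Assumes i.i.d. calibration losses in [0, 1].
import numpy as np


def hoeffding_pvalue(losses, alpha):
    """P-value for H0: E[loss] > alpha, via Hoeffding's inequality."""
    n = len(losses)
    r_hat = losses.mean()
    return np.exp(-2.0 * n * max(0.0, alpha - r_hat) ** 2)


def learn_then_test(loss_fn, cal_data, lambdas, alpha, delta):
    """Return the candidate parameters certified to keep the risk below
    alpha, with probability at least 1 - delta (Bonferroni over the grid)."""
    pvals = []
    for lam in lambdas:
        losses = np.array([loss_fn(x, y, lam) for x, y in cal_data])
        pvals.append(hoeffding_pvalue(losses, alpha))
    threshold = delta / len(lambdas)  # Bonferroni correction
    return [lam for lam, p in zip(lambdas, pvals) if p <= threshold]


# Toy usage (hypothetical): calibrate a score threshold lam so that the
# prediction set {k : score_k >= lam} misses the true label at most
# alpha = 10% of the time.
rng = np.random.default_rng(0)
cal_data = [(rng.uniform(size=10), rng.integers(10)) for _ in range(2000)]
miss_loss = lambda x, y, lam: float(x[y] < lam)  # 1 if true label excluded
valid = learn_then_test(miss_loss, cal_data, np.linspace(0, 1, 50),
                        alpha=0.1, delta=0.1)
```

Any parameter in the returned set carries the stated finite-sample guarantee; more powerful procedures, such as fixed sequence testing when the risk is roughly monotone in the parameter, can certify larger sets.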
Funding Statement
This work was supported in part by the Mathematical Data Science program of the Office of Naval Research under grant number N00014-21-1-2840.
Acknowledgments
The authors would like to thank the anonymous referees, an Associate Editor, and the Editor for their constructive comments that improved the quality of this paper. Lihua Lei is grateful for the support of National Science Foundation Grant DMS-2338464.
Citation
Anastasios N. Angelopoulos, Stephen Bates, Emmanuel J. Candès, Michael I. Jordan, Lihua Lei. "Learn then test: Calibrating predictive algorithms to achieve risk control." Ann. Appl. Stat. 19(2), 1641-1662, June 2025. https://doi.org/10.1214/24-AOAS1998