Change-Point Problems
Last update: 07 Jul 2025 13:49
First version: 23 December 2011, major expansion 16 March 2020
Suppose you have a time series which has some
(stochastic) property you're interested in, say its expected value or its
variance. You think this is usually constant, but that if it does change, it
does so abruptly. You would like to know whether it has changed, and perhaps
to localize the time when it did. You now have a change-point problem.
In an "offline" setting, where you've collected all you're data and want to
analyze it in retrospect, there are fairly straightforward approaches,
especially if you are willing to invoke parametric models. We imagine that
we've got data \( X_1, X_2, \ldots X_T \equiv X_{1:T} \), and that if there's no
change point then these follow some known probability distribution \( p(x_{1:T};
\theta_0) \). If on the other hand there is a break after time \( t \), then the
distribution is \( p(x_{1:t};\theta_0) p(x_{t+1:T};\theta_1) \). (If you really
want to condition the after-break stuff on the pre-break stuff, go ahead.) The
Neyman-Pearson lemma tells us
that the optimal test is the
likelihood ratio test, so we should locate the break-point at \( t \) if the log likelihood ratio
\[
\log{\frac{p(x_{1:t};\theta_0) p(x_{t+1:T};\theta_1)}{p(x_{1:T};\theta_0)}}
\]
is sufficiently large. If neither the pre-change nor post-change parameters
are exactly known, we can often still get good results by substituting in their
maximum likelihood estimators (or other consistent, efficient estimators), say
\[
\log{\frac{p(x_{1:t};\hat{\theta}_0) p(x_{t+1:T};\hat{\theta}_1)}{p(x_{1:T};\tilde{\theta}_0)}}
\]
If the basic parametric family has \( d \) continuous parameters, then the
version with the change point at time \( t \) has \( 2d \) continuous
parameters, and we'd typically get a \( \chi^2_{2d-d} = \chi^2_{d} \)
distribution for the distribution of the log likelihood ratio under the null
hypothesis of no change. (Notice that the estimate of \( \theta_0 \) should be
different under the null of no change and the alternative of a change at \( t
\), so I've given the different estimators different accent marks.)
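As a concrete illustration (mine, not part of the original notes), here is a minimal sketch for the simplest case: a mean shift in i.i.d. Gaussian data with known variance, so \( d = 1 \), testing a single fixed candidate break time \( t \). The function name and interface are my own inventions.

```python
import statistics

def log_lik_ratio(x, t, sigma=1.0):
    """Log likelihood ratio for a mean shift after index t, assuming
    i.i.d. Gaussian data with known standard deviation sigma.
    Maximum-likelihood plug-ins: segment means under the alternative,
    the grand mean under the null."""
    pre, post = x[:t], x[t:]
    m_all = statistics.fmean(x)       # tilde-theta_0: MLE under no change
    m_pre = statistics.fmean(pre)     # hat-theta_0: pre-break MLE
    m_post = statistics.fmean(post)   # hat-theta_1: post-break MLE
    rss_null = sum((xi - m_all) ** 2 for xi in x)
    rss_alt = (sum((xi - m_pre) ** 2 for xi in pre)
               + sum((xi - m_post) ** 2 for xi in post))
    # Gaussian log-likelihoods differ only through the residual sums of squares
    return (rss_null - rss_alt) / (2 * sigma ** 2)

# With d = 1 free parameter (the mean), twice the log likelihood ratio is
# approximately chi-squared with 1 degree of freedom under the null, so
# 2 * log_lik_ratio(x, t) > 3.84 rejects "no change" at the 5% level.
```

The same pattern works for any regular parametric family; only the two maximized log-likelihoods change.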
We usually also don't just care about one particular possible
change point \( t \), but rather want to consider a range as possibilities.
This will often be handled by maximizing the likelihood over different possible
values of \( t \). Since that's not a continuous parameter, the usual \(
\chi^2 \) theory doesn't apply.
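To make the scan over \( t \) concrete, here is a self-contained sketch (again mine, under the illustrative assumption of i.i.d. Gaussian data with known unit variance): maximize the log likelihood ratio over candidate break times, and, since the \( \chi^2 \) theory fails for the maximized statistic, calibrate it by Monte Carlo simulation under the null instead.

```python
import random
import statistics

def log_lik_ratio(x, t):
    """Gaussian (unit-variance) log likelihood ratio for a mean shift after t."""
    pre, post = x[:t], x[t:]
    m_all, m_pre, m_post = (statistics.fmean(x), statistics.fmean(pre),
                            statistics.fmean(post))
    rss_null = sum((xi - m_all) ** 2 for xi in x)
    rss_alt = (sum((xi - m_pre) ** 2 for xi in pre)
               + sum((xi - m_post) ** 2 for xi in post))
    return (rss_null - rss_alt) / 2

def scan(x, min_seg=5):
    """Maximize the log likelihood ratio over break times t, keeping at
    least min_seg observations in each segment; returns (max LR, argmax t)."""
    return max((log_lik_ratio(x, t), t)
               for t in range(min_seg, len(x) - min_seg + 1))

def null_critical_value(T, level=0.95, reps=200, seed=0):
    """Simulate the null distribution of the scan statistic (no change,
    standard Gaussian data) to get an approximate critical value."""
    rng = random.Random(seed)
    sims = sorted(scan([rng.gauss(0.0, 1.0) for _ in range(T)])[0]
                  for _ in range(reps))
    return sims[int(level * reps) - 1]
```

One would then declare a change when `scan(x)[0]` exceeds `null_critical_value(len(x))`, and take the maximizing \( t \) as the estimated break point. (This Monte Carlo calibration is one simple option; the literature below gives sharper asymptotics for the maximized statistic.)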
Recommended:
- Sylvain Arlot and Alain Celisse, "Segmentation of the mean of heteroscedastic data via cross-validation", Statistics and Computing 21 (2011): 613--632, arxiv:0902.3977 [MATLAB code]
- Emily B. Fox, Erik B. Sudderth, Michael I. Jordan, Alan S. Willsky,
"A sticky HDP-HMM with application to speaker diarization",
Annals of Applied Statistics 5 (2011): 1020--1056,
arxiv:0905.2592
- Daniil Ryabko and Boris Ryabko, "Testing Statistical Hypotheses About Ergodic Processes", arxiv:0804.0510 [Appears to be the same as their "Nonparametric Statistical Inference for Ergodic Processes", IEEE Transactions on Information Theory 56 (2010): 1430--1435]
- Wenguang Sun, T. Tony Cai, "Large-scale multiple testing under dependence", Journal of the Royal Statistical
Society B 71 (2008): 393--424
- Albert Vexler, "Martingale Type Statistics Applied to Change Point
Detection", Communications in Statistics - Theory and Methods
37 (2008): 1207--1224
Recommended, historical:
- G. A. Barnard, "Control Charts and Stochastic Processes",
Journal of the Royal Statistical Society B 21 (1959): 239--271 [JSTOR]
- P. J. Harrison and C. F. Stevens, "A Bayesian Approach to Short-term Forecasting", Operational Research Quarterly 22 (1971): 341--362 [A reasonable approach, based on Kalman smoothing, with a wild application unintended by the authors]
- E. S. Page, "Continuous Inspection Schemes", Biometrika
41 (1954): 100--115 [JSTOR. Origin of the famous "cumulative sum" ("CUSUM") procedure
for finding breaks.]
- A. Wald and J. Wolfowitz, "Optimum Character of the Sequential Probability Ratio Test", Annals of Mathematical Statistics 19 (1948): 326--339
To read:
- Alexander Aue and Lajos Horváth, "Structural breaks in time series", Journal of Time Series Analysis 34 (2013): 1--16
- Boris Brodsky and Boris Darkhovsky, "Sequential change-point detection for mixing random sequences under composite hypotheses", Statistical Inference for Stochastic Processes 11 (2008): 35--54
- S. Camargo, S. M. Duarte Queirós and C. Anteneodo, "Nonparametric segmentation of nonstationary time series", Physical Review E 84 (2011): 046702
- Cheng-Der Fuh
- Noah D. Gade, Jordan Rodu, "Change Point Detection with Conceptors", arxiv:2308.06213
- Samir Ben Hariz, Jonathan J. Wylie, Qiang Zhang, "Optimal rate of convergence for nonparametric change-point estimators for nonstationary sequences", Annals of Statistics 35 (2007): 1802--1826, arxiv:0710.4217
- Heping He and Thomas A. Severini, "Asymptotic properties of maximum likelihood estimators in models with multiple change points", Bernoulli 16 (2010): 759--779, arxiv:1102.5224
- C. T. Jose, B. Ismail, S. Jayasekhar, "Trend, Growth Rate, and Change Point Analysis: A Data Driven Approach", Communications in Statistics: Simulation and Computation 37 (2008): 498--506
- Azadeh Khaleghi, Daniil Ryabko, "Multiple Change-Point Estimation in Stationary Ergodic Time-Series", arxiv:1203.1515
- Shiqing Ling, "Testing for change points in time series models and limiting theorems for NED sequences", Annals of Statistics 35 (2007): 1213--1237, arxiv:0708.2369
- George V. Moustakides, "Sequential change detection
revisited", arxiv:0804.0741
= Annals of Statistics 36 (2008): 787--807
- Chun Yip Yau and Richard A. Davis, "Likelihood inference for discriminating between long-memory and change-point models", Journal of Time Series Analysis
33 (2012): 649--664