Notebooks

Statistical Inference for Markov and Hidden Markov Models

Last update: 21 Apr 2025 21:17
First version: 28 September 2007

I am concerned here with inferring the parameters and/or the structure of the model, not with the estimation of the hidden state in a hidden Markov model with known parameters; that problem falls under filtering.

Parameter inference (what machine learning types would call "learning") given a known, fixed structure. Determining the structure ("discovery"). Order estimation as a particular case of discovery, and of model selection. (Discovery may need its own notebook soon.)

Markov models naturally form exponential families; what classes of partially-observable Markov processes also do?

    Recommended (big picture):
  • Patrick Billingsley, Statistical Inference for Markov Chains [Foundational.]
  • Olivier Cappé Eric Moulines and Tobias Ryden, Inference in Hidden Markov Models [This is a superb book, treating all the main statistical problems connected with HMMs with both clarity and rigor. There is a chapter/appendix which reminds the reader who has forgotten about Markov chains on general state spaces, but remembers measure-theoretic probability very well, about their properties. (The idea that the reader of a book on HMMs may not know what the Viterbi algorithm is, but will of course recall the Hahn-Jordan decomposition, strikes me as very much a product of the French school of probability theory --- from which I have learned much! But it's not so far gone as to make the whole thing an exercise in the "general theory of processes".)]
  • Andrew Fraser, Hidden Markov Models and Dynamical Systems [Review: The Statistics of Moving Shadows]
  • Peter Guttorp, Stochastic Modelling of Scientific Data
  • Kevin McGoff, Sayan Mukherjee, and Natesh Pillai, "Statistical inference for dynamical systems: A review", Statistics Surveys 9 (2015): 209--252, arxiv:1204.6265
    Recommended (close-ups):
  • Animashree Anandkumar, Daniel Hsu, Sham M. Kakade, "A Method of Moments for Mixture Models and Hidden Markov Models", arxiv:1203.0683
  • Peter J. Bickel and Ya'acov Ritov, "Inference in hidden Markov models I: Local asymptotic normality in the stationary case", Bernoulli 2 (1996): 199--228 [Not sure if Part II ever appeared...]
  • David Blackwell and Lambert Koopmans, "On the Identifiability Problem for Functions of Finite Markov Chains", The Annals of Mathematical Statistics 28 (1957): 1011--1015 [An old, but very clear, paper on the problems presented by what we now call hidden Markov models]
  • Peter Bühlmann and Abraham J. Wyner, "Variable Length Markov Chains", The Annals of Statistics 27 (1999): 480--513 [Preprint available as Berkeley statistics department technical report 479]
  • Olivier Cappé "Online EM Algorithm for Hidden Markov Models", Journal of Computational and Graphical Statistics 20 (2011): 728--749, arxiv:0908.2359
  • George Cybenko and Valentino Crespi, "Learning Hidden Markov Models Using Nonnegative Matrix Factorization", IEEE Transactions on Information Theory 57 (2011): 3963--3970, arxiv:0809.4086 [Though it contains an error, at least in the preprint version, about the capacities of our CSSR algorithm --- we can get model structures right with much less data than they think, though we presented examples using more data than was strictly needed.]
  • Randal Douc, Eric Moulines, Jimmy Olsson, and Ramon van Handel, "Consistency of the maximum likelihood estimator for general hidden Markov models", Annals of Statistics 39 (2011): 474--513
  • P. Dupont, F. Denis and Y. Esposito, "Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms", Pattern Recognition 38 (2005): 1349--1371
  • Julian Feldman and Joe F. Hanna", "The Structure of Responses to a Sequence of Binary Events", Journal of Mathematical Psychology 3 (1966): 371--387
  • J. D. Foulkes, "A Class of Machines which Determine the Statistical Structure of a Sequence of Characters", pp. 66-73 of vol. 4 of the Western Electronics [Wescon] Convention Record, 1959 (Institute of Radio Engineers) [PDF]
  • Emily B. Fox, Erik B. Sudderth, Michael I. Jordan, Alan S. Willsky, "A sticky HDP-HMM with application to speaker diarization", Annals of Applied Statistics 5 (2011): 1020--1056, arxiv:0905.2592
  • Sandro Gallo and Florencia Leonardi, "Nonparametric statistical inference for the context tree of a stationary ergodic process", Electronic Journal of Statistics 9 (2015): 2076--2098, arxiv:1411.7650
  • Subhashis Ghosal and Yongqiang Tang, "Bayesian Consistency for Markov Processes", Sankhya 68 (2006): 227--239
  • Subhashis Ghosal and Aad van der Vaart, "Convergence Rates of Posterior Distributions for Non-IID Observations", Annals of Statistics 35 (2007): 192--223, arxiv:0708.0491
  • Christopher C. Heyde, Quasi-Likelihood and Its Applications: A General Approach to Optimal Parameter Estimation
  • J. D. Kalbfleisch and J. F. Lawless, "Least-Squares Estimation of Transition Probabilities from Aggregate Data", The Canadian Journal of Statistics / La Revue Canadienne de Statistique 12 (1984): 169--182 [JSTOR]
  • Mladen Kolar, Le Song, Amr Ahmed, Eric P. Xing, "Estimating time-varying networks", Annals of Applied Statistics 4 (2010): 94--123, arxiv:0812.5087
  • Kevin McGoff, Andrew B. Nobel, "Empirical risk minimization and complexity of dynamical models", Annals of Statistics 48 (2020): 2031--2054, arxiv:1611.06173
  • Gusztav Morvai and Benjamin Weiss
  • Ioannis Papageorgiou, Ioannis Kontoyiannis, "Context-tree weighting for real-valued time series: Bayesian inference with hierarchical mixture models", arxiv:2106.03023
  • F. Papangelou, "Large Deviations and the Bayesian Estimation of Higher-Order Markov Transition Functions ", Journal of Applied Probability 33 (1996): 18--27 [JSTOR]
  • Yuval Peres and Paul Shields, "Two new Markov order estimators", math.ST/0506080
  • David Pfau, Nicholas Bartlett and Frank Wood, "Probabilistic Deterministic Infinite Automata", NIPS 23 (2010) [PDF. Really, mixtures over finite automata with arbitrarily many states.]
  • George G. Roussas, Contiguity of Probability Measures: Some Applications in Statistics [Asymptotic theory of approximation, estimation and testing, for discrete-time Markov processes on fairly general state-spaces. Mini-review]
  • Christopher C. Strelioff, James P. Crutchfield, Alfred W. Hubler, "Inferring Markov Chains: Bayesian Estimation, Model Comparison, Entropy Rate, and Out-of-class Modeling", math.ST/0703715
  • Daniel R. Upper, Theory and Algorithms for Hidden Markov Models and Generalized Hidden Markov Models [Ph.D. thesis, math dept., Berkeley, 1997; online]
    Not quite recommended:
  • E. Racca, F. Laio, D. Poggi and L. Ridolfi, "Test to determine the Markov order of a time series", Physical Review E 75 (2007): 011126 [The test is to linearly regress x(t+1) on x(t), x(t-1), etc., out to some finite order, and see how far back you have to go before the regression coefficients are insignificantly different from zero. This is not crazy as a first cut idea, but it's not generally valid, and in fact fails for the logistic map.]
    Definitely not recommended:
  • S. S. Melnyk, O. V. Usatenko, V. A. Yampol'skii and V. A. Golick, "Competition between Two Kinds of Correlations in Literary Texts", physics/0402042 [This has surely got to be one of the ugliest ways of parameterizing a Markov chain I have seen; it's a miracle they don't get probabilities greater than 1, if indeed they don't.]
    Modesty forbids me to recommend:
  • CRS, "Dynamics of Bayesian Updating with Dependent Data and Misspecified Models", arxiv:0901.1342
  • CRS and Kristina Lisa Klinkner, "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences", cs.LG/0406011 [CSSR]
  • The relevant parts of my course on "Data over Space and Time", which were lectures 19 (for Markov models) and 23 (for hidden-Markov models) in the 2020 iteration.
    To read:
  • Roberto C. Alamino, Nestor Caticha, "Online Learning in Discrete Hidden Markov Models", arxiv:0708.2377
  • Armen Allahverdyan, Aram Galstyan, "On Maximum a Posteriori Estimation of Hidden Markov Processes", UAI 2009, arxiv:0906.1980
  • Enrique E. Alvarez, "Estimation in Stationary Markov Renewal Processes, with Application to Earthquake Forecasting in Turkey", Methodology and Computing in Applied Probability 7 (2005): 119--130
  • Animashree Anandkumar, Kamalika Chaudhuri, Daniel Hsu, Sham M. Kakade, Le Song, Tong Zhang, "Spectral Methods for Learning Multivariate Latent Tree Structure", arxiv:1107.1283
  • Sofia Andersson and Tobias Rydén, "Subspace estimation and prediction methods for hidden Markov models", Annals of Statistics 37 (2009): 4131--4152
  • Ana Arribas-Gil, Elisabeth Gassiat and Catherine Matias, "Parameter estimation in pair hidden Markov models", math.ST/0509280
  • Yves F. Atchade, "Estimation of Network structures from partially observed Markov random fields", arxiv:1108.2835
  • A. R. Baigorri, C. R. Goncalves, P. A. A. Resende, "Markov Chain Order Estimation and Relative Entropy", arxiv:0910.0264
  • Alexandre Belloni, Roberto I. Oliveira, "Approximate group context tree: applications to dynamic programming and dynamic choice models", arxiv:1107.0312
  • Patrice Bertail and Stéphan Clémençon, "Edgeworth expansions of suitably normalized sample mean statistics for atomic Markov chains", Probability Theory and Related Fields 130 (2004): 388--414 [I need to learn more about Edgeworth expansions anyway]
  • Byron Boots, "Spectral Approaches to Learning Predictive Representations" [Thesis proposal, CMU, 2011]
  • Byron Boots, Sajid M. Siddiqi, Geoffrey J. Gordon, "Closing the Learning-Planning Loop with Predictive State Representations", arxiv:0912.2385
  • Jose Borges and Mark Levene, "Evaluating Variable Length Markov Chain Models for Analysis of User Web Navigation Sessions", cs.AI/0606115
  • J. Borwanker, G. Kallianpur and B. L. S. Prakasa Rao, "The Bernstein-von Mises Theorem for Markov Processes", The Annals of Mathematical Statistics 42 (1971): 1241--1253 [JSTOR]
  • Carles Bretó, Daihai He, Edward L. Ionides, Aaron A. King, "Time series analysis via mechanistic models", Annals of Applied Statistics 3 (2009): 319--348, arxiv:0802.0021
  • Prafulla Chandra, Andrew Thangaraj, Nived Rajaraman, "How good is Good-Turing for Markov samples?", arxiv:2102.01938
  • Zhiyi Chi, "Effects of statistical dependence on multiple testing under a hidden Markov model", Annals of Statistics 39 (2011): 439--473
  • Pavel Chigansky, "Maximum Likelihood Estimator for Hidden Markov Models in continuous time", Statistical Inference for Stochastic Processes 12 (2009): 139--163, arxiv:0707.0271
  • Antonio Anastasio Bruto da Costa, Pallab Dasgupta, "Learning Temporal Causal Sequence Relationships from Real-Time Time-Series", arxiv:1905.12262
  • Christos Dimitrakakis, "Contex models on sequences of covers", arxiv:1005.2263
  • C. C. Y. Dorea and L. C. Zhao, "Nonparametric Density Estimation in Hidden Markov Models", Statistical Inference for Stochastic Processes 5 (2002): 55--64
  • Michael DelRose, Christian Wagner, Philip Frederick, "Evidence Feed Forward Hidden Markov Model: A New Type of Hidden Markov Model", arxiv:1102.0899 [Looks like a "to be shot after a fair trial" paper from the abstract, but then, it deserves a fair trial]
  • Randal Douc, "Non singularity of the asymptotic Fisher information matrix in hidden Markov models", math.ST/0511631
  • Randal Douc and Eric Moulines, "Asymptotic properties of the maximum likelihood estimation in misspecified Hidden Markov models", arxiv:1110.0356
  • Thierry Dumont, "Context Tree Estimation in Variable Length Hidden Markov Models", arxiv:1109.0392
  • Farzad Eskandari and Mohammad R. Meshkani, "Empirical Bayes analysis of log-linear models for a generalized finite stationary Markov chain", Metrika 59 (2004): 173--191 [abstract]
  • Florence Forbes and Nathalie Peyrard, "Hidden Markov Random Field Model Selection Criteria Based on Mean Field-Like Approximations", IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2003): 1089--1101 [PostScript preprint]
  • Scott D. Foster and Mark V. Bravington, "Graphical Diagnostics for Markov Models for Categorical Data", Journal of Computational and Graphical Statistics forthcoming (2011)
  • Halina Frydman and Peter Lakner, "Maximum likelihood estimation of hidden Markov processes", Annals of Applied Probability 13 (2003): 1296--1312
  • Cheng-Der Fuh, "Asymptotic operating characteristics of an optimal change point detection in hidden Markov models", Annals of Statistics 32 (2004): 2305--2339 = math.ST/0503682
  • Cheng-Der Fuh and Inchi Hu, "Estimation in hidden Markov models via efficient importance sampling", Bernoulli 13 (2007): 492--513, arxiv:0708.4152
  • Antonio Galves, Charlotte Galves, Nancy L. Garcia, Florencia Leonardi, "Context tree selection and linguistic rhythm retrieval from written texts", arxiv:0902.3619
  • Antonio Galves, Aurélien Garivier, Elisabeth Gassiat, "Joint estimation of intersecting context tree models", arxiv:1102.0673
  • Antonio Galves, Florencia Leonardi, "Exponential inequalities for empirical unbounded context trees", arxiv:0710.5900
  • Nancy L. Garcia, Lucas Moreira, "Stochastically Perturbed Chains of Variable Memory", arxiv:1305.5747
  • Steven T. Garren and Richard L. Smith, "Estimating the second largest eigenvalue of a Markov transition matrix", Bernoulli 6 (2000): 215--242
  • Hugo Harari-Kermadec, "Regenerative block empirical likelihood for Markov chains", arxiv:1102.3107
  • Daniel Hsu, Sham M. Kakade, Tong Zhang, "A Spectral Algorithm for Learning Hidden Markov Models", arxiv:0811.4413
  • Masashi Inoue and Naonori Ueda, "Exploitation of Unlabeled Sequences in Hidden Markov Models", IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2003): 1570--1581
  • H. Ito, S.-I. Amari and K. Kobayashi, "Identifiability of Hidden Markov Information Sources and Their Minimum Degrees of Freedom", IEEE Transactions on Information Theory 38 (1992): 324--333
  • Dezhe Z. Jin, Alexay A. Kozhevnikov, "A compact statistical model of the song syntax in Bengalese finch", arxiv:1011.2998
  • Hans R. Künsch, "State Space and Hidden Markov Models", pp. 109--173 in Ole E. Barndorff-Nielsen, David R. Cox and Claudia Klüppelberg (eds.), Complex Stochastic Systems
  • J. Lember, A. Koloydenko, "Adjusted Viterbi training for hidden Markov models", arxiv:0709.2317
  • Florencia Leonardi, "Some upper bounds for the rate of convergence of penalized likelihood context tree estimators", arxiv:0701810
  • E. Locherbach, "Likelihood Ratio Processes for Markovian Particle Systems with Killing and Jumps", Statistical Inference for Stochastic Processes 5 (2002): 153--177
  • Claudio Macci, "Large Deviations for Empirical Estimators of the Stationary Distribution of a Semi-Markov Process with Finite State Space", Communications in Statistics: Theory and Methods 37 (2008): 3077--3089
  • A. Martin, Gadiel Seroussi, and Marcelo J. Weinberger, "Type Classes of Context Trees", IEEE Transactions on Information Theory 58 (2012): 4077--4093
  • Kevin McGoff, Sayan Mukherjee, Andrew Nobel, "Gibbs posterior convergence and the thermodynamic formalism", arxiv:1901.08641
  • Kevin McGoff, Sayan Mukherjee, Andrew Nobel, Natesh Pillai, "Consistency of maximum likelihood estimation for some dynamical systems", Annals of Statistics 43 (2015): 1--29, arxiv:1306.5603
  • Kevin McGoff, Andrew B. Nobel, "Variational analysis of inference from dynamical systems", arxiv:1601.05033
  • Philipp Metzner, Frank Noe and Christof Schutte, "Estimating the sampling error: Distribution of transition matrices and functions of transition matrices for given trajectory data", Physical Review E 80 (2009): 021106 [I presume they have a good reason for not just using the delta method, and/or bootstrapping, but I'll have to read it to see what that is]
  • Elchanan Mossel, Sébastien Roch, "Learning nonsingular phylogenies and hidden Markov models", Annals of Applied Probability 16 (2006): 583--614, arxiv:cs.LG/0502076
  • Ursula U. Müller, Anton Schick, Wolfgang Wefelmeyer, "Blockwise Empirical Likelihood and Efficiency for Markov Chains", Journal of Time Series Analysis forthcoming (2025+)
  • Michael H. Neumann, Efstathios Paparoditis, "Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations", Bernoulli 14 (2008): 14--46, arxiv:0803.0835
  • Alexander O'Neill, Marcus Hutter, Wen Shao, Peter Sunehag, "Adaptive Context Tree Weighting", arxiv:1201.2056
  • Roberto Imbuzeiro Oliveira, "Stochastic processes with random contexts: a characterization, and adaptive estimators for the transition probabilities", arxiv:1309.2819
  • Enza Orlandi, Eva Loecherbach, "On the neighborhood radius estimation in Variable-neighborhood Markov Random Fields", arxiv:1002.4850
  • Adam Paszkiewicz, "When transition count for a Markov chains is a complete sufficient statistic", Statistics and Probability Letters 76 (2006): 757--763 [When the initial and final states are fixed, for example]
  • Spiridon Penev, Hanxiang Peng, Atnon Schick and Wolfgang Wefelmeyer, "Efficient estimators for functionals of Markov chains with parametric marginals", Statistics and Probability Letters 66 (2004): 335--345
  • Ricardo Ríos, Luis-Angel Rodríguez, "Penalized estimate of the number of states in Gaussian linear AR with Markov regime", Electronic Journal of Statistics 2 (2008): 1111--1128, arxiv:0807.2726
  • Amr Sadek and Nikolaos Limnios, "Nonparametric estimation of reliability and survival function for continuous-time finite Markov processes", Journal of Statistical Planning and Inference 133 (2005): 1--21
  • Manuel S. Santos, "Consistency properties of a simulation-based estimator for dynamic processes", Annals of Applied Probability 20 (2010): 196--213
  • Anton Schick and Wolfgang Wefelmeyer, "Estimating Joint Distributions of Markov Chains", Statistical Inference for Stochastic Processes 5 (2002): 1--22
  • Peter Sunehag, Marcus Hutter, "Consistency of Feature Markov Processes", arxiv:1007.2075
  • V. B. Tadic, "Analyticity, Convergence, and Convergence Rate of Recursive Maximum-Likelihood Estimation in Hidden Markov Models", IEEE Transactions on Information Theory 56 (2010): 6406--6432, arxiv:0904.4264
  • Zsolt Talata and Tyrone E. Duncan, "BIC Context Tree Estimation for Stationary Ergodic Processes", IEEE Transactions on Information Theory (2011): 3877--3886
  • Iuliana Teodorescu, "Maximum Likelihood Estimation for Markov Chains", arxiv:0905.4131
  • M. J. van der Heyden et al., "Testing the Order of Discrete Markov Chains Using Surrogate Data", Physica D 117 (1998): 299--313
  • Ramon van Hanel, "On the minimal penalty for Markov order estimation", Probability Theory and Related Fields 150 (2011): 709--738, arxiv:0908.3666
  • Joel Veness, Kee Siong Ng, Marcus Hutter, Michael Bowling, "Context Tree Switching", arxiv:1111.3182
  • Elodie Vernet, "Posterior consistency for nonparametric Hidden Markov Models with finite state space", arxiv:1311.3092
  • Martin J. Wainwright, "Inconsistent parameter estimation in Markov random fields: Benefits in the computation-limited setting", cs.LG/0602092
  • Christian H. Weiss, "Rule generation for categorical time series with Markov assumptions", Statistics and Computing 21 (2011): 1--16 [VLMMs]
  • James Zhao, "Hidden Markov Models with Multiple Observation Processes", arxiv:1010.1042
  • L. C. Zhao, C. C. Y. Dorea and C. R. Gonçalves, "On Determination of the Order of a Markov Chain", Statistical Inference for Stochastic Processes 4 (2001): 273--282
  • Ming-Jie Zhao and Herbert Jaeger, "Making the Error-Controlling Algorithm of Observable Operator Models Constructive", Neural Computation 21 (2009): 3460--3486


permanent link for this note RSS feed for this note

Notebooks :

AltStyle によって変換されたページ (->オリジナル) /