Integrated nested Laplace approximations
Part of a series on |
Bayesian statistics |
---|
Posterior = Likelihood ×ばつ Prior ÷ Evidence |
Background |
Model building |
Posterior approximation |
Estimators |
Evidence approximation |
Model evaluation |
Integrated nested Laplace approximations (INLA) is a method for approximate Bayesian inference based on Laplace's approximation.[1] It is designed for a class of models called latent Gaussian models (LGMs), for which it can be a fast and accurate alternative for Markov chain Monte Carlo methods to compute posterior marginal distributions.[2] [3] [4] Due to its relative speed even with large data sets for certain problems and models, INLA has been a popular inference method in applied statistics, in particular spatial statistics, ecology, and epidemiology.[5] [6] [7] It is also possible to combine INLA with a finite element method solution of a stochastic partial differential equation to study e.g. spatial point processes and species distribution models.[8] [9] The INLA method is implemented in the R-INLA R package.[10]
Latent Gaussian models
[edit ]Let {\displaystyle {\boldsymbol {y}}=(y_{1},\dots ,y_{n})} denote the response variable (that is, the observations) which belongs to an exponential family, with the mean {\displaystyle \mu _{i}} (of {\displaystyle y_{i}}) being linked to a linear predictor {\displaystyle \eta _{i}} via an appropriate link function. The linear predictor can take the form of a (Bayesian) additive model. All latent effects (the linear predictor, the intercept, coefficients of possible covariates, and so on) are collectively denoted by the vector {\displaystyle {\boldsymbol {x}}}. The hyperparameters of the model are denoted by {\displaystyle {\boldsymbol {\theta }}}. As per Bayesian statistics, {\displaystyle {\boldsymbol {x}}} and {\displaystyle {\boldsymbol {\theta }}} are random variables with prior distributions.
The observations are assumed to be conditionally independent given {\displaystyle {\boldsymbol {x}}} and {\displaystyle {\boldsymbol {\theta }}}: {\displaystyle \pi ({\boldsymbol {y}}|{\boldsymbol {x}},{\boldsymbol {\theta }})=\prod _{i\in {\mathcal {I}}}\pi (y_{i}|\eta _{i},{\boldsymbol {\theta }}),} where {\displaystyle {\mathcal {I}}} is the set of indices for observed elements of {\displaystyle {\boldsymbol {y}}} (some elements may be unobserved, and for these INLA computes a posterior predictive distribution). Note that the linear predictor {\displaystyle {\boldsymbol {\eta }}} is part of {\displaystyle {\boldsymbol {x}}}.
For the model to be a latent Gaussian model, it is assumed that {\displaystyle {\boldsymbol {x}}|{\boldsymbol {\theta }}} is a Gaussian Markov Random Field (GMRF)[1] (that is, a multivariate Gaussian with additional conditional independence properties) with probability density {\displaystyle \pi ({\boldsymbol {x}}|{\boldsymbol {\theta }})\propto \left|{\boldsymbol {Q_{\theta }}}\right|^{1/2}\exp \left(-{\frac {1}{2}}{\boldsymbol {x}}^{T}{\boldsymbol {Q_{\theta }}}{\boldsymbol {x}}\right),} where {\displaystyle {\boldsymbol {Q_{\theta }}}} is a {\displaystyle {\boldsymbol {\theta }}}-dependent sparse precision matrix and {\displaystyle \left|{\boldsymbol {Q_{\theta }}}\right|} is its determinant. The precision matrix is sparse due to the GMRF assumption. The prior distribution {\displaystyle \pi ({\boldsymbol {\theta }})} for the hyperparameters need not be Gaussian. However, the number of hyperparameters, {\displaystyle m=\mathrm {dim} ({\boldsymbol {\theta }})}, is assumed to be small (say, less than 15).
Approximate Bayesian inference with INLA
[edit ]In Bayesian inference, one wants to solve for the posterior distribution of the latent variables {\displaystyle {\boldsymbol {x}}} and {\displaystyle {\boldsymbol {\theta }}}. Applying Bayes' theorem {\displaystyle \pi ({\boldsymbol {x}},{\boldsymbol {\theta }}|{\boldsymbol {y}})={\frac {\pi ({\boldsymbol {y}}|{\boldsymbol {x}},{\boldsymbol {\theta }})\pi ({\boldsymbol {x}}|{\boldsymbol {\theta }})\pi ({\boldsymbol {\theta }})}{\pi ({\boldsymbol {y}})}},} the joint posterior distribution of {\displaystyle {\boldsymbol {x}}} and {\displaystyle {\boldsymbol {\theta }}} is given by {\displaystyle {\begin{aligned}\pi ({\boldsymbol {x}},{\boldsymbol {\theta }}|{\boldsymbol {y}})&\propto \pi ({\boldsymbol {\theta }})\pi ({\boldsymbol {x}}|{\boldsymbol {\theta }})\prod _{i}\pi (y_{i}|\eta _{i},{\boldsymbol {\theta }})\\&\propto \pi ({\boldsymbol {\theta }})\left|{\boldsymbol {Q_{\theta }}}\right|^{1/2}\exp \left(-{\frac {1}{2}}{\boldsymbol {x}}^{T}{\boldsymbol {Q_{\theta }}}{\boldsymbol {x}}+\sum _{i}\log \left[\pi (y_{i}|\eta _{i},{\boldsymbol {\theta }})\right]\right).\end{aligned}}} Obtaining the exact posterior is generally a very difficult problem. In INLA, the main aim is to approximate the posterior marginals {\displaystyle {\begin{array}{rcl}\pi (x_{i}|{\boldsymbol {y}})&=&\int \pi (x_{i}|{\boldsymbol {\theta }},{\boldsymbol {y}})\pi ({\boldsymbol {\theta }}|{\boldsymbol {y}})d{\boldsymbol {\theta }}\\\pi (\theta _{j}|{\boldsymbol {y}})&=&\int \pi ({\boldsymbol {\theta }}|{\boldsymbol {y}})d{\boldsymbol {\theta }}_{-j},\end{array}}} where {\displaystyle {\boldsymbol {\theta }}_{-j}=\left(\theta _{1},\dots ,\theta _{j-1},\theta _{j+1},\dots ,\theta _{m}\right)}.
A key idea of INLA is to construct nested approximations given by {\displaystyle {\begin{array}{rcl}{\widetilde {\pi }}(x_{i}|{\boldsymbol {y}})&=&\int {\widetilde {\pi }}(x_{i}|{\boldsymbol {\theta }},{\boldsymbol {y}}){\widetilde {\pi }}({\boldsymbol {\theta }}|{\boldsymbol {y}})d{\boldsymbol {\theta }}\\{\widetilde {\pi }}(\theta _{j}|{\boldsymbol {y}})&=&\int {\widetilde {\pi }}({\boldsymbol {\theta }}|{\boldsymbol {y}})d{\boldsymbol {\theta }}_{-j},\end{array}}} where {\displaystyle {\widetilde {\pi }}(\cdot |\cdot )} is an approximated posterior density. The approximation to the marginal density {\displaystyle \pi (x_{i}|{\boldsymbol {y}})} is obtained in a nested fashion by first approximating {\displaystyle \pi ({\boldsymbol {\theta }}|{\boldsymbol {y}})} and {\displaystyle \pi (x_{i}|{\boldsymbol {\theta }},{\boldsymbol {y}})}, and then numerically integrating out {\displaystyle {\boldsymbol {\theta }}} as {\displaystyle {\begin{aligned}{\widetilde {\pi }}(x_{i}|{\boldsymbol {y}})=\sum _{k}{\widetilde {\pi }}\left(x_{i}|{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)\times {\widetilde {\pi }}({\boldsymbol {\theta }}_{k}|{\boldsymbol {y}})\times \Delta _{k},\end{aligned}}} where the summation is over the values of {\displaystyle {\boldsymbol {\theta }}}, with integration weights given by {\displaystyle \Delta _{k}}. The approximation of {\displaystyle \pi (\theta _{j}|{\boldsymbol {y}})} is computed by numerically integrating {\displaystyle {\boldsymbol {\theta }}_{-j}} out from {\displaystyle {\widetilde {\pi }}({\boldsymbol {\theta }}|{\boldsymbol {y}})}.
To get the approximate distribution {\displaystyle {\widetilde {\pi }}({\boldsymbol {\theta }}|{\boldsymbol {y}})}, one can use the relation {\displaystyle {\begin{aligned}{\pi }({\boldsymbol {\theta }}|{\boldsymbol {y}})={\frac {\pi \left({\boldsymbol {x}},{\boldsymbol {\theta }},{\boldsymbol {y}}\right)}{\pi \left({\boldsymbol {x}}|{\boldsymbol {\theta }},{\boldsymbol {y}}\right)\pi ({\boldsymbol {y}})}},\end{aligned}}} as the starting point. Then {\displaystyle {\widetilde {\pi }}({\boldsymbol {\theta }}|{\boldsymbol {y}})} is obtained at a specific value of the hyperparameters {\displaystyle {\boldsymbol {\theta }}={\boldsymbol {\theta }}_{k}} with Laplace's approximation [1] {\displaystyle {\begin{aligned}{\widetilde {\pi }}({\boldsymbol {\theta }}_{k}|{\boldsymbol {y}})&\propto \left.{\frac {\pi \left({\boldsymbol {x}},{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)}{{\widetilde {\pi }}_{G}\left({\boldsymbol {x}}|{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)}}\right\vert _{{\boldsymbol {x}}={\boldsymbol {x}}^{*}({\boldsymbol {\theta }}_{k})},\\&\propto \left.{\frac {\pi ({\boldsymbol {y}}|{\boldsymbol {x}},{\boldsymbol {\theta }}_{k})\pi ({\boldsymbol {x}}|{\boldsymbol {\theta }}_{k})\pi ({\boldsymbol {\theta }}_{k})}{{\widetilde {\pi }}_{G}\left({\boldsymbol {x}}|{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)}}\right\vert _{{\boldsymbol {x}}={\boldsymbol {x}}^{*}({\boldsymbol {\theta }}_{k})},\end{aligned}}} where {\displaystyle {\widetilde {\pi }}_{G}\left({\boldsymbol {x}}|{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)} is the Gaussian approximation to {\displaystyle {\pi }\left({\boldsymbol {x}}|{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)} whose mode at a given {\displaystyle {\boldsymbol {\theta }}_{k}} is {\displaystyle {\boldsymbol {x}}^{*}({\boldsymbol {\theta }}_{k})}. The mode can be found numerically for example with the Newton-Raphson method.
The trick in the Laplace approximation above is the fact that the Gaussian approximation is applied on the full conditional of {\displaystyle {\boldsymbol {x}}} in the denominator since it is usually close to a Gaussian due to the GMRF property of {\displaystyle {\boldsymbol {x}}}. Applying the approximation here improves the accuracy of the method, since the posterior {\displaystyle {\pi }({\boldsymbol {\theta }}|{\boldsymbol {y}})} itself need not be close to a Gaussian, and so the Gaussian approximation is not directly applied on {\displaystyle {\pi }({\boldsymbol {\theta }}|{\boldsymbol {y}})}. The second important property of a GMRF, the sparsity of the precision matrix {\displaystyle {\boldsymbol {Q}}_{{\boldsymbol {\theta }}_{k}}}, is required for efficient computation of {\displaystyle {\widetilde {\pi }}({\boldsymbol {\theta }}_{k}|{\boldsymbol {y}})} for each value {\displaystyle {{\boldsymbol {\theta }}_{k}}}.[1]
Obtaining the approximate distribution {\displaystyle {\widetilde {\pi }}\left(x_{i}|{\boldsymbol {\theta }}_{k},{\boldsymbol {y}}\right)} is more involved, and the INLA method provides three options for this: Gaussian approximation, Laplace approximation, or the simplified Laplace approximation.[1] For the numerical integration to obtain {\displaystyle {\widetilde {\pi }}(x_{i}|{\boldsymbol {y}})}, also three options are available: grid search, central composite design, or empirical Bayes.[1]
References
[edit ]- ^ a b c d e f Rue, Håvard; Martino, Sara; Chopin, Nicolas (2009). "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations". J. R. Statist. Soc. B. 71 (2): 319–392. doi:10.1111/j.1467-9868.2008.00700.x. hdl:2066/75507 . S2CID 1657669.
- ^ Taylor, Benjamin M.; Diggle, Peter J. (2014). "INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes". Journal of Statistical Computation and Simulation. 84 (10): 2266–2284. arXiv:1202.1738 . doi:10.1080/00949655.2013.788653. S2CID 88511801.
- ^ Teng, M.; Nathoo, F.; Johnson, T. D. (2017). "Bayesian computation for Log-Gaussian Cox processes: a comparative analysis of methods". Journal of Statistical Computation and Simulation. 87 (11): 2227–2252. doi:10.1080/00949655.2017.1326117. PMC 5708893 . PMID 29200537.
- ^ Wang, Xiaofeng; Yue, Yu Ryan; Faraway, Julian J. (2018). Bayesian Regression Modeling with INLA. Chapman and Hall/CRC. ISBN 9781498727259.
- ^ Blangiardo, Marta; Cameletti, Michela (2015). Spatial and Spatio-temporal Bayesian Models with R-INLA. John Wiley & Sons, Ltd. ISBN 9781118326558.
- ^ Opitz, T. (2017). "Latent Gaussian modeling and INLA: A review with focus on space-time applications". Journal de la Société Française de Statistique. 158: 62–85. arXiv:1708.02723 .
- ^ Moraga, Paula (2019). Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Chapman and Hall/CRC. ISBN 9780367357955.
- ^ Lindgren, Finn; Rue, Håvard; Lindström, Johan (2011). "An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach". J. R. Statist. Soc. B. 73 (4): 423–498. doi:10.1111/j.1467-9868.2011.00777.x. hdl:20.500.11820/1084d335-e5b4-4867-9245-ec9c4f6f4645 . S2CID 120949984.
- ^ Lezama-Ochoa, N.; Grazia Pennino, M.; Hall, M. A.; Lopez, J.; Murua, H. (2020). "Using a Bayesian modelling approach (INLA-SPDE) to predict the occurrence of the Spinetail Devil Ray (Mobular mobular)". Scientific Reports. 10 (1): 18822. Bibcode:2020NatSR..1018822L. doi:10.1038/s41598-020-73879-3. PMC 7606447 . PMID 33139744.
- ^ "R-INLA Project" . Retrieved 21 April 2022.
Further reading
[edit ]- Gomez-Rubio, Virgilio (2021). Bayesian inference with INLA. Chapman and Hall/CRC. ISBN 978-1-03-217453-2.