Scoring algorithm
In statistics, the scoring algorithm, also known as Fisher's scoring,[1] is a form of Newton's method used to solve maximum likelihood equations numerically. It is named after Ronald Fisher.
Sketch of derivation
Let $Y_1, \ldots, Y_n$ be independent and identically distributed random variables with twice-differentiable p.d.f. $f(y; \theta)$, and suppose we wish to calculate the maximum likelihood estimator (MLE) $\theta^*$ of $\theta$. First, suppose we have a starting point $\theta_0$ for our algorithm, and consider a Taylor expansion of the score function, $V(\theta)$, about $\theta_0$:
$$V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0),$$
where
$$\mathcal{J}(\theta_0) = -\sum_{i=1}^{n} \left. \nabla \nabla^{\top} \right|_{\theta=\theta_0} \log f(Y_i; \theta)$$
is the observed information matrix at $\theta_0$. Now, setting $\theta = \theta^*$, using that $V(\theta^*) = 0$, and rearranging gives us:
$$\theta^* \approx \theta_0 + \mathcal{J}^{-1}(\theta_0) V(\theta_0).$$
We therefore use the algorithm
$$\theta_{m+1} = \theta_m + \mathcal{J}^{-1}(\theta_m) V(\theta_m),$$
and under certain regularity conditions it can be shown that $\theta_m \rightarrow \theta^*$.
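As a minimal numerical sketch of the iteration above (an illustration, not part of the original derivation), consider the rate parameter $\lambda$ of a Poisson sample, for which the score is $V(\lambda) = \sum_i Y_i / \lambda - n$ and the observed information is $\mathcal{J}(\lambda) = \sum_i Y_i / \lambda^2$. The function name and data below are hypothetical:

```python
import numpy as np

def newton_scoring_poisson(y, lam0, tol=1e-10, max_iter=100):
    """Iterate lam <- lam + J^{-1}(lam) V(lam) for a Poisson rate.

    Illustrative sketch: the score and observed information are the
    closed-form Poisson expressions, so J^{-1} V is a scalar division.
    """
    n, s = len(y), float(np.sum(y))
    lam = lam0
    for _ in range(max_iter):
        v = s / lam - n        # score V(lam)
        j = s / lam ** 2       # observed information J(lam)
        step = v / j           # scalar analogue of J^{-1} V
        lam += step
        if abs(step) < tol:
            break
    return lam

y = np.array([2, 3, 1, 4, 2, 3])
print(newton_scoring_poisson(y, lam0=1.0))  # converges to the sample mean, 2.5
```

Here the fixed point of the update is exactly the MLE $\bar{Y}$, and the iteration converges quadratically from any positive start.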
Fisher scoring
In practice, $\mathcal{J}(\theta)$ is usually replaced by $\mathcal{I}(\theta) = \mathrm{E}[\mathcal{J}(\theta)]$, the Fisher information, thus giving us the Fisher scoring algorithm:
$$\theta_{m+1} = \theta_m + \mathcal{I}^{-1}(\theta_m) V(\theta_m).$$
Under some regularity conditions, if $\theta_m$ is a consistent estimator, then $\theta_{m+1}$ (the correction after a single step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum-likelihood estimate.[2]
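The substitution of expected for observed information matters most when the observed information is badly behaved. A standard illustrative case (a hypothetical sketch, not from the article) is the location parameter of a Cauchy sample, where the expected information is the constant $\mathcal{I}(\theta) = n/2$ while the observed information can even be negative far from the optimum:

```python
import numpy as np

def fisher_scoring_cauchy(y, theta0, tol=1e-10, max_iter=500):
    """Fisher scoring theta <- theta + I^{-1} V(theta) for a Cauchy
    location parameter; for this model I(theta) = n/2 is constant.

    Illustrative sketch; convergence is local, so a robust start such
    as the sample median is used below.
    """
    n = len(y)
    theta = theta0
    for _ in range(max_iter):
        r = y - theta
        v = np.sum(2.0 * r / (1.0 + r ** 2))  # score V(theta)
        step = v / (n / 2.0)                  # I^{-1} V with I = n/2
        theta += step
        if abs(step) < tol:
            break
    return theta

y = np.array([-3.0, -0.5, 0.1, 0.8, 4.0])
theta_hat = fisher_scoring_cauchy(y, theta0=np.median(y))
print(theta_hat)  # a root of the score equation, near the sample median
```

Because $\mathcal{I}$ is constant here, each step is just a fixed multiple of the score, which sidesteps inverting a possibly indefinite observed information matrix at each iterate.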
References
1. Longford, Nicholas T. (1987). "A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects". Biometrika. 74 (4): 817–827. doi:10.1093/biomet/74.4.817.
2. Li, Bing; Babu, G. Jogesh (2019). "Bayesian Inference". Springer Texts in Statistics. New York, NY: Springer. Theorem 9.4. doi:10.1007/978-1-4939-9761-9_6. ISBN 978-1-4939-9759-6.
Further reading
- Jennrich, R. I.; Sampson, P. F. (1976). "Newton–Raphson and Related Algorithms for Maximum Likelihood Variance Component Estimation". Technometrics. 18 (1): 11–17. doi:10.1080/00401706.1976.10489395. JSTOR 1267911.