Jump to content
Wikipedia The Free Encyclopedia

Davidon–Fletcher–Powell formula

From Wikipedia, the free encyclopedia

The Davidon–Fletcher–Powell formula (or DFP; named after William C. Davidon, Roger Fletcher, and Michael J. D. Powell) finds the solution to the secant equation that is closest to the current estimate and satisfies the curvature condition. It was the first quasi-Newton method to generalize the secant method to a multidimensional problem. This update maintains the symmetry and positive definiteness of the Hessian matrix.

Given a smooth function f ( x ) {\displaystyle f(x)} {\displaystyle f(x)}, its gradient ( f {\displaystyle \nabla f} {\displaystyle \nabla f}), and positive-definite Hessian matrix B {\displaystyle B} {\displaystyle B}, the Taylor series is

f ( x k + s k ) = f ( x k ) + f ( x k ) T s k + 1 2 s k T B s k + , {\displaystyle f(x_{k}+s_{k})=f(x_{k})+\nabla f(x_{k})^{T}s_{k}+{\frac {1}{2}}s_{k}^{T}{B}s_{k}+\dots ,} {\displaystyle f(x_{k}+s_{k})=f(x_{k})+\nabla f(x_{k})^{T}s_{k}+{\frac {1}{2}}s_{k}^{T}{B}s_{k}+\dots ,}

and the Taylor series of the gradient itself (secant equation)

f ( x k + s k ) = f ( x k ) + B s k + {\displaystyle \nabla f(x_{k}+s_{k})=\nabla f(x_{k})+Bs_{k}+\dots } {\displaystyle \nabla f(x_{k}+s_{k})=\nabla f(x_{k})+Bs_{k}+\dots }

is used to update B {\displaystyle B} {\displaystyle B}.

The DFP formula finds a solution that is symmetric, positive-definite and closest to the current approximate value of B k {\displaystyle B_{k}} {\displaystyle B_{k}}:

B k + 1 = ( I γ k y k s k T ) B k ( I γ k s k y k T ) + γ k y k y k T , {\displaystyle B_{k+1}=(I-\gamma _{k}y_{k}s_{k}^{T})B_{k}(I-\gamma _{k}s_{k}y_{k}^{T})+\gamma _{k}y_{k}y_{k}^{T},} {\displaystyle B_{k+1}=(I-\gamma _{k}y_{k}s_{k}^{T})B_{k}(I-\gamma _{k}s_{k}y_{k}^{T})+\gamma _{k}y_{k}y_{k}^{T},}

where

y k = f ( x k + s k ) f ( x k ) , {\displaystyle y_{k}=\nabla f(x_{k}+s_{k})-\nabla f(x_{k}),} {\displaystyle y_{k}=\nabla f(x_{k}+s_{k})-\nabla f(x_{k}),}
γ k = 1 y k T s k , {\displaystyle \gamma _{k}={\frac {1}{y_{k}^{T}s_{k}}},} {\displaystyle \gamma _{k}={\frac {1}{y_{k}^{T}s_{k}}},}

and B k {\displaystyle B_{k}} {\displaystyle B_{k}} is a symmetric and positive-definite matrix.

The corresponding update to the inverse Hessian approximation H k = B k 1 {\displaystyle H_{k}=B_{k}^{-1}} {\displaystyle H_{k}=B_{k}^{-1}} is given by

H k + 1 = H k H k y k y k T H k y k T H k y k + s k s k T y k T s k . {\displaystyle H_{k+1}=H_{k}-{\frac {H_{k}y_{k}y_{k}^{T}H_{k}}{y_{k}^{T}H_{k}y_{k}}}+{\frac {s_{k}s_{k}^{T}}{y_{k}^{T}s_{k}}}.} {\displaystyle H_{k+1}=H_{k}-{\frac {H_{k}y_{k}y_{k}^{T}H_{k}}{y_{k}^{T}H_{k}y_{k}}}+{\frac {s_{k}s_{k}^{T}}{y_{k}^{T}s_{k}}}.}

B {\displaystyle B} {\displaystyle B} is assumed to be positive-definite, and the vectors s k T {\displaystyle s_{k}^{T}} {\displaystyle s_{k}^{T}} and y {\displaystyle y} {\displaystyle y} must satisfy the curvature condition

s k T y k = s k T B s k > 0. {\displaystyle s_{k}^{T}y_{k}=s_{k}^{T}Bs_{k}>0.} {\displaystyle s_{k}^{T}y_{k}=s_{k}^{T}Bs_{k}>0.}

The DFP formula is quite effective, but it was soon superseded by the Broyden–Fletcher–Goldfarb–Shanno formula, which is its dual (interchanging the roles of y and s).[1]

Compact representation

[edit ]

By unwinding the matrix recurrence for B k {\displaystyle B_{k}} {\displaystyle B_{k}}, the DFP formula can be expressed as a compact matrix representation. Specifically, defining

S k = [ s 0 s 1 s k 1 ] , {\displaystyle S_{k}={\begin{bmatrix}s_{0}&s_{1}&\ldots &s_{k-1}\end{bmatrix}},} {\displaystyle S_{k}={\begin{bmatrix}s_{0}&s_{1}&\ldots &s_{k-1}\end{bmatrix}},} Y k = [ y 0 y 1 y k 1 ] , {\displaystyle Y_{k}={\begin{bmatrix}y_{0}&y_{1}&\ldots &y_{k-1}\end{bmatrix}},} {\displaystyle Y_{k}={\begin{bmatrix}y_{0}&y_{1}&\ldots &y_{k-1}\end{bmatrix}},}

and upper triangular and diagonal matrices

( R k ) i j := ( R k SY ) i j = s i 1 T y j 1 , ( R k YS ) i j = y i 1 T s j 1 , ( D k ) i i := ( D k SY ) i i = s i 1 T y i 1  for  1 i j k {\displaystyle {\big (}R_{k}{\big )}_{ij}:={\big (}R_{k}^{\text{SY}}{\big )}_{ij}=s_{i-1}^{T}y_{j-1},\quad {\big (}R_{k}^{\text{YS}}{\big )}_{ij}=y_{i-1}^{T}s_{j-1},\quad (D_{k})_{ii}:={\big (}D_{k}^{\text{SY}}{\big )}_{ii}=s_{i-1}^{T}y_{i-1}\quad \quad {\text{ for }}1\leq i\leq j\leq k} {\displaystyle {\big (}R_{k}{\big )}_{ij}:={\big (}R_{k}^{\text{SY}}{\big )}_{ij}=s_{i-1}^{T}y_{j-1},\quad {\big (}R_{k}^{\text{YS}}{\big )}_{ij}=y_{i-1}^{T}s_{j-1},\quad (D_{k})_{ii}:={\big (}D_{k}^{\text{SY}}{\big )}_{ii}=s_{i-1}^{T}y_{i-1}\quad \quad {\text{ for }}1\leq i\leq j\leq k}

the DFP matrix has the equivalent formula

B k = B 0 + J k N k 1 J k T , {\displaystyle B_{k}=B_{0}+J_{k}N_{k}^{-1}J_{k}^{T},} {\displaystyle B_{k}=B_{0}+J_{k}N_{k}^{-1}J_{k}^{T},}

J k = [ Y k Y k B 0 S k ] {\displaystyle J_{k}={\begin{bmatrix}Y_{k}&Y_{k}-B_{0}S_{k}\end{bmatrix}}} {\displaystyle J_{k}={\begin{bmatrix}Y_{k}&Y_{k}-B_{0}S_{k}\end{bmatrix}}}

N k = [ 0 k × k R k YS ( R k YS ) T R k + R k T ( D k + S k T B 0 S k ) ] {\displaystyle N_{k}={\begin{bmatrix}0_{k\times k}&R_{k}^{\text{YS}}\\{\big (}R_{k}^{\text{YS}}{\big )}^{T}&R_{k}+R_{k}^{T}-(D_{k}+S_{k}^{T}B_{0}S_{k})\end{bmatrix}}} {\displaystyle N_{k}={\begin{bmatrix}0_{k\times k}&R_{k}^{\text{YS}}\\{\big (}R_{k}^{\text{YS}}{\big )}^{T}&R_{k}+R_{k}^{T}-(D_{k}+S_{k}^{T}B_{0}S_{k})\end{bmatrix}}}

The inverse compact representation can be found by applying the Sherman-Morrison-Woodbury inverse to B k {\displaystyle B_{k}} {\displaystyle B_{k}}. The compact representation is particularly useful for limited-memory and constrained problems.[2]

See also

[edit ]

References

[edit ]
  1. ^ Avriel, Mordecai (1976). Nonlinear Programming: Analysis and Methods. Prentice-Hall. pp. 352–353. ISBN 0-13-623603-0.
  2. ^ Brust, J. J. (2024). "Useful Compact Representations for Data-Fitting". arXiv:2403.12206 [math.OC].

Further reading

[edit ]
Functions
Gradients
Convergence
Quasi–Newton
Other methods
Hessians
General
Differentiable
Convex
minimization
Linear and
quadratic
Interior point
Basis- exchange
Paradigms
Graph
algorithms
Minimum
spanning tree
Shortest path
Network flows

AltStyle によって変換されたページ (->オリジナル) /