Block matrix pseudoinverse

Formula for the pseudoinverse of a partitioned matrix

In mathematics, a block matrix pseudoinverse is a formula for the pseudoinverse of a partitioned matrix. It is useful for decomposing or approximating many parameter-updating algorithms in signal processing that are based on the least squares method.

Derivation


Consider a column-wise partitioned matrix:

\[
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}, \quad \mathbf{A} \in \mathbb{R}^{m \times n}, \quad \mathbf{B} \in \mathbb{R}^{m \times p}, \quad m \geq n + p.
\]

If the above matrix has full column rank, the Moore–Penrose inverses of it and of its transpose are

\[
\begin{aligned}
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{+} &= \left(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\right)^{-1}\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}, \\
\begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}^{+} &= \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\left(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\right)^{-1}.
\end{aligned}
\]

This computation of the pseudoinverse requires an (n + p)-square matrix inversion and does not take advantage of the block form.
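For a full-column-rank matrix, this normal-equations form can be checked numerically. The following is a minimal NumPy sketch; the sizes, seed, and random data are illustrative assumptions, not part of the source:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 12, 4, 3                 # m >= n + p, so [A B] can have full column rank
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
AB = np.hstack([A, B])             # the partitioned matrix [A B]

# Normal-equations form of the pseudoinverse: (M^T M)^{-1} M^T,
# computed with a linear solve instead of an explicit inverse.
pinv_normal = np.linalg.solve(AB.T @ AB, AB.T)

assert np.allclose(pinv_normal, np.linalg.pinv(AB))
```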

To reduce the computational cost to n- and p-square matrix inversions and to introduce parallelism by treating the blocks separately, one derives[1]

\[
\begin{aligned}
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{+} &= \begin{bmatrix}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{-1}\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp} \\ \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{-1}\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\end{bmatrix} = \begin{bmatrix}\left(\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{+} \\ \left(\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{+}\end{bmatrix}, \\
\begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}^{+} &= \begin{bmatrix}\mathbf{P}_{B}^{\perp}\mathbf{A}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{-1} & \mathbf{P}_{A}^{\perp}\mathbf{B}\left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{-1}\end{bmatrix} = \begin{bmatrix}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\right)^{+} & \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\right)^{+}\end{bmatrix},
\end{aligned}
\]

where the orthogonal projection matrices are defined by

\[
\begin{aligned}
\mathbf{P}_{A}^{\perp} &= \mathbf{I} - \mathbf{A}\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}\mathbf{A}^{\mathsf{T}}, \\
\mathbf{P}_{B}^{\perp} &= \mathbf{I} - \mathbf{B}\left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}\mathbf{B}^{\mathsf{T}}.
\end{aligned}
\]

The above formulas are not necessarily valid if \(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\) does not have full column rank. For example, if \(\mathbf{A} \neq \mathbf{0}\), then

\[
\begin{bmatrix}\mathbf{A} & \mathbf{A}\end{bmatrix}^{+} = \frac{1}{2}\begin{bmatrix}\mathbf{A}^{+} \\ \mathbf{A}^{+}\end{bmatrix} \neq \begin{bmatrix}\left(\mathbf{P}_{A}^{\perp}\mathbf{A}\right)^{+} \\ \left(\mathbf{P}_{A}^{\perp}\mathbf{A}\right)^{+}\end{bmatrix} = \mathbf{0}.
\]
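When \(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\) does have full column rank, the block formula agrees with the direct pseudoinverse, which can be verified numerically. A minimal NumPy sketch with illustrative random blocks:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))

I = np.eye(m)
P_A = I - A @ np.linalg.solve(A.T @ A, A.T)   # projects onto the orthogonal complement of range(A)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)   # projects onto the orthogonal complement of range(B)

# Block formula: stack the pseudoinverses of the projected blocks.
block_pinv = np.vstack([np.linalg.pinv(P_B @ A),
                        np.linalg.pinv(P_A @ B)])

assert np.allclose(block_pinv, np.linalg.pinv(np.hstack([A, B])))
```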

Application to least squares problems


Given the same matrices as above, we consider the following least squares problems, which appear as multi-objective optimization or constrained problems in signal processing. The results below also make it possible to implement a parallel algorithm for least squares.

Column-wise partitioning in over-determined least squares


Suppose a solution \(\mathbf{x} = \begin{bmatrix}\mathbf{x}_1 \\ \mathbf{x}_2\end{bmatrix}\) solves an over-determined system:

\[
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\begin{bmatrix}\mathbf{x}_1 \\ \mathbf{x}_2\end{bmatrix} = \mathbf{d}, \quad \mathbf{d} \in \mathbb{R}^{m \times 1}.
\]

Using the block matrix pseudoinverse, we have

\[
\mathbf{x} = \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{+}\,\mathbf{d} = \begin{bmatrix}\left(\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{+} \\ \left(\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{+}\end{bmatrix}\mathbf{d}.
\]

Therefore, we have a decomposed solution:

\[
\mathbf{x}_1 = \left(\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{+}\,\mathbf{d}, \quad \mathbf{x}_2 = \left(\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{+}\,\mathbf{d}.
\]
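Concretely, each block of the solution is an independent small least squares problem. A NumPy sketch under the same illustrative assumptions as above:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
d = rng.standard_normal(m)

I = np.eye(m)
P_A = I - A @ np.linalg.solve(A.T @ A, A.T)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)

# Each block of the solution is a separate least squares problem.
x1 = np.linalg.lstsq(P_B @ A, d, rcond=None)[0]   # x1 = (P_B^perp A)^+ d
x2 = np.linalg.lstsq(P_A @ B, d, rcond=None)[0]   # x2 = (P_A^perp B)^+ d

x_ref = np.linalg.lstsq(np.hstack([A, B]), d, rcond=None)[0]
assert np.allclose(np.concatenate([x1, x2]), x_ref)
```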

Row-wise partitioning in under-determined least squares


Suppose a solution \(\mathbf{x}\) solves an under-determined system:

\[
\begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}\mathbf{x} = \begin{bmatrix}\mathbf{e} \\ \mathbf{f}\end{bmatrix}, \quad \mathbf{e} \in \mathbb{R}^{n \times 1}, \quad \mathbf{f} \in \mathbb{R}^{p \times 1}.
\]

The minimum-norm solution is given by

\[
\mathbf{x} = \begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}^{+}\begin{bmatrix}\mathbf{e} \\ \mathbf{f}\end{bmatrix}.
\]

Using the block matrix pseudoinverse, we have

\[
\mathbf{x} = \begin{bmatrix}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\right)^{+} & \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\right)^{+}\end{bmatrix}\begin{bmatrix}\mathbf{e} \\ \mathbf{f}\end{bmatrix} = \left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\right)^{+}\mathbf{e} + \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\right)^{+}\mathbf{f}.
\]
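The minimum-norm solution can likewise be assembled from the two projected blocks; a minimal NumPy sketch (all data illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
e = rng.standard_normal(n)
f = rng.standard_normal(p)

I = np.eye(m)
P_A = I - A @ np.linalg.solve(A.T @ A, A.T)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)

# Minimum-norm solution assembled from the two projected blocks.
x = np.linalg.pinv(A.T @ P_B) @ e + np.linalg.pinv(B.T @ P_A) @ f

x_ref = np.linalg.pinv(np.vstack([A.T, B.T])) @ np.concatenate([e, f])
assert np.allclose(x, x_ref)
```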

Comments on matrix inversion


Instead of \(\left(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\right)^{-1}\), we need to calculate, directly or indirectly,

\[
\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}, \quad \left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}, \quad \left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{-1}, \quad \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{-1}.
\]

In a dense and small system, we can use singular value decomposition, QR decomposition, or Cholesky decomposition to replace the matrix inversions with numerical routines. In a large system, we may employ iterative methods such as Krylov subspace methods.
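For instance, an explicit inversion of a Gram matrix such as \(\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}\) can be replaced by a Cholesky factorization followed by triangular solves. A sketch using SciPy, where `gram_solve` and the data are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gram_solve(M, rhs):
    """Apply (M^T M)^{-1} to rhs via a Cholesky factorization,
    avoiding an explicit matrix inverse."""
    c_and_lower = cho_factor(M.T @ M)   # M^T M is symmetric positive definite for full-column-rank M
    return cho_solve(c_and_lower, rhs)

rng = np.random.default_rng(4)
m, n = 12, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(n)

assert np.allclose(gram_solve(A, b), np.linalg.solve(A.T @ A, b))
```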

Considering parallel algorithms, we can compute \(\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}\) and \(\left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}\) in parallel, and then compute \(\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{-1}\) and \(\left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{-1}\), also in parallel.
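The two-stage schedule might look like the following thread-based sketch (illustrative only; in practice the BLAS library behind NumPy may already use multiple threads):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(5)
m, n, p = 500, 60, 40
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))

def gram_inv(M):
    return np.linalg.inv(M.T @ M)

with ThreadPoolExecutor(max_workers=2) as pool:
    # Stage 1: the two plain Gram inverses are independent of each other.
    GA_inv, GB_inv = pool.map(gram_inv, [A, B])

    # Stage 2: build the projectors, then invert the two projected
    # Gram matrices, again independently of each other.
    P_A = np.eye(m) - A @ GA_inv @ A.T
    P_B = np.eye(m) - B @ GB_inv @ B.T
    SA_inv, SB_inv = pool.map(np.linalg.inv, [A.T @ P_B @ A, B.T @ P_A @ B])
```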


References

  1. J.K. Baksalary and O.M. Baksalary (2007). "Particular formulae for the Moore–Penrose inverse of a columnwise partitioned matrix". Linear Algebra Appl. 421: 16–23. doi:10.1016/j.laa.2006.03.031.