Block matrix pseudoinverse

Formula for the pseudoinverse of a partitioned matrix

In mathematics, a block matrix pseudoinverse is a formula for the pseudoinverse of a partitioned matrix. It is useful for decomposing or approximating many parameter-updating algorithms in signal processing that are based on the least squares method.

Derivation


Consider a column-wise partitioned matrix:

\[
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}, \quad \mathbf{A} \in \mathbb{R}^{m \times n}, \quad \mathbf{B} \in \mathbb{R}^{m \times p}, \quad m \geq n + p.
\]

If the above matrix has full column rank, the Moore–Penrose inverses of it and of its transpose are

\[
\begin{aligned}
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{+} &= \left(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\right)^{-1}\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}, \\
\begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}^{+} &= \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\left(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\right)^{-1}.
\end{aligned}
\]

This computation of the pseudoinverse requires an (n + p)-square matrix inversion and does not take advantage of the block form.
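For a full-column-rank matrix, this normal-equations form can be checked numerically. The following is a minimal NumPy sketch; the sizes, seed, and random data are illustrative assumptions, not part of the source:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 12, 4, 3                 # m >= n + p, so [A B] can have full column rank
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
AB = np.hstack([A, B])             # the partitioned matrix [A B]

# Normal-equations form of the pseudoinverse: (M^T M)^{-1} M^T,
# computed with a linear solve instead of an explicit inverse.
pinv_normal = np.linalg.solve(AB.T @ AB, AB.T)

assert np.allclose(pinv_normal, np.linalg.pinv(AB))
```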

To reduce the computational cost to n- and p-square matrix inversions and to introduce parallelism by treating the blocks separately, one derives[1]

\[
\begin{aligned}
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{+} &= \begin{bmatrix}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{-1}\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp} \\ \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{-1}\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\end{bmatrix} = \begin{bmatrix}\left(\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{+} \\ \left(\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{+}\end{bmatrix}, \\
\begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}^{+} &= \begin{bmatrix}\mathbf{P}_{B}^{\perp}\mathbf{A}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{-1} & \mathbf{P}_{A}^{\perp}\mathbf{B}\left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{-1}\end{bmatrix} = \begin{bmatrix}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\right)^{+} & \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\right)^{+}\end{bmatrix},
\end{aligned}
\]

where the orthogonal projection matrices are defined by

\[
\begin{aligned}
\mathbf{P}_{A}^{\perp} &= \mathbf{I} - \mathbf{A}\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}\mathbf{A}^{\mathsf{T}}, \\
\mathbf{P}_{B}^{\perp} &= \mathbf{I} - \mathbf{B}\left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}\mathbf{B}^{\mathsf{T}}.
\end{aligned}
\]

The above formulas are not necessarily valid if \(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\) does not have full column rank. For example, if \(\mathbf{A} \neq \mathbf{0}\), then

\[
\begin{bmatrix}\mathbf{A} & \mathbf{A}\end{bmatrix}^{+} = \frac{1}{2}\begin{bmatrix}\mathbf{A}^{+} \\ \mathbf{A}^{+}\end{bmatrix} \neq \begin{bmatrix}\left(\mathbf{P}_{A}^{\perp}\mathbf{A}\right)^{+} \\ \left(\mathbf{P}_{A}^{\perp}\mathbf{A}\right)^{+}\end{bmatrix} = \mathbf{0}.
\]
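When \(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\) does have full column rank, the block formula agrees with the direct pseudoinverse, which can be verified numerically. A minimal NumPy sketch with illustrative random blocks:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))

I = np.eye(m)
P_A = I - A @ np.linalg.solve(A.T @ A, A.T)   # projects onto the orthogonal complement of range(A)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)   # projects onto the orthogonal complement of range(B)

# Block formula: stack the pseudoinverses of the projected blocks.
block_pinv = np.vstack([np.linalg.pinv(P_B @ A),
                        np.linalg.pinv(P_A @ B)])

assert np.allclose(block_pinv, np.linalg.pinv(np.hstack([A, B])))
```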

Application to least squares problems


Given the same matrices as above, we consider the following least squares problems, which appear as multi-objective optimization or constrained problems in signal processing. The results below also make it possible to implement a parallel algorithm for least squares.

Column-wise partitioning in over-determined least squares


Suppose a solution \(\mathbf{x} = \begin{bmatrix}\mathbf{x}_1 \\ \mathbf{x}_2\end{bmatrix}\) solves an over-determined system:

\[
\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\begin{bmatrix}\mathbf{x}_1 \\ \mathbf{x}_2\end{bmatrix} = \mathbf{d}, \quad \mathbf{d} \in \mathbb{R}^{m \times 1}.
\]

Using the block matrix pseudoinverse, we have

\[
\mathbf{x} = \begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{+}\,\mathbf{d} = \begin{bmatrix}\left(\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{+} \\ \left(\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{+}\end{bmatrix}\mathbf{d}.
\]

Therefore, we have a decomposed solution:

\[
\mathbf{x}_1 = \left(\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{+}\,\mathbf{d}, \quad \mathbf{x}_2 = \left(\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{+}\,\mathbf{d}.
\]
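Concretely, each block of the solution is an independent small least squares problem. A NumPy sketch under the same illustrative assumptions as above:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
d = rng.standard_normal(m)

I = np.eye(m)
P_A = I - A @ np.linalg.solve(A.T @ A, A.T)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)

# Each block of the solution is a separate least squares problem.
x1 = np.linalg.lstsq(P_B @ A, d, rcond=None)[0]   # x1 = (P_B^perp A)^+ d
x2 = np.linalg.lstsq(P_A @ B, d, rcond=None)[0]   # x2 = (P_A^perp B)^+ d

x_ref = np.linalg.lstsq(np.hstack([A, B]), d, rcond=None)[0]
assert np.allclose(np.concatenate([x1, x2]), x_ref)
```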

Row-wise partitioning in under-determined least squares


Suppose a solution \(\mathbf{x}\) solves an under-determined system:

\[
\begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}\mathbf{x} = \begin{bmatrix}\mathbf{e} \\ \mathbf{f}\end{bmatrix}, \quad \mathbf{e} \in \mathbb{R}^{n \times 1}, \quad \mathbf{f} \in \mathbb{R}^{p \times 1}.
\]

The minimum-norm solution is given by

\[
\mathbf{x} = \begin{bmatrix}\mathbf{A}^{\mathsf{T}} \\ \mathbf{B}^{\mathsf{T}}\end{bmatrix}^{+}\begin{bmatrix}\mathbf{e} \\ \mathbf{f}\end{bmatrix}.
\]

Using the block matrix pseudoinverse, we have

\[
\mathbf{x} = \begin{bmatrix}\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\right)^{+} & \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\right)^{+}\end{bmatrix}\begin{bmatrix}\mathbf{e} \\ \mathbf{f}\end{bmatrix} = \left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\right)^{+}\mathbf{e} + \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\right)^{+}\mathbf{f}.
\]
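The minimum-norm solution can likewise be assembled from the two projected blocks; a minimal NumPy sketch (all data illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 12, 4, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))
e = rng.standard_normal(n)
f = rng.standard_normal(p)

I = np.eye(m)
P_A = I - A @ np.linalg.solve(A.T @ A, A.T)
P_B = I - B @ np.linalg.solve(B.T @ B, B.T)

# Minimum-norm solution assembled from the two projected blocks.
x = np.linalg.pinv(A.T @ P_B) @ e + np.linalg.pinv(B.T @ P_A) @ f

x_ref = np.linalg.pinv(np.vstack([A.T, B.T])) @ np.concatenate([e, f])
assert np.allclose(x, x_ref)
```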

Comments on matrix inversion


Instead of \(\left(\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}^{\mathsf{T}}\begin{bmatrix}\mathbf{A} & \mathbf{B}\end{bmatrix}\right)^{-1}\), we need to calculate, directly or indirectly,

\[
\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}, \quad \left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}, \quad \left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{-1}, \quad \left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{-1}.
\]

In a dense and small system, we can use singular value decomposition, QR decomposition, or Cholesky decomposition to replace the matrix inversions with numerical routines. In a large system, we may employ iterative methods such as Krylov subspace methods.
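For instance, an explicit inversion of a Gram matrix such as \(\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}\) can be replaced by a Cholesky factorization followed by triangular solves. A sketch using SciPy, where `gram_solve` and the data are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gram_solve(M, rhs):
    """Apply (M^T M)^{-1} to rhs via a Cholesky factorization,
    avoiding an explicit matrix inverse."""
    c_and_lower = cho_factor(M.T @ M)   # M^T M is symmetric positive definite for full-column-rank M
    return cho_solve(c_and_lower, rhs)

rng = np.random.default_rng(4)
m, n = 12, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(n)

assert np.allclose(gram_solve(A, b), np.linalg.solve(A.T @ A, b))
```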

Considering parallel algorithms, we can compute \(\left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}\) and \(\left(\mathbf{B}^{\mathsf{T}}\mathbf{B}\right)^{-1}\) in parallel, and then compute \(\left(\mathbf{A}^{\mathsf{T}}\mathbf{P}_{B}^{\perp}\mathbf{A}\right)^{-1}\) and \(\left(\mathbf{B}^{\mathsf{T}}\mathbf{P}_{A}^{\perp}\mathbf{B}\right)^{-1}\), also in parallel.
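The two-stage schedule might look like the following thread-based sketch (illustrative only; in practice the BLAS library behind NumPy may already use multiple threads):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(5)
m, n, p = 500, 60, 40
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, p))

def gram_inv(M):
    return np.linalg.inv(M.T @ M)

with ThreadPoolExecutor(max_workers=2) as pool:
    # Stage 1: the two plain Gram inverses are independent of each other.
    GA_inv, GB_inv = pool.map(gram_inv, [A, B])

    # Stage 2: build the projectors, then invert the two projected
    # Gram matrices, again independently of each other.
    P_A = np.eye(m) - A @ GA_inv @ A.T
    P_B = np.eye(m) - B @ GB_inv @ B.T
    SA_inv, SB_inv = pool.map(np.linalg.inv, [A.T @ P_B @ A, B.T @ P_A @ B])
```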


References

  1. J.K. Baksalary and O.M. Baksalary (2007). "Particular formulae for the Moore–Penrose inverse of a columnwise partitioned matrix". Linear Algebra Appl. 421: 16–23. doi:10.1016/j.laa.2006.03.031.