Suppose I have an objective function $\mathcal{L}(\Sigma)$ where $\Sigma$ is a positive definite matrix, and I want to optimize this function using gradient descent. I think that if I compute the gradient and write the update equation for $\Sigma$ directly, the positive definite constraint may not be maintained. How can I be sure that the optimum I get is also a positive definite matrix?
3 Answers
All symmetric positive definite matrices are diagonalizable and have all eigenvalues positive.
The matrix logarithm of such a matrix will be a symmetric matrix.
If the matrix is size 3, for example, the logarithm will have 6 free parameters; in general $n(n+1)/2$ parameters are needed for the logarithm.
Define the function to optimize over these parameters of the logarithm of the matrix, then use the matrix exponential to recover the corresponding positive definite matrix to use in the objective function.
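A minimal sketch of this mapping (the helper name and dimensions are just for illustration): the $n(n+1)/2$ unconstrained parameters fill a symmetric matrix, and its matrix exponential is then symmetric positive definite by construction.

```python
import numpy as np
from scipy.linalg import expm

def params_to_spd(theta, n):
    """Map an unconstrained vector of length n(n+1)/2 to an SPD matrix."""
    A = np.zeros((n, n))
    A[np.tril_indices(n)] = theta      # fill the lower triangle
    A = A + A.T - np.diag(np.diag(A))  # symmetrize
    return expm(A)                     # matrix exponential of a symmetric matrix is SPD

n = 3
theta = np.random.randn(n * (n + 1) // 2)
Sigma = params_to_spd(theta, n)
print(np.linalg.eigvalsh(Sigma))  # all eigenvalues are positive
```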
Edit: I wrote a blogpost about this problem, see here
Elaborating on the other two answers, one way to achieve this is not to optimize directly over the matrix $\Sigma$, which is constrained to lie in the space of symmetric positive definite matrices ($SPD$), but over some other parameter $\theta \in \mathbb{R}^n$ that has no constraints and that parametrizes the positive definite matrices. By "parametrizes", what we mean is that there is some differentiable function $f: \mathbb{R}^n \to SPD$ such that for every $\Sigma$ there exists a $\theta$ with $f(\theta)=\Sigma$.
The two other answers provide different examples of such parametrizations. One answer proposes the log-Cholesky parametrization. In this case, $\theta$ consists of the non-zero entries of a lower-triangular matrix $L$, and the function $f$ exponentiates the diagonal elements of $L$ and then multiplies the result with its transpose. Thus, you can just optimize over the entries of $L$ and at each step generate $\Sigma$ with $f$ to compute the loss. The non-zero entries of $L$ are unconstrained, so you can use regular optimization techniques.
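A minimal sketch of the log-Cholesky approach (the target matrix, learning rate, and loss below are made up for illustration), showing that the raw entries of $L$ can be updated with ordinary gradient descent while $\Sigma$ stays SPD at every step:

```python
import torch

n = 3
L_raw = torch.zeros(n, n, requires_grad=True)   # unconstrained parameters
target = torch.tensor([[2.0, 0.5, 0.0],
                       [0.5, 1.5, 0.3],
                       [0.0, 0.3, 1.0]])        # some SPD matrix to fit

def to_spd(L_raw):
    # exponentiate the diagonal, keep the strict lower triangle as-is
    L = torch.tril(L_raw, diagonal=-1) + torch.diag(torch.exp(torch.diagonal(L_raw)))
    return L @ L.T                               # always SPD

opt = torch.optim.SGD([L_raw], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss = torch.norm(to_spd(L_raw) - target) ** 2   # any smooth loss works
    loss.backward()
    opt.step()

print(to_spd(L_raw))  # SPD at every iteration, close to target after training
```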
The second answer proposes the matrix exponential, which maps the space of symmetric matrices $Sym$ onto the symmetric positive definite matrices $SPD$. In the space of symmetric matrices you can again take $\theta$ to be the lower-triangular entries, which are unconstrained, and at each step compute the $\Sigma$ corresponding to the symmetric matrix given by $\theta$ to compute the loss.
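For comparison, here is the same kind of gradient-descent loop using the matrix-exponential map (again a sketch; the target and step size are illustrative):

```python
import torch

n = 3
A_raw = torch.zeros(n, n, requires_grad=True)  # unconstrained entries
target = 2.0 * torch.eye(n)                    # some SPD matrix to fit

def to_spd(A_raw):
    S = torch.tril(A_raw) + torch.tril(A_raw, diagonal=-1).T  # symmetric matrix from lower triangle
    return torch.linalg.matrix_exp(S)                         # always SPD

opt = torch.optim.SGD([A_raw], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss = torch.norm(to_spd(A_raw) - target) ** 2
    loss.backward()
    opt.step()
```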
This approach is related to optimization on manifolds, and there exist software tools that do most of the heavy lifting for you. The amazing Python package geotorch will let you constrain your matrix to be positive definite with just one call and let you forget about the details. There is also the Parametrizations tool in PyTorch, whose tutorial explains some of the concepts in the other answers. Notably, parametrizations were introduced to PyTorch by the author of geotorch.
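As a rough sketch of what such tools do under the hood (this is not the geotorch API itself, just PyTorch's built-in parametrization mechanism with an illustrative SPD map via the matrix exponential):

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class SPD(nn.Module):
    def forward(self, X):
        # symmetrize the raw parameter, then map it to SPD with the matrix exponential
        S = 0.5 * (X + X.transpose(-1, -2))
        return torch.linalg.matrix_exp(S)

class Model(nn.Module):
    def __init__(self, n):
        super().__init__()
        self.sigma = nn.Parameter(torch.eye(n))

model = Model(3)
parametrize.register_parametrization(model, "sigma", SPD())

# model.sigma is now always SPD; any optimizer updates the unconstrained raw parameter
print(torch.linalg.eigvalsh(model.sigma))  # all positive
```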
This problem occurs prominently in, for instance, the estimation of mixed models. The R packages nlme and lme4, and certainly others, use the log-Cholesky parameterization of the positive definite covariance matrix (or matrices).
For details see Difference between Cholesky decomposition and log-Cholesky decomposition.
Some other references/links:
- Parametrization of positive semidefinite matrices: https://mathoverflow.net/questions/225152/parametrization-of-positive-semidefinite-matrices
- Numerical Mathematics - Constraint or Unconstraint Optimization?
- A parameterization of positive definite matrices in terms of partial correlation vines
- Unconstrained Parameterizations for Variance-Covariance Matrices