
Suppose I have an objective function $\mathcal{L}(\Sigma)$ where $\Sigma$ is a positive definite matrix, and I want to optimize it using gradient descent. If I compute the gradient and apply the usual update equation to $\Sigma$, the positive definiteness constraint may not be maintained. How can I be sure that the optimum I get is also a positive definite matrix?

asked Apr 29, 2021 at 15:39

3 Answers


All symmetric positive definite matrices are diagonalizable with all eigenvalues positive, so the matrix logarithm is well defined and is itself a symmetric matrix.
If the matrix is of size 3, for example, the logarithm has 6 free parameters; in general, $n(n+1)/2$ parameters are needed for the logarithm. Define the function to optimize over these parameters of the logarithm, then use the matrix exponential to recover the corresponding positive definite matrix used in the objective function.
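A minimal sketch of this recipe in Python, assuming SciPy for the matrix exponential; the toy target, the helper `theta_to_spd`, and the quadratic loss are my own illustration, not part of the answer:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

n = 3
tril_idx = np.tril_indices(n)  # n(n+1)/2 = 6 free parameters for n = 3

def theta_to_spd(theta):
    """Unpack theta into a symmetric matrix A, then return expm(A), which is SPD."""
    A = np.zeros((n, n))
    A[tril_idx] = theta
    A = A + A.T - np.diag(np.diag(A))  # symmetrize without doubling the diagonal
    return expm(A)

# Hypothetical objective: fit Sigma to a target SPD matrix in squared Frobenius norm.
target = np.array([[2.0, 0.5, 0.0],
                   [0.5, 1.0, 0.3],
                   [0.0, 0.3, 1.5]])

def loss(theta):
    return np.sum((theta_to_spd(theta) - target) ** 2)

res = minimize(loss, x0=np.zeros(n * (n + 1) // 2))  # unconstrained optimization
print(theta_to_spd(res.x))  # close to target, and SPD by construction
```

The parameters live in $\mathbb{R}^{n(n+1)/2}$, so any off-the-shelf optimizer applies; positive definiteness is guaranteed by the exponential map rather than by a constraint.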

answered Apr 29, 2021 at 16:23

Edit: I wrote a blog post about this problem; see here.

Elaborating on the other two answers: one way to achieve this is to not optimize directly over the matrix $\Sigma$, which is constrained to lie in the space of symmetric positive definite matrices ($SPD$), but over some other parameter $\theta \in \mathbb{R}^n$ that is unconstrained and that parametrizes the positive definite matrices. By "parametrizes" we mean that there is a differentiable function $f: \mathbb{R}^n \to SPD$ such that for every $\Sigma$ there exists a $\theta$ with $f(\theta) = \Sigma$.

The two other answers provide different examples of parametrizations. One answer proposes the log-Cholesky parametrization. In this case, $\theta$ collects the non-zero entries of a lower-triangular matrix $L$, and the function $f$ exponentiates the diagonal elements of $L$ and then multiplies the result by its transpose. You can therefore optimize over the entries of $L$, regenerating $\Sigma = f(\theta)$ at each step to compute the loss. The non-zero entries of $L$ are unconstrained, so regular optimization techniques apply; a sketch of this map follows below.
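A minimal sketch of this map $f$ in NumPy (the helper name `log_cholesky_to_spd` is my own):

```python
import numpy as np

n = 3
tril_idx = np.tril_indices(n)

def log_cholesky_to_spd(theta):
    """Map unconstrained theta to an SPD matrix via the log-Cholesky parametrization."""
    L = np.zeros((n, n))
    L[tril_idx] = theta                          # fill the lower triangle
    L[np.diag_indices(n)] = np.exp(np.diag(L))   # exponentiate the diagonal -> positive
    return L @ L.T                               # invertible Cholesky factor -> SPD

sigma = log_cholesky_to_spd(np.random.randn(n * (n + 1) // 2))
print(np.linalg.eigvalsh(sigma))                 # all eigenvalues strictly positive
```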

The second answer proposes the matrix exponential, which maps the space of symmetric matrices $Sym$ onto the symmetric positive definite matrices $SPD$. In the space of symmetric matrices you can again take $\theta$ to be the lower-triangular entries, which are unconstrained, and at each step compute the $\Sigma$ corresponding to the symmetric matrix given by $\theta$ in order to compute the loss; a PyTorch sketch of this map is given at the end of this answer.

This approach is related to optimization on manifolds, and there exist software tools that do most of the heavy lifting for you. The amazing Python package geotorch will let you constrain your matrix to be positive definite with just one call and forget about the details. There is also the Parametrizations mechanism in PyTorch, whose tutorial explains some of the concepts in the other answers. Notably, parametrizations were introduced to PyTorch by the author of geotorch.
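As a sketch of the PyTorch parametrization mechanism applied to the matrix-exponential map from above (the names `SymmetricMatrixExp` and `Model` are my own illustration; for geotorch's one-call interface, consult its documentation):

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class SymmetricMatrixExp(nn.Module):
    """Turn an unconstrained square matrix X into an SPD matrix expm(sym(X))."""
    def forward(self, X):
        A = X.tril() + X.tril(-1).transpose(-1, -2)  # symmetric matrix from lower triangle
        return torch.matrix_exp(A)                   # exponential of symmetric A is SPD

class Model(nn.Module):
    def __init__(self, n):
        super().__init__()
        self.sigma = nn.Parameter(torch.zeros(n, n))  # raw, unconstrained parameter

model = Model(3)
parametrize.register_parametrization(model, "sigma", SymmetricMatrixExp())

# model.sigma now always evaluates to an SPD matrix; plain gradient descent
# acts on the unconstrained raw parameter behind the scenes.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
target = 2.0 * torch.eye(3)
for _ in range(200):
    opt.zero_grad()
    loss = ((model.sigma - target) ** 2).sum()
    loss.backward()
    opt.step()
print(torch.linalg.eigvalsh(model.sigma))  # strictly positive eigenvalues
```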

answered Aug 14, 2024 at 19:34

This problem occurs prominently in, for instance, the estimation of mixed models. The R packages nlme and lme4, and certainly others, use the log-Cholesky parameterization of the positive definite covariance matrix (or matrices).

For details, see Difference between Cholesky decomposition and log-Cholesky decomposition.
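As a small illustration of why the log-Cholesky parameters are unconstrained (a sketch; the helper name is mine), the map can be inverted by taking the Cholesky factor and the log of its diagonal:

```python
import numpy as np

def spd_to_log_cholesky(sigma):
    """Inverse log-Cholesky map: SPD matrix -> unconstrained real parameters."""
    L = np.linalg.cholesky(sigma)               # unique factor with positive diagonal
    n = L.shape[0]
    L[np.diag_indices(n)] = np.log(np.diag(L))  # log removes the positivity constraint
    return L[np.tril_indices(n)]

print(spd_to_log_cholesky(np.array([[2.0, 0.5],
                                    [0.5, 1.0]])))  # plain real vector
```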


answered Apr 29, 2021 at 16:31
