Let $\Delta_n := \{x \in \mathbb{R}^n \colon x_i \ge 0, \ \sum_{i=1}^n x_i = 1\}$ denote the probability simplex in $\mathbb{R}^n$, and let $A \in \mathbb{R}^{n \times n}$ be a diagonalizable matrix.
I would like to know the fastest way (computationally speaking) to solve the problem $\operatorname{argmin}_{x\in \Delta_n} \|A x - y\|^2$.
When $A=I$ is the identity, the argmin is unique and can be found in $O(n \ln n)$ time. In fact, when $A=I$, the problem reduces to Euclidean projection onto the simplex, which can be done in $O(n \ln n)$ time---see e.g. Lemma 3.10 in https://www.cs.princeton.edu/techreports/2006/766.pdf.
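For concreteness, here is a minimal sketch of that projection in Python (the function name `project_to_simplex` is mine; this follows the standard sort-and-threshold scheme rather than the report's exact pseudocode):

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex,
    via the standard O(n log n) sort-and-threshold scheme."""
    n = v.shape[0]
    u = np.sort(v)[::-1]                 # sort entries in descending order
    css = np.cumsum(u)
    idx = np.arange(1, n + 1)
    cond = u - (css - 1.0) / idx > 0
    rho = idx[cond][-1]                  # largest index passing the threshold test
    theta = (css[rho - 1] - 1.0) / rho   # shift so the result sums to 1
    return np.maximum(v - theta, 0.0)
```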
Is there any hope of getting a similar time complexity when $A\neq I$ but diagonalizable? I am after algorithms with provable worst-case guarantees. If the algorithm produces an $\epsilon$-suboptimal solution to this problem, I want to know its computational complexity as a function of $\epsilon$ and $n$.
- Given that $A$ includes $n^2$ values and that the answer depends on all of them, I would be surprised if $\Omega(n^2)$ did not hold. If diagonalization is allowed "for free", then you minimize $\|T(DT^{-1}x - T^{-1}y)\|$ or $\|T(Du - v)\|$ (over a modified simplex). Unfortunately, this is not $\|T\|\,\|Du - v\|$. (user16034, Dec 8, 2022)
- I think you can get the same running time when $A$ is already a diagonal matrix; it should be clear after writing the KKT conditions for this problem. Otherwise, you clearly need $\Omega(n^2)$ time just to read the matrix. If you are looking for a practical algorithm, a variation of gradient descent should work (see the sketch below). (Dmitry, Dec 9, 2022)
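To illustrate the kind of $(\epsilon, n)$ guarantee the question asks for, here is a minimal projected-gradient sketch along the lines of Dmitry's suggestion (`solve_simplex_lsq` is a hypothetical name; it reuses `project_to_simplex` from above). With step size $1/L$, where $L = 2\|A\|_2^2$ is the Lipschitz constant of the gradient, projected gradient reaches an $\epsilon$-suboptimal point in $O(L/\epsilon)$ iterations, each costing $O(n^2)$ for the matvec plus $O(n \ln n)$ for the projection; Nesterov acceleration would improve the iteration count to $O(\sqrt{L/\epsilon})$.

```python
def solve_simplex_lsq(A, y, num_iters):
    """Projected gradient for min_{x in simplex} ||Ax - y||^2.
    A sketch under the assumptions above, not a tuned solver."""
    n = A.shape[1]
    L = 2.0 * np.linalg.norm(A, 2) ** 2      # 2 * (largest singular value)^2
    x = np.full(n, 1.0 / n)                  # start at the simplex barycenter
    for _ in range(num_iters):
        grad = 2.0 * A.T @ (A @ x - y)       # gradient of the objective
        x = project_to_simplex(x - grad / L) # O(n^2) matvec + O(n ln n) projection
    return x
```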