Least squares leads to an n-dimensional "parabola" (a quadratic bowl) in the parameters. I assume the same holds for constrained variants such as non-negative least squares.
This may be a wrong assumption. I am thinking of the quadratic terms (weights) in the l2 norm of the residuals (the "squares" in "least squares").
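To make the picture concrete, here is the objective written out. The symbols are my own notation, not from any particular reference: A is the design matrix, b the observations, x the parameters.

$$
f(x) = \tfrac{1}{2}\,\|Ax - b\|_2^2
     = \tfrac{1}{2}\,x^\top A^\top A\,x \;-\; b^\top A\,x \;+\; \tfrac{1}{2}\,b^\top b,
\qquad
\nabla f(x) = A^\top (Ax - b),
\qquad
\nabla^2 f(x) = A^\top A \succeq 0 .
$$

As far as I can tell, non-negative least squares minimizes this same f, only restricted to the set x >= 0. So the objective is still the same convex quadratic, and the constraint only cuts down the region over which it is minimized.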
But computationally, would those methods benefit from momentum-based gradient updates when solved iteratively? From a quick search, I could not find any such implementations or papers.
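To show what I mean by "momentum" here, below is a minimal sketch, not a reference implementation. It runs projected gradient descent on the non-negative least squares problem, with an optional Nesterov/FISTA-style extrapolation step as one possible form of momentum. The function names, the toy matrix, and the conservative step size 1 / ||A||_F^2 are my own assumptions for illustration. It uses plain Python lists to stay dependency-free.

```python
import math

def matvec(A, x):
    """Return A @ x for a matrix A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def objective(A, b, x):
    """f(x) = 0.5 * ||A x - b||^2."""
    return 0.5 * sum((ri - bi) ** 2 for ri, bi in zip(matvec(A, x), b))

def gradient(A, b, x):
    """Gradient of f, i.e. A^T (A x - b)."""
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]

def nnls_projected_gradient(A, b, iters=2000, momentum=False):
    """Hypothetical helper: projected gradient for min f(x) subject to x >= 0."""
    n = len(A[0])
    # Assumed safe step size: ||A||_F^2 upper-bounds the largest
    # eigenvalue of A^T A, so 1 / ||A||_F^2 is never too large.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    y = list(x)  # extrapolated point; equals x when momentum is off
    t = 1.0
    history = []
    for _ in range(iters):
        g = gradient(A, b, y)
        # Gradient step from y, then projection onto the non-negative orthant.
        x_new = [max(0.0, yi - step * gi) for yi, gi in zip(y, g)]
        if momentum:
            # FISTA-style extrapolation along the last displacement.
            t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
            beta = (t - 1.0) / t_new
            y = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
            t = t_new
        else:
            y = x_new
        x = x_new
        history.append(objective(A, b, x))
    return x, history

# Toy problem, made up for illustration: b = A @ [1, 0],
# so the non-negative least squares solution is [1, 0].
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 0.0, 1.0]
x_plain, hist_plain = nnls_projected_gradient(A, b, momentum=False)
x_mom, hist_mom = nnls_projected_gradient(A, b, momentum=True)
```

Comparing `hist_plain` and `hist_mom` on larger or worse-conditioned problems is the kind of experiment I have in mind. This tiny example is only meant to pin down the update rule, not to demonstrate any speed-up.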