I was going through the theory of weighted least-squares fitting and I understood its basic underlying concepts, but I couldn't understand why exactly we keep the weights in a diagonal matrix during the calculations.
This might be a dumb question, but I don't have much experience with matrices and I am in the process of using them to solve more real-world problems, so any input will be helpful.
Comment (Yuval Filmus, Oct 4, 2020 at 19:07): Perhaps you could provide more details?
Comment (Yuval Filmus, Oct 4, 2020 at 19:08): Also, the weight matrix probably represents weights of points. If you have $n$ points, these are $n$ values. We put them in a diagonal matrix since doing so allows us to write some formulas in a succinct way.
2 Answers
$$ \text{Least Squares solves }\quad A \vec x = \vec b \quad\text{ for }\vec x $$ Say you have some matrix $A$ in the above equation and you would like each row to have a different weight. Since least squares minimizes the squared differences, scaling one row and its right-hand side up by a large factor makes that row count as more important.
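If it helps to see this numerically, here is a minimal NumPy sketch (my own illustration, not part of the mathematics below; the data are arbitrary) showing that scaling one equation and its right-hand side by a large factor makes the least-squares solution fit that equation much more closely:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))          # 5 equations, 2 unknowns (overdetermined)
b = rng.normal(size=5)

# Ordinary least-squares solution.
x_plain, *_ = np.linalg.lstsq(A, b, rcond=None)

# Scale the first equation (row of A and entry of b) by a large factor.
A_w, b_w = A.copy(), b.copy()
A_w[0] *= 100.0
b_w[0] *= 100.0
x_weighted, *_ = np.linalg.lstsq(A_w, b_w, rcond=None)

# Residual of the first equation: the upweighted version is fit much more closely.
print(abs(A[0] @ x_plain - b[0]))     # larger
print(abs(A[0] @ x_weighted - b[0]))  # much smaller
```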
Now consider how matrix multiplication works on this small example:
$$
W \mathbin{:=} \begin{bmatrix}w_0&0&0\\0&w_1&0\\0&0&w_2\end{bmatrix},\quad
A \mathbin{:=} \begin{bmatrix}a_{00}&a_{01}&a_{02}\\a_{10}&a_{11}&a_{12}\\a_{20}&a_{21}&a_{22}\end{bmatrix}
$$
If you work through the multiplication by hand, you'll see the effect already with the first row of $W$ (call it $W_0$):
$$
\text{Weighted Least Squares solves }\quad WA\vec x = W\vec b\quad \text{ for }\vec x\\
W_0A = \begin{bmatrix}
w_0a_{00} & w_0a_{01} & w_0a_{02}
\end{bmatrix}
$$
Similarly, every other row gets its weight multiplied into each of its elements, so in general we can write
$$
W_iA = \begin{bmatrix}
w_ia_{i0} & w_ia_{i1} & w_ia_{i2}
\end{bmatrix}
$$
That should answer your question as to why it works. If you are also asking why it is done in this specific way: it happens to match how mathematicians like to think about the problem, it allows for some concise notation, and matrix operations are usually fast compared to explicit loops because they are heavily optimized.
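For completeness, a quick NumPy check (my own sketch, not part of the derivation above) that multiplying by the diagonal matrix $W$ really does scale each row of $A$ by its weight, and that the same result can be computed from the weight vector directly, without ever building $W$:

```python
import numpy as np

w = np.array([2.0, 0.5, 3.0])              # one weight per row/point
A = np.arange(9, dtype=float).reshape(3, 3)

W = np.diag(w)                              # the diagonal weight matrix
print(np.allclose(W @ A, w[:, None] * A))   # True: row i of A is scaled by w[i]
```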
Addendum: Deriving the Weighted Least Squares Formulation
Related, though not technically asked for: how we get from the equation $A\vec x = \vec b$ to the actual WLS formula.
$$ WA\vec x = W \vec b\\ A^T WA \vec x = A^T W \vec b\\ (A^T W A)^{-1} A^T W A \vec x = (A^T W A)^{-1} A^T W \vec b $$
Notice that we need to be careful about whether these inverses exist. Here, $(A^T W A)^{-1}$ does exist as long as $W$ has strictly positive weights on its diagonal (a diagonal matrix is invertible exactly when no diagonal entry is zero) and $A$ has linearly independent columns, in which case $A^T W A$ is invertible. This trick lets us introduce an inverse into the equation even though $A$ itself might not be invertible (it is usually not even square).
$$ \require{cancel} \cancel{((A^T W A)^{-1} A^T W A)} \vec x = (A^T W A)^{-1} A^T W \vec b\\ \vec x = (A^T W A)^{-1} A^T W \vec b $$
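If you want to translate that final formula into code, here is a hedged NumPy sketch (the function name and test data are my own); it solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly, which is numerically preferable but evaluates the same expression:

```python
import numpy as np

def weighted_least_squares(A, b, w):
    """Solve the weighted normal equations (A^T W A) x = A^T W b with W = diag(w)."""
    W = np.diag(w)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))        # 6 data points, 3 parameters
b = rng.normal(size=6)
w = rng.uniform(0.5, 2.0, size=6)  # positive weights, one per point

x = weighted_least_squares(A, b, w)

# Same value via the literal formula with an explicit inverse.
W = np.diag(w)
x_inv = np.linalg.inv(A.T @ W @ A) @ (A.T @ W @ b)
print(np.allclose(x, x_inv))       # True
```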
The weights are stored in a matrix rather than a vector so that the weighting can be expressed as a matrix/vector product.
E.g.
$$\begin{pmatrix}w_0&0&0\\0&w_1&0\\0&0&w_2\end{pmatrix}\begin{pmatrix}c_0\\c_1\\c_2\end{pmatrix}=\begin{pmatrix}w_0c_0\\w_1c_1\\w_2c_2\end{pmatrix}.$$
This is a common trick in linear algebra, where there is no standard notation for the element-wise product of two vectors. In implementations, the weights typically remain stored in a vector.
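As a small illustration (my own sketch), here is the identity above in NumPy terms: multiplying by `np.diag(w)` gives exactly the element-wise product with the weight vector, which is why code can keep `w` as a plain vector and never build the diagonal matrix:

```python
import numpy as np

w = np.array([2.0, 0.5, 3.0])
c = np.array([1.0, 4.0, -2.0])

print(np.diag(w) @ c)   # [ 2.   2.  -6. ]
print(w * c)            # same values, no diagonal matrix ever constructed
```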