I was going through the theory of weighted least-squares fitting and I understood its basic underlying concepts, but I couldn't understand why exactly we keep the weights in a diagonal matrix during the calculations.
This might be a dumb question, but I don't have much experience with matrices and I am in the process of using them to solve more real-world problems, so any input will be helpful.
Comment (Yuval Filmus, Oct 4, 2020 at 19:07): Perhaps you could provide more details?
Comment (Yuval Filmus, Oct 4, 2020 at 19:08): Also, the weight matrix probably represents weights of points. If you have $n$ points, these are $n$ values. We put them in a diagonal matrix since doing so allows us to write some formulas in a succinct way.
2 Answers
$$ \text{Least Squares solves }\quad A \vec x = \vec b \quad\text{ for }\vec x $$ Say you have some matrix $A$ in the above equation and you would like each row to have a different weight. Since least squares minimizes the squared differences, scaling one row and its right-hand side up by a large factor makes that row count as more important.
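If it helps to see this numerically, here is a minimal NumPy sketch (my own illustration, not part of the mathematics below; the data are arbitrary) showing that scaling one equation and its right-hand side by a large factor makes the least-squares solution fit that equation much more closely:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))          # 5 equations, 2 unknowns (overdetermined)
b = rng.normal(size=5)

# Ordinary least-squares solution.
x_plain, *_ = np.linalg.lstsq(A, b, rcond=None)

# Scale the first equation (row of A and entry of b) by a large factor.
A_w, b_w = A.copy(), b.copy()
A_w[0] *= 100.0
b_w[0] *= 100.0
x_weighted, *_ = np.linalg.lstsq(A_w, b_w, rcond=None)

# Residual of the first equation: the upweighted version is fit much more closely.
print(abs(A[0] @ x_plain - b[0]))     # larger
print(abs(A[0] @ x_weighted - b[0]))  # much smaller
```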
Now consider how matrix multiplication works on this small example:
$$
W \mathbin{:=} \begin{bmatrix}w_0&0&0\\0&w_1&0\\0&0&w_2\end{bmatrix},\quad
A \mathbin{:=} \begin{bmatrix}a_{00}&a_{01}&a_{02}\\a_{10}&a_{11}&a_{12}\\a_{20}&a_{21}&a_{22}\end{bmatrix}
$$
If you work through the multiplication by hand, you'll see the effect already with the first row of $W$ (call it $W_0$):
$$
\text{Weighted Least Squares solves }\quad WA\vec x = W\vec b\quad \text{ for }\vec x\\
W_0A = \begin{bmatrix}
w_0a_{00} & w_0a_{01} & w_0a_{02}
\end{bmatrix}
$$
Similarly, every other row gets its weight multiplied into each of its elements, so in general we can write
$$
W_iA = \begin{bmatrix}
w_ia_{i0} & w_ia_{i1} & w_ia_{i2}
\end{bmatrix}
$$
That should answer your question as to why it works. If you are also asking why it is done in this specific way: it happens to match how mathematicians like to think about the problem, it allows for some concise notation, and matrix operations are usually fast compared to explicit loops because they are heavily optimized.
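For completeness, a quick NumPy check (my own sketch, not part of the derivation above) that multiplying by the diagonal matrix $W$ really does scale each row of $A$ by its weight, and that the same result can be computed from the weight vector directly, without ever building $W$:

```python
import numpy as np

w = np.array([2.0, 0.5, 3.0])              # one weight per row/point
A = np.arange(9, dtype=float).reshape(3, 3)

W = np.diag(w)                              # the diagonal weight matrix
print(np.allclose(W @ A, w[:, None] * A))   # True: row i of A is scaled by w[i]
```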
Addendum: Deriving the Weighted Least Squares Formulation
Related, though not technically asked for: how we get from the equation $A\vec x = \vec b$ to the actual WLS formula.
$$ WA\vec x = W \vec b\\ A^T WA \vec x = A^T W \vec b\\ (A^T W A)^{-1} A^T W A \vec x = (A^T W A)^{-1} A^T W \vec b $$
Notice that we need to be careful about whether these inverses exist. Here, $(A^T W A)^{-1}$ does exist as long as $W$ has strictly positive weights on its diagonal (a diagonal matrix is invertible exactly when no diagonal entry is zero) and $A$ has linearly independent columns, in which case $A^T W A$ is invertible. This trick lets us introduce an inverse into the equation even though $A$ itself might not be invertible (it is usually not even square).
$$ \require{cancel} \cancel{((A^T W A)^{-1} A^T W A)} \vec x = (A^T W A)^{-1} A^T W \vec b\\ \vec x = (A^T W A)^{-1} A^T W \vec b $$
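If you want to translate that final formula into code, here is a hedged NumPy sketch (the function name and test data are my own); it solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly, which is numerically preferable but evaluates the same expression:

```python
import numpy as np

def weighted_least_squares(A, b, w):
    """Solve the weighted normal equations (A^T W A) x = A^T W b with W = diag(w)."""
    W = np.diag(w)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))        # 6 data points, 3 parameters
b = rng.normal(size=6)
w = rng.uniform(0.5, 2.0, size=6)  # positive weights, one per point

x = weighted_least_squares(A, b, w)

# Same value via the literal formula with an explicit inverse.
W = np.diag(w)
x_inv = np.linalg.inv(A.T @ W @ A) @ (A.T @ W @ b)
print(np.allclose(x, x_inv))       # True
```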
The weights are stored in a matrix rather than a vector so that the weighting can be expressed as a matrix/vector product.
E.g.
$$\begin{pmatrix}w_0&0&0\\0&w_1&0\\0&0&w_2\end{pmatrix}\begin{pmatrix}c_0\\c_1\\c_2\end{pmatrix}=\begin{pmatrix}w_0c_0\\w_1c_1\\w_2c_2\end{pmatrix}.$$
This is a common trick in linear algebra, where there is no standard notation for the element-wise product of two vectors. In implementations, the weights typically remain stored in a vector.
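As a small illustration (my own sketch), here is the identity above in NumPy terms: multiplying by `np.diag(w)` gives exactly the element-wise product with the weight vector, which is why code can keep `w` as a plain vector and never build the diagonal matrix:

```python
import numpy as np

w = np.array([2.0, 0.5, 3.0])
c = np.array([1.0, 4.0, -2.0])

print(np.diag(w) @ c)   # [ 2.   2.  -6. ]
print(w * c)            # same values, no diagonal matrix ever constructed
```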