Centering matrix
In mathematics and multivariate statistics, the centering matrix[1] is a symmetric and idempotent matrix, which when multiplied with a vector has the same effect as subtracting the mean of the components of the vector from every component of that vector.
Definition
The centering matrix of size n is defined as the n-by-n matrix
- {\displaystyle C_{n}=I_{n}-{\tfrac {1}{n}}J_{n}}
where {\displaystyle I_{n}\,} is the identity matrix of size n and {\displaystyle J_{n}} is the n-by-n matrix of all 1's.
For example
- {\displaystyle C_{1}={\begin{bmatrix}0\end{bmatrix}}},
- {\displaystyle C_{2}=\left[{\begin{array}{rr}1&0\\0&1\end{array}}\right]-{\frac {1}{2}}\left[{\begin{array}{rr}1&1\\1&1\end{array}}\right]=\left[{\begin{array}{rr}{\frac {1}{2}}&-{\frac {1}{2}}\\-{\frac {1}{2}}&{\frac {1}{2}}\end{array}}\right]},
- {\displaystyle C_{3}=\left[{\begin{array}{rrr}1&0&0\\0&1&0\\0&0&1\end{array}}\right]-{\frac {1}{3}}\left[{\begin{array}{rrr}1&1&1\\1&1&1\\1&1&1\end{array}}\right]=\left[{\begin{array}{rrr}{\frac {2}{3}}&-{\frac {1}{3}}&-{\frac {1}{3}}\\-{\frac {1}{3}}&{\frac {2}{3}}&-{\frac {1}{3}}\\-{\frac {1}{3}}&-{\frac {1}{3}}&{\frac {2}{3}}\end{array}}\right]}.
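The definition can be checked with a short script. The following is only a minimal sketch: the helper name `centering_matrix` is an illustrative choice, and exact fractions from the Python standard library are assumed in place of a numerical library so that the entries match the examples above exactly.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows of exact fractions.

    Hypothetical helper written for this illustration only.
    """
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


# Print the three matrices given as examples in the text.
for n in (1, 2, 3):
    print(f"C_{n}:")
    for row in centering_matrix(n):
        print("  ", [str(entry) for entry in row])
```

Every diagonal entry is (n − 1)/n and every off-diagonal entry is −1/n, which is all the definition requires.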
Properties
Given a column-vector, {\displaystyle \mathbf {v} \,} of size n, the centering property of {\displaystyle C_{n}\,} can be expressed as
- {\displaystyle C_{n}\,\mathbf {v} =\mathbf {v} -({\tfrac {1}{n}}J_{n,1}^{\textrm {T}}\mathbf {v} )J_{n,1}}
where {\displaystyle J_{n,1}} is a column vector of ones and {\displaystyle {\tfrac {1}{n}}J_{n,1}^{\textrm {T}}\mathbf {v} } is the mean of the components of {\displaystyle \mathbf {v} \,}.
{\displaystyle C_{n}\,} is symmetric positive semi-definite.
{\displaystyle C_{n}\,} is idempotent, so that {\displaystyle C_{n}^{k}=C_{n}}, for {\displaystyle k=1,2,\ldots }. Once the mean has been removed, it is zero and removing it again has no effect.
{\displaystyle C_{n}\,} is singular. The effects of applying the transformation {\displaystyle C_{n}\,\mathbf {v} } cannot be reversed.
{\displaystyle C_{n}\,} has the eigenvalue 1 of multiplicity n − 1 and eigenvalue 0 of multiplicity 1.
{\displaystyle C_{n}\,} has a nullspace of dimension 1, along the vector {\displaystyle J_{n,1}}.
{\displaystyle C_{n}\,} is an orthogonal projection matrix. That is, {\displaystyle C_{n}\mathbf {v} } is the projection of {\displaystyle \mathbf {v} \,} onto the (n − 1)-dimensional subspace that is orthogonal to the nullspace spanned by {\displaystyle J_{n,1}}. (This is the subspace of all n-vectors whose components sum to zero.)
The trace of {\displaystyle C_{n}} is {\displaystyle n(n-1)/n=n-1}, since each of its n diagonal entries equals {\displaystyle (n-1)/n}.
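Several of the properties listed above can be verified exactly for a small case. The sketch below assumes n = 4 and an arbitrary illustrative vector; the helpers `centering_matrix`, `matmul` and `matvec` are hypothetical names introduced here, and exact fractions are used so that equalities are tested without rounding error.

```python
from fractions import Fraction


def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact fractional entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def matvec(A, v):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


n = 4
C = centering_matrix(n)
v = [Fraction(x) for x in (3, 1, 4, 2)]  # arbitrary example vector
mean = sum(v) / n

# Centering property: C_n v equals v with its mean subtracted.
centered = matvec(C, v)
is_centering = centered == [x - mean for x in v]

# Symmetry and idempotence.
is_symmetric = C == [list(col) for col in zip(*C)]
is_idempotent = matmul(C, C) == C

# The all-ones vector lies in the nullspace (eigenvalue 0).
kills_ones = matvec(C, [Fraction(1)] * n) == [0] * n

# A vector whose components sum to zero is unchanged (eigenvalue 1).
z = [Fraction(x) for x in (1, -1, 0, 0)]
fixes_sum_zero = matvec(C, z) == z

# Trace: n diagonal entries of (n - 1)/n each.
trace = sum(C[i][i] for i in range(n))

print(is_centering, is_symmetric, is_idempotent, kills_ones, fixes_sum_zero)
print("trace:", trace)
```

A check on one matrix size is not a proof, but it makes the listed identities concrete.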
Application
Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it is a convenient analytical tool. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of an m-by-n matrix {\displaystyle X}.
The left multiplication by {\displaystyle C_{m}} subtracts a corresponding mean value from each of the n columns, so that each column of the product {\displaystyle C_{m}\,X} has a zero mean. Similarly, the multiplication by {\displaystyle C_{n}} on the right subtracts a corresponding mean value from each of the m rows, and each row of the product {\displaystyle X\,C_{n}} has a zero mean. The multiplication on both sides creates a doubly centred matrix {\displaystyle C_{m}\,X\,C_{n}}, whose row and column means are equal to zero.
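The three products can be illustrated on a small matrix. This is a sketch under stated assumptions: the 2-by-3 matrix X is made up for the example, and `centering_matrix`, `matmul`, `row_sums` and `col_sums` are hypothetical helpers, not part of any library.

```python
from fractions import Fraction


def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact fractional entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def row_sums(A):
    return [sum(row) for row in A]


def col_sums(A):
    return [sum(col) for col in zip(*A)]


# Illustrative data: m = 2 rows, n = 3 columns.
X = [[Fraction(x) for x in row] for row in ([1, 2, 6], [3, 8, 4])]
m, n = len(X), len(X[0])

col_centered = matmul(centering_matrix(m), X)                # C_m X
row_centered = matmul(X, centering_matrix(n))                # X C_n
doubly_centered = matmul(col_centered, centering_matrix(n))  # C_m X C_n

print("column sums of C_m X:    ", col_sums(col_centered))
print("row sums of X C_n:       ", row_sums(row_centered))
print("row sums of C_m X C_n:   ", row_sums(doubly_centered))
print("column sums of C_m X C_n:", col_sums(doubly_centered))
```

A zero sum is equivalent to a zero mean, so the printed sums confirm the statements above for this example. In numerical work one would normally subtract the means directly, as the text notes that the matrix product is not the efficient route.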
The centering matrix also provides a succinct way to express the scatter matrix, {\displaystyle S=(X-\mu J_{n,1}^{\mathrm {T} })(X-\mu J_{n,1}^{\mathrm {T} })^{\mathrm {T} }}, of a data sample {\displaystyle X\,} whose n columns are the observations, where {\displaystyle \mu ={\tfrac {1}{n}}XJ_{n,1}} is the sample mean. Because {\displaystyle \mu J_{n,1}^{\mathrm {T} }={\tfrac {1}{n}}XJ_{n,1}J_{n,1}^{\mathrm {T} }={\tfrac {1}{n}}XJ_{n}}, the centered data are {\displaystyle X-\mu J_{n,1}^{\mathrm {T} }=X\,C_{n}}, and the symmetry and idempotence of {\displaystyle C_{n}} give
- {\displaystyle S=X\,C_{n}(X\,C_{n})^{\mathrm {T} }=X\,C_{n}\,C_{n}\,X^{\mathrm {T} }=X\,C_{n}\,X^{\mathrm {T} }.}
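The agreement between the two expressions for the scatter matrix can be checked on a small sample. In this sketch the data matrix is invented for illustration (two variables, three observations stored as columns), and the helper names are hypothetical.

```python
from fractions import Fraction


def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact fractional entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Illustrative sample: each of the n = 3 columns is one observation.
X = [[Fraction(x) for x in row] for row in ([1, 2, 6], [3, 8, 4])]
n = len(X[0])

# Direct form: subtract the sample mean mu from every observation.
mu = [sum(row) / n for row in X]
deviations = [[x - mu_i for x in row] for row, mu_i in zip(X, mu)]
S_direct = matmul(deviations, transpose(deviations))

# Compact form: S = X C_n X^T.
S_compact = matmul(matmul(X, centering_matrix(n)), transpose(X))

print("direct: ", [[str(e) for e in row] for row in S_direct])
print("compact:", [[str(e) for e in row] for row in S_compact])
```

Both forms give the same 2-by-2 matrix for this sample, with no explicit computation of the mean in the compact form.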
{\displaystyle C_{n}} is the covariance matrix of the multinomial distribution in the special case of n trials over {\displaystyle k=n} equally likely categories, that is, {\displaystyle p_{1}=p_{2}=\cdots =p_{n}={\frac {1}{n}}}.
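This connection can be checked against the standard multinomial covariance, whose diagonal entries are N p_i (1 − p_i) and whose off-diagonal entries are −N p_i p_j for N trials. The sketch below assumes that formula; `multinomial_covariance` is a hypothetical helper written for this example, not a library function.

```python
from fractions import Fraction


def centering_matrix(n):
    """C_n = I_n - (1/n) J_n with exact fractional entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def multinomial_covariance(trials, p):
    """Covariance matrix of a multinomial with the given trial count and
    category probabilities p (standard formula, assumed here)."""
    k = len(p)
    return [[trials * (p[i] * (1 - p[i]) if i == j else -p[i] * p[j])
             for j in range(k)]
            for i in range(k)]


n = 5
uniform = [Fraction(1, n)] * n          # p_1 = ... = p_n = 1/n
cov = multinomial_covariance(n, uniform)
matches = cov == centering_matrix(n)
print("covariance equals C_n:", matches)
```

With n trials and p_i = 1/n, the diagonal entries reduce to 1 − 1/n and the off-diagonal entries to −1/n, which are exactly the entries of the centering matrix.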
References
1. John I. Marden, Analyzing and Modeling Rank Data, Chapman & Hall, 1995, ISBN 0-412-99521-2, page 59.