Covariance matrix

From Wikipedia, the free encyclopedia

In statistics and probability theory, the covariance matrix is a matrix of covariances between the elements of a random vector. It is the natural generalization to higher dimensions of the concept of the variance of a scalar-valued random variable.

1 Definition
- 1.1 Generalization of the variance
2 Conflicting nomenclatures and notations
3 Properties
4 As a linear operator
5 Which matrices are covariance matrices?
6 How to find a valid covariance matrix?
7 Complex random vectors
8 Estimation
9 Probability density function
10 See also
11 Notes
12 References

Definition

If entries in the column vector

$X = \begin{bmatrix}X_1 \\ \vdots \\ X_n \end{bmatrix}$

are random variables, each with finite variance, then the covariance matrix Σ is the matrix whose (i, j) entry is the covariance

$\Sigma_{ij} = \mathrm{cov}(X_i, X_j) = \mathrm{E}\left[ (X_i - \mu_i)(X_j - \mu_j) \right]$

where

$\mu_i = \mathrm{E}(X_i)\,$

is the expected value of the $i$th entry in the vector $X$. In other words, we have

$\Sigma = \begin{bmatrix} \mathrm{E}[(X_1 - \mu_1)(X_1 - \mu_1)] & \mathrm{E}[(X_1 - \mu_1)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_1 - \mu_1)(X_n - \mu_n)] \\ \\ \mathrm{E}[(X_2 - \mu_2)(X_1 - \mu_1)] & \mathrm{E}[(X_2 - \mu_2)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_2 - \mu_2)(X_n - \mu_n)] \\ \\ \vdots & \vdots & \ddots & \vdots \\ \\ \mathrm{E}[(X_n - \mu_n)(X_1 - \mu_1)] & \mathrm{E}[(X_n - \mu_n)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_n - \mu_n)(X_n - \mu_n)] \end{bmatrix}.$

If this matrix is invertible, its inverse, $\Sigma^{-1}$, is called the inverse covariance matrix or the precision matrix.^[1]
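As a concrete illustration of the entrywise definition, the sketch below computes $\Sigma$ for a random vector that takes finitely many values, so every expectation becomes a probability-weighted sum. The representation of the distribution as a list of (probability, outcome) pairs and the helper names `expectation` and `covariance_matrix` are assumptions made for this example, not part of any library.

```python
def expectation(dist, f):
    """E[f(X)] for a discrete distribution given as (probability, outcome) pairs."""
    return sum(p * f(x) for p, x in dist)


def covariance_matrix(dist):
    """Sigma[i][j] = E[(X_i - mu_i)(X_j - mu_j)], following the entrywise definition."""
    n = len(dist[0][1])
    mu = [expectation(dist, lambda x, i=i: x[i]) for i in range(n)]
    return [
        [expectation(dist, lambda x, i=i, j=j: (x[i] - mu[i]) * (x[j] - mu[j]))
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example: X = (X_1, X_2) takes three values with the given probabilities.
dist = [(0.25, (0.0, 0.0)), (0.5, (1.0, 2.0)), (0.25, (2.0, 2.0))]
sigma = covariance_matrix(dist)
print(sigma)  # [[0.5, 0.5], [0.5, 0.75]]
```

Here $\mu = (1, 1.5)$; the diagonal entries are the variances of $X_1$ and $X_2$, and the matrix is symmetric because each product $(X_i - \mu_i)(X_j - \mu_j)$ is.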

Generalization of the variance

The definition above is equivalent to the matrix equality

$\Sigma=\mathrm{E} \left[ \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top \right]$

This form can be seen as a generalization of the scalar-valued variance to higher dimensions. Recall that for a scalar-valued random variable $X$,

$\sigma^2 = \mathrm{var}(X) = \mathrm{E}[(X-\mu)^2], \,$

where

$\mu = \mathrm{E}(X).\,$
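To see why the matrix form agrees with the entrywise definition and reduces to the scalar variance, the worked step below writes out the $(i, j)$ entry of the outer product and then the case $n = 1$; $\sigma_1^2$ is simply a label introduced here for $\mathrm{var}(X_1)$.

```latex
\left[ (\textbf{X} - \mathrm{E}[\textbf{X}])(\textbf{X} - \mathrm{E}[\textbf{X}])^\top \right]_{ij}
  = (X_i - \mu_i)(X_j - \mu_j)
\quad\Longrightarrow\quad
\Sigma_{ij} = \mathrm{E}\left[ (X_i - \mu_i)(X_j - \mu_j) \right],

n = 1:\quad
\Sigma = \mathrm{E}\left[ [X_1 - \mu_1]\,[X_1 - \mu_1]^\top \right]
       = \left[ \mathrm{E}[(X_1 - \mu_1)^2] \right]
       = \left[ \sigma_1^2 \right].
```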

Conflicting nomenclatures and notations

Nomenclatures differ. Some statisticians, following the probabilist William Feller, call this matrix the variance of the random vector $X$, because it is the natural generalization to higher dimensions of the 1-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector $X$. Thus

$\operatorname{var}(\textbf{X}) = \operatorname{cov}(\textbf{X}) = \mathrm{E} \left[ (\textbf{X} - \mathrm{E} [\textbf{X}]) (\textbf{X} - \mathrm{E} [\textbf{X}])^\top \right].$

However, the notation for the cross-covariance between two vectors is standard:

$\operatorname{cov}(\textbf{X},\textbf{Y}) = \mathrm{E} \left[ (\textbf{X} - \mathrm{E}[\textbf{X}]) (\textbf{Y} - \mathrm{E}[\textbf{Y}])^\top \right].$

The var notation is found in William Feller's two-volume book An Introduction to Probability Theory and Its Applications, but both forms are quite standard and there is no ambiguity between them.

The matrix $\Sigma$ is also often called the variance-covariance matrix since the diagonal terms are in fact variances.
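The cross-covariance formula can be sketched the same way for a discrete joint distribution of $(\mathbf{X}, \mathbf{Y})$. The triple representation `(probability, x, y)` and the names `cross_covariance` and `transpose` are assumptions for this illustration. Passing $\mathbf{X}$ as both arguments recovers $\operatorname{var}(\mathbf{X})$, whose diagonal entries are the variances mentioned above.

```python
def cross_covariance(dist):
    """cov(X, Y) = E[(X - E[X])(Y - E[Y])^T] for a discrete joint distribution.

    `dist` is a list of (probability, x, y) triples; x has p entries and y has q.
    """
    p, q = len(dist[0][1]), len(dist[0][2])
    mu_x = [sum(w * x[i] for w, x, _ in dist) for i in range(p)]
    mu_y = [sum(w * y[j] for w, _, y in dist) for j in range(q)]
    # The result is p x q: row i pairs X_i with every component of Y.
    return [
        [sum(w * (x[i] - mu_x[i]) * (y[j] - mu_y[j]) for w, x, y in dist) for j in range(q)]
        for i in range(p)
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


# Hypothetical joint distribution: X has 2 components, Y has 1.
dist = [(0.5, (1.0, 0.0), (2.0,)), (0.5, (3.0, 4.0), (0.0,))]
cxy = cross_covariance(dist)                              # 2 x 1
cyx = cross_covariance([(w, y, x) for w, x, y in dist])   # 1 x 2
var_x = cross_covariance([(w, x, x) for w, x, _ in dist]) # cov(X, X) = var(X)
print(cxy, cyx, var_x)  # [[-1.0], [-2.0]] [[-1.0, -2.0]] [[1.0, 2.0], [2.0, 4.0]]
```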

Properties

For $\Sigma=\mathrm{E} \left[ \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top \right]$ and $\mu = \mathrm{E}(\textbf{X})$, where $\mathbf{X}$ is a random $p$-dimensional variable and $\mathbf{Y}$ a random $q$-dimensional variable, the following basic properties apply:

$\Sigma = \mathrm{E}(\mathbf{X X^\top}) - \mathbf{\mu}\mathbf{\mu^\top}$
$\mathbf{\Sigma}$ is positive semi-definite
$\operatorname{var}(\mathbf{A X} + \mathbf{a}) = \mathbf{A}\, \operatorname{var}(\mathbf{X})\, \mathbf{A^\top}$
$\operatorname{cov}(\mathbf{X},\mathbf{Y}) = \operatorname{cov}(\mathbf{Y},\mathbf{X})^\top$
$\operatorname{cov}(\mathbf{X_1} + \mathbf{X_2},\mathbf{Y}) = \operatorname{cov}(\mathbf{X_1},\mathbf{Y}) + \operatorname{cov}(\mathbf{X_2}, \mathbf{Y})$
If p = q, then $\operatorname{var}(\mathbf{X} + \mathbf{Y}) = \operatorname{var}(\mathbf{X}) + \operatorname{cov}(\mathbf{X},\mathbf{Y}) + \operatorname{cov}(\mathbf{Y}, \mathbf{X}) + \operatorname{var}(\mathbf{Y})$
$\operatorname{cov}(\mathbf{AX}, \mathbf{B}^\top\mathbf{Y}) = \mathbf{A}\, \operatorname{cov}(\mathbf{X}, \mathbf{Y}) \,\mathbf{B}$
If $\mathbf{X}$ and $\mathbf{Y}$ are independent, then $\operatorname{cov}(\mathbf{X}, \mathbf{Y}) = 0$

where $\mathbf{X}$, $\mathbf{X_1}$ and $\mathbf{X_2}$ are random $(p \times 1)$ vectors, $\mathbf{Y}$ is a random $(q \times 1)$ vector, $\mathbf{a}$ is a $(q \times 1)$ vector, and $\mathbf{A}$ and $\mathbf{B}$ are $(q \times p)$ matrices.
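Two of these properties can be checked numerically. The sketch below, a hypothetical illustration rather than a library routine, uses Python's `fractions.Fraction` so that a small discrete distribution gives exact equalities. It verifies $\Sigma = \mathrm{E}(\mathbf{XX}^\top) - \mu\mu^\top$ and $\operatorname{var}(\mathbf{AX} + \mathbf{a}) = \mathbf{A}\operatorname{var}(\mathbf{X})\mathbf{A}^\top$ for an arbitrarily chosen $3 \times 2$ matrix $\mathbf{A}$ (so $p = 2$, $q = 3$). The helper names `mean`, `var`, `affine`, `matmul` and `transpose` are assumptions made for this example.

```python
from fractions import Fraction as F


def mean(dist):
    n = len(dist[0][1])
    return [sum(w * x[i] for w, x in dist) for i in range(n)]


def var(dist):
    mu = mean(dist)
    n = len(mu)
    return [[sum(w * (x[i] - mu[i]) * (x[j] - mu[j]) for w, x in dist) for j in range(n)]
            for i in range(n)]


def affine(dist, A, a):
    """Distribution of A X + a, where A is q x p and a has q entries."""
    return [(w, tuple(sum(A[r][c] * x[c] for c in range(len(x))) + a[r] for r in range(len(A))))
            for w, x in dist]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(M):
    return [list(r) for r in zip(*M)]


# Hypothetical random vector X (p = 2) with rational probabilities for exact arithmetic.
dist = [(F(1, 4), (F(0), F(1))), (F(1, 2), (F(2), F(-1))), (F(1, 4), (F(1), F(3)))]
mu = mean(dist)

# Property: Sigma = E(X X^T) - mu mu^T
second_moment = [[sum(w * x[i] * x[j] for w, x in dist) for j in range(2)] for i in range(2)]
alt_sigma = [[second_moment[i][j] - mu[i] * mu[j] for j in range(2)] for i in range(2)]

# Property: var(A X + a) = A var(X) A^T, with a hypothetical 3 x 2 matrix A
A = [[F(1), F(2)], [F(0), F(-1)], [F(3), F(1)]]
a = [F(5), F(-2), F(7)]
lhs = var(affine(dist, A, a))
rhs = matmul(matmul(A, var(dist)), transpose(A))
print(alt_sigma == var(dist), lhs == rhs)  # True True
```

Exact rational arithmetic is used only so that equality can be tested with `==`; with floating-point data the same checks would need a tolerance.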

The covariance matrix is a useful tool in many different areas. From it a transformation matrix can be derived that completely decorrelates the data or, from a different point of view, that gives an optimal basis for representing the data compactly (see Rayleigh quotient for a formal proof and additional properties of covariance matrices). This technique is called principal component analysis (PCA) or the Karhunen-Loève transform (KL transform).
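For two dimensions the decorrelating transformation is a single rotation, which makes a compact sketch possible. The code below is a minimal 2-D illustration, assuming points are treated as equally likely outcomes (covariance divided by $N$); general PCA needs a full eigendecomposition of $\Sigma$, which this sketch does not attempt. The function names are hypothetical.

```python
import math


def covariance_2d(points):
    """Mean and covariance (dividing by N) of (x, y) points treated as equally likely."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), [[sxx, sxy], [sxy, syy]]


def decorrelate_2d(points):
    """Rotate centred points onto the eigenvectors of their 2 x 2 covariance matrix."""
    (mx, my), s = covariance_2d(points)
    # Angle of the principal axis: tan(2*theta) = 2*sxy / (sxx - syy)
    theta = 0.5 * math.atan2(2 * s[0][1], s[0][0] - s[1][1])
    c, sn = math.cos(theta), math.sin(theta)
    # First coordinate: projection on the principal axis; second: on the orthogonal axis.
    return [((x - mx) * c + (y - my) * sn, -(x - mx) * sn + (y - my) * c) for x, y in points]


points = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 6.0)]  # hypothetical data
rotated = decorrelate_2d(points)
_, s_rot = covariance_2d(rotated)
print(s_rot)  # off-diagonal entries are zero up to rounding error
```

Because the transformation is a rotation, the total variance (the trace of $\Sigma$) is unchanged; it is only redistributed so that the first coordinate carries as much of it as possible.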

As a linear operator

Applied to one vector, the covariance matrix maps the coefficient vector $\mathbf c$ of a linear combination $\mathbf c^\top\mathbf X$ of the random variables onto a vector of covariances with those variables: $\mathbf c^\top\Sigma = \operatorname{cov}(\mathbf c^\top\mathbf X,\mathbf X)$. Treated as a bilinear form, it yields the covariance between two linear combinations: $\mathbf d^\top\Sigma\mathbf c=\operatorname{cov}(\mathbf d^\top\mathbf X,\mathbf c^\top\mathbf X)$. The variance of a linear combination is then $\mathbf c^\top\Sigma\mathbf c$, its covariance with itself.
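The bilinear-form identity can be checked directly: compute $\mathbf d^\top\Sigma\mathbf c$ from the matrix, and separately compute the covariance of the two scalar random variables $\mathbf d^\top\mathbf X$ and $\mathbf c^\top\mathbf X$. The sketch below assumes a hypothetical three-outcome distribution and uses `fractions.Fraction` for exact comparison; the helper names are illustrative only.

```python
from fractions import Fraction as F


def covariance_matrix(dist):
    """Sigma for a discrete distribution given as (probability, outcome) pairs."""
    n = len(dist[0][1])
    mu = [sum(w * x[i] for w, x in dist) for i in range(n)]
    return [[sum(w * (x[i] - mu[i]) * (x[j] - mu[j]) for w, x in dist) for j in range(n)]
            for i in range(n)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cov_of_combinations(dist, d, c):
    """cov(d^T X, c^T X), computed directly from the two scalar random variables."""
    md = sum(w * dot(d, x) for w, x in dist)
    mc = sum(w * dot(c, x) for w, x in dist)
    return sum(w * (dot(d, x) - md) * (dot(c, x) - mc) for w, x in dist)


def bilinear(d, sigma, c):
    """d^T Sigma c."""
    return sum(d[i] * sigma[i][j] * c[j] for i in range(len(d)) for j in range(len(c)))


# Hypothetical 3-component random vector with exact rational probabilities.
dist = [(F(1, 3), (F(1), F(0), F(2))),
        (F(1, 3), (F(0), F(1), F(-1))),
        (F(1, 3), (F(2), F(2), F(0)))]
sigma = covariance_matrix(dist)
c, d = (F(1), F(-2), F(1)), (F(3), F(0), F(1))

print(bilinear(d, sigma, c) == cov_of_combinations(dist, d, c))  # True
print(bilinear(c, sigma, c) == cov_of_combinations(dist, c, c))  # True: var(c^T X)
```

Choosing $\mathbf d$ to be the $j$th unit vector recovers the $j$th entry of $\mathbf c^\top\Sigma$, i.e. $\operatorname{cov}(\mathbf c^\top\mathbf X, X_j)$.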

Which matrices are covariance matrices?

From the identity just above (let $\mathbf{b}$ be a $(p \times 1)$ real-valued vector)

$\operatorname{var}(\mathbf{b}^\top\mathbf{X}) = \mathbf{b}^\top \operatorname{var}(\mathbf{X}) \mathbf{b},\,$

the fact that the variance of any real-valued random variable is nonnegative, and the symmetry of the covariance matrix's definition, it follows that only a positive semi-definite symmetric matrix can be a covariance matrix. The converse also holds: every positive semi-definite symmetric matrix is a covariance matrix. To see this, suppose $M$ is a $p \times p$ positive semi-definite symmetric matrix. By the finite-dimensional case of the spectral theorem, $M$ has a positive semi-definite symmetric square root, which we denote $M^{1/2}$. Let $\mathbf{X}$ be any $p \times 1$ column vector-valued random variable whose covariance matrix is the $p \times p$ identity matrix. Then

$\operatorname{var}(M^{1/2}\mathbf{X}) = M^{1/2} (\operatorname{var}(\mathbf{X})) M^{1/2} = M.\,$
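The construction can be sketched for $p = 2$, where the symmetric square root has a closed form: with $s = \sqrt{\det M}$ and $t = \sqrt{\operatorname{tr} M + 2s}$, the matrix $(M + sI)/t$ squares to $M$ (a consequence of the Cayley-Hamilton theorem for $2 \times 2$ matrices). For $\mathbf{X}$ the sketch assumes two independent fair $\pm 1$ coin flips, which have identity covariance. The target matrix and the function names are hypothetical.

```python
import itertools
import math


def sqrt_psd_2x2(m):
    """Symmetric PSD square root of a nonzero symmetric 2 x 2 PSD matrix.

    Closed form (M + s I) / t with s = sqrt(det M) and t = sqrt(tr M + 2 s).
    """
    (a, b), (_, d) = m
    s = math.sqrt(max(a * d - b * b, 0.0))
    t = math.sqrt(a + d + 2 * s)
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]


def covariance(dist):
    n = len(dist[0][1])
    mu = [sum(w * x[i] for w, x in dist) for i in range(n)]
    return [[sum(w * (x[i] - mu[i]) * (x[j] - mu[j]) for w, x in dist) for j in range(n)]
            for i in range(n)]


# X: two independent +/-1 coin flips, so var(X) is the 2 x 2 identity.
x_dist = [(0.25, v) for v in itertools.product((-1.0, 1.0), repeat=2)]

M = [[4.0, 2.0], [2.0, 3.0]]  # a hypothetical positive semi-definite target
R = sqrt_psd_2x2(M)
# Distribution of M^{1/2} X
y_dist = [(w, (R[0][0] * x[0] + R[0][1] * x[1], R[1][0] * x[0] + R[1][1] * x[1]))
          for w, x in x_dist]
print(covariance(y_dist))  # approximately [[4.0, 2.0], [2.0, 3.0]]
```

The formula also covers singular targets such as $\begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix}$, where $s = 0$; only the zero matrix (where $t = 0$) must be handled separately.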

How to find a valid covariance matrix?

In some applications (e.g. building data models from only partially observed data) one wants to find the “nearest” covariance matrix to a given symmetric matrix (e.g. a matrix of observed covariances). In 2002, Higham^[2] formalized the notion of nearness using a weighted Frobenius norm and provided a method for computing the nearest correlation matrix.
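A simpler relative of this problem has a closed-form answer: in the unweighted Frobenius norm, the nearest positive semi-definite matrix to a symmetric matrix is obtained by replacing its negative eigenvalues with zero. The sketch below implements that projection, using a cyclic Jacobi eigenvalue iteration so that it needs only the standard library. This is not Higham's weighted, unit-diagonal method; the function names, tolerance and sweep limit are assumptions for illustration.

```python
import math


def jacobi_eigh(a, tol=1e-24, max_sweeps=50):
    """Eigendecomposition of a real symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) with the eigenvectors in the columns of V.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes a[p][q]: tan(2*theta) = 2 a_pq / (a_qq - a_pp)
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A G (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- G^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V G accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def nearest_psd(a):
    """Nearest PSD matrix in the (unweighted) Frobenius norm: clip negative eigenvalues."""
    vals, v = jacobi_eigh(a)
    n = len(a)
    clipped = [max(lam, 0.0) for lam in vals]
    return [[sum(v[i][k] * clipped[k] * v[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


# A symmetric "observed covariance" that is not PSD (its eigenvalues are 3 and -1).
print(nearest_psd([[1.0, 2.0], [2.0, 1.0]]))  # approximately [[1.5, 1.5], [1.5, 1.5]]
```

A matrix that is already positive semi-definite is returned unchanged up to rounding error.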

Complex random vectors

The variance of a complex scalar-valued random variable with expected value μ is conventionally defined using complex conjugation:

$\operatorname{var}(z) = \operatorname{E} \left[ (z-\mu)(z-\mu)^{*} \right]$

where the complex conjugate of a complex number $z$ is denoted $z^{*}$.

If $Z$ is a column-vector of complex-valued random variables, then we take the conjugate transpose by both transposing and conjugating, getting a square matrix:

$\operatorname{E} \left[ (Z-\mu)(Z-\mu)^{*} \right]$

where $Z^{*}$ denotes the conjugate transpose, which is applicable to the scalar case since the transpose of a scalar is still a scalar.
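The conjugate transpose is what makes the result Hermitian with a real, nonnegative diagonal. The sketch below computes $\operatorname{E}[(Z-\mu)(Z-\mu)^{*}]$ for a hypothetical two-outcome complex random vector using Python's built-in `complex` type; the function name and the (probability, outcome) representation are assumptions for this example.

```python
def complex_covariance(dist):
    """E[(Z - mu)(Z - mu)^*] for a discrete distribution of complex vectors.

    Entry (i, j) is E[(Z_i - mu_i) * conj(Z_j - mu_j)].
    """
    n = len(dist[0][1])
    mu = [sum(w * z[i] for w, z in dist) for i in range(n)]
    return [[sum(w * (z[i] - mu[i]) * (z[j] - mu[j]).conjugate() for w, z in dist)
             for j in range(n)] for i in range(n)]


# Hypothetical complex random vector Z with two equally likely outcomes.
dist = [(0.5, (1 + 1j, 2j)), (0.5, (-1 - 1j, 0j))]
c = complex_covariance(dist)
print(c)  # entries 2, 1-1j / 1+1j, 1 (as Python complex numbers)
```

Without the conjugation, the diagonal entry for the first component would be $\operatorname{E}[(1+i)^2] = 2i$, which is not a meaningful variance.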

Estimation

The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle. See estimation of covariance matrices.
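The end result of that derivation is easy to state: under a multivariate normal model, the maximum-likelihood estimate divides the sum of outer products of deviations from the sample mean by $N$, while the commonly used unbiased estimate divides by $N - 1$. The sketch below computes both; the parameter name `ddof` mirrors NumPy's convention, but the function itself is a hypothetical plain-Python illustration and the data are made up.

```python
def sample_covariance(samples, ddof=0):
    """Sample covariance matrix of a list of equal-length observation vectors.

    ddof=0 divides by N (the maximum-likelihood estimate under a normal model);
    ddof=1 divides by N - 1 (the unbiased estimate).
    """
    n_obs = len(samples)
    dim = len(samples[0])
    mean = [sum(x[i] for x in samples) / n_obs for i in range(dim)]
    scale = 1.0 / (n_obs - ddof)
    return [[scale * sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples)
             for j in range(dim)] for i in range(dim)]


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 7.0), (4.0, 7.0)]  # hypothetical observations
mle = sample_covariance(data)
unbiased = sample_covariance(data, ddof=1)
print(mle)  # [[1.25, 2.25], [2.25, 4.5]]
```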

Probability density function

The probability density function of a set of $n$ correlated random variables whose joint distribution is an $n$-dimensional Gaussian (multivariate normal) is given on the Maximum likelihood page.
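For reference, when $\Sigma$ is positive definite (so that the precision matrix $\Sigma^{-1}$ exists), the multivariate normal density with mean vector $\boldsymbol\mu$ and covariance matrix $\Sigma$ has the standard form below, where $|\Sigma|$ denotes the determinant.

```latex
f_{\mathbf X}(\mathbf x)
  = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (\mathbf x - \boldsymbol\mu)^\top \Sigma^{-1} (\mathbf x - \boldsymbol\mu) \right)
```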

See also

Notes

^ Wasserman, Larry (2004). All of Statistics: A Concise Course in Statistical Inference.
^ Higham, Nicholas J. (2002). "Computing the nearest correlation matrix—a problem from finance". IMA Journal of Numerical Analysis 22 (3): 329–343. doi:10.1093/imanum/22.3.329.

References

Eric W. Weisstein, Covariance Matrix at MathWorld.
N.G. van Kampen, Stochastic processes in physics and chemistry. New York: North-Holland, 1981.
