Matrix calculus


From Wikipedia, the free encyclopedia

In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices, where it defines the matrix derivative. This notation is well suited to describing systems of differential equations and to taking derivatives of matrix-valued functions with respect to matrix variables. It is commonly used in statistics and engineering, while tensor index notation is preferred in physics.

Notice

This article uses a different convention for vector and matrix derivatives from the one often encountered in estimation theory and pattern recognition. As a result, its equations appear transposed compared with those in textbooks from these fields.

Notation

Let M(n,m) denote the space of real n×m matrices (n rows and m columns); its elements are denoted F, X, Y, etc. An element of M(n,1), that is, a column vector, is denoted by a boldface lowercase letter x, and x^T denotes its transpose, a row vector. An element of M(1,1) is a scalar and is denoted a, b, c, f, t, etc. All functions are assumed to be of differentiability class C¹ unless otherwise noted.

Vector calculus

Main article: Vector calculus

Because the space M(n,1) is identified with the Euclidean space Rⁿ and M(1,1) is identified with R, the notations developed here can accommodate the usual operations of vector calculus.

The tangent vector to a curve x : R → Rⁿ is

$\frac{\partial \mathbf{x}} {\partial t} = \begin{bmatrix} \frac{\partial x_1}{\partial t} \\ \vdots \\ \frac{\partial x_n}{\partial t} \\ \end{bmatrix}.$
The gradient of a scalar function f : Rⁿ → R is the row vector

$\frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \\ \end{bmatrix}.$

The directional derivative of f in the direction of v is then

$\nabla_\mathbf{v} f = \frac{\partial f}{\partial \mathbf{x}}\mathbf{v}.$
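The row-vector gradient and the directional derivative can be checked numerically. The following is a minimal pure-Python sketch, not a library API: the helper names `gradient` and `directional_derivative`, the central-difference step `h`, and the sample function are illustrative assumptions.

```python
def _shift(x, i, d):
    """Copy of the list x with d added to component i."""
    y = list(x)
    y[i] += d
    return y


def gradient(f, x, h=1e-6):
    """Row vector [df/dx_1, ..., df/dx_n], by central differences."""
    return [(f(_shift(x, i, h)) - f(_shift(x, i, -h))) / (2 * h)
            for i in range(len(x))]


def directional_derivative(f, x, v, h=1e-6):
    """(df/dx) v: the row-vector gradient times the column vector v."""
    return sum(g * vi for g, vi in zip(gradient(f, x, h), v))


if __name__ == "__main__":
    def f(x):
        # Hypothetical example: f(x) = x_1^2 x_2 + 3 x_2.
        return x[0] ** 2 * x[1] + 3 * x[1]

    print(gradient(f, [1.0, 2.0]))                             # approximately [4.0, 4.0]
    print(directional_derivative(f, [1.0, 2.0], [0.5, -1.0]))  # approximately -2.0
```

Here the gradient (4, 4) times v = (0.5, −1) gives 2 − 4 = −2.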
The pushforward or differential of a function f : Rᵐ → Rⁿ is described by the n×m Jacobian matrix

$\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_m}\\ \vdots & \ddots & \vdots\\ \frac{\partial f_n}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_m}\\ \end{bmatrix}.$

The pushforward along f of a vector v in Rᵐ is

$d\,\mathbf{f}(\mathbf{v}) = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \mathbf{v}.$
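A similar sketch approximates the Jacobian column by column with central differences and applies it to a vector. The names `jacobian` and `pushforward` and the sample map f are hypothetical, chosen only for illustration; matrices are stored as lists of rows, so row i holds the partials of f_i as in the matrix above.

```python
def _shift(x, j, d):
    """Copy of the list x with d added to component j."""
    y = list(x)
    y[j] += d
    return y


def jacobian(f, x, h=1e-6):
    """n x m Jacobian of f : R^m -> R^n as a list of rows;
    row i holds the partials of f_i with respect to x_1, ..., x_m."""
    cols = []
    for j in range(len(x)):
        # Column j: central-difference approximation of df/dx_j.
        fp, fm = f(_shift(x, j, h)), f(_shift(x, j, -h))
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    return [list(row) for row in zip(*cols)]  # transpose columns into rows


def pushforward(J, v):
    """df(v) = (df/dx) v, a matrix-vector product."""
    return [sum(jij * vj for jij, vj in zip(row, v)) for row in J]


if __name__ == "__main__":
    def f(x):
        # Hypothetical f : R^2 -> R^3, f(x) = (x_1 x_2, x_1 + x_2, x_2^2).
        return [x[0] * x[1], x[0] + x[1], x[1] ** 2]

    J = jacobian(f, [2.0, 3.0])
    print(J)                            # approximately [[3, 2], [1, 1], [0, 6]]
    print(pushforward(J, [1.0, -1.0]))  # approximately [1, 0, -6]
```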

Matrix calculus

For the purpose of defining derivatives of simple functions, not much changes with matrix spaces: the space of n×m matrices is isomorphic to the vector space Rⁿᵐ. The three derivatives familiar from vector calculus (tangent vector, gradient, and pushforward) have close analogues here, although some complications arise in the identities below.

The tangent vector of a curve F : R → M(n,m) is

$\frac{\partial \mathbf{F}}{\partial t} = \begin{bmatrix} \frac{\partial F_{1,1}}{\partial t} & \cdots & \frac{\partial F_{1,m}}{\partial t}\\ \vdots & \ddots & \vdots\\ \frac{\partial F_{n,1}}{\partial t} & \cdots & \frac{\partial F_{n,m}}{\partial t}\\ \end{bmatrix}.$
The gradient of a scalar function f : M(n,m) → R is

$\frac{\partial f}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial f}{\partial X_{1,1}} & \cdots & \frac{\partial f}{\partial X_{n,1}}\\ \vdots & \ddots & \vdots\\ \frac{\partial f}{\partial X_{1,m}} & \cdots & \frac{\partial f}{\partial X_{n,m}}\\ \end{bmatrix}.$

Notice that the indexing of the gradient with respect to X is transposed as compared with the indexing of X. The directional derivative of f in the direction of matrix Y is given by

$\nabla_\mathbf{Y} f = \operatorname{tr} \left(\frac{\partial f}{\partial \mathbf{X}} \mathbf{Y}\right),$

where tr denotes the trace.
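The transposed indexing of the gradient is easy to get wrong in code. In this sketch (hypothetical helper names, 0-based indices, matrices stored as lists of rows), the gradient of a sample function f on 2×3 matrices is stored as a 3×2 matrix G with G[j][i] = ∂f/∂X[i][j], and the directional derivative is computed as tr(GY).

```python
def _bumped(X, i, j, d):
    """Copy of the matrix X (list of rows) with d added to entry (i, j)."""
    Z = [row[:] for row in X]
    Z[i][j] += d
    return Z


def matrix_gradient(f, X, h=1e-6):
    """Gradient of f : M(n,m) -> R with transposed indexing:
    an m x n matrix G with G[j][i] = df/dX[i][j] (central differences)."""
    n, m = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            G[j][i] = (f(_bumped(X, i, j, h)) - f(_bumped(X, i, j, -h))) / (2 * h)
    return G


def trace_of_product(A, B):
    """tr(A B) for A of size m x n and B of size n x m, without forming A B."""
    return sum(A[a][k] * B[k][a] for a in range(len(A)) for k in range(len(B)))


if __name__ == "__main__":
    def f(X):
        # Hypothetical f(X) = X[0][2] * X[1][0] + X[1][1] on 2 x 3 matrices.
        return X[0][2] * X[1][0] + X[1][1]

    X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    Y = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
    G = matrix_gradient(f, X)
    print(G)                       # 3 x 2, approximately [[0, 3], [0, 1], [4, 0]]
    print(trace_of_product(G, Y))  # approximately 8.0
```

The trace equals the sum of ∂f/∂X_{i,j} Y_{i,j} over all entries, which is why the transposed layout makes tr(GY) the directional derivative.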
The differential, or matrix derivative, of a function F : M(n,m) → M(p,q) is an element of M(p,q) ⊗ M(m,n), a fourth-rank tensor (the reversal of m and n here indicates the dual space of M(n,m)). In short, it is an m×n matrix each of whose entries is a p×q matrix:

$\frac{\partial\mathbf{F}} {\partial\mathbf{X}}= \begin{bmatrix} \frac{\partial\mathbf{F}}{\partial X_{1,1}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,1}}\\ \vdots & \ddots & \vdots\\ \frac{\partial\mathbf{F}}{\partial X_{1,m}} & \cdots & \frac{\partial \mathbf{F}}{\partial X_{n,m}}\\ \end{bmatrix},$

where each ∂F/∂X_{i,j} is a p×q matrix, defined entrywise as for the tangent vector above. This matrix also has its indexing transposed: it has m rows and n columns. The pushforward along F of an n×m matrix Y in M(n,m) is then

$d\mathbf{F}(\mathbf{Y}) = \operatorname{tr}\left(\frac{\partial\mathbf{F}} {\partial\mathbf{X}}\mathbf{Y}\right).$

Note that this definition encompasses all of the preceding definitions as special cases.
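A sketch of the general case follows. It stores ∂F/∂X as an m×n grid of p×q matrices and reads the trace in the pushforward formula blockwise, which amounts to summing Y_{i,j} ∂F/∂X_{i,j} over all i, j. That blockwise reading, the helper names, and the sample map F(X) = XX (whose pushforward should be XY + YX) are assumptions made for illustration.

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def _bumped(X, i, j, d):
    """Copy of X with d added to entry (i, j)."""
    Z = [row[:] for row in X]
    Z[i][j] += d
    return Z


def matrix_derivative(F, X, h=1e-6):
    """dF/dX for F : M(n,m) -> M(p,q): an m x n grid whose (j, i) cell
    is the p x q matrix dF/dX[i][j], by central differences."""
    n, m = len(X), len(X[0])
    D = [[None] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            Fp, Fm = F(_bumped(X, i, j, h)), F(_bumped(X, i, j, -h))
            D[j][i] = [[(a - b) / (2 * h) for a, b in zip(rp, rm)]
                       for rp, rm in zip(Fp, Fm)]
    return D


def pushforward(D, Y):
    """Blockwise tr((dF/dX) Y) = sum over (i, j) of Y[i][j] * dF/dX[i][j]."""
    m, n = len(D), len(D[0])
    p, q = len(D[0][0]), len(D[0][0][0])
    out = [[0.0] * q for _ in range(p)]
    for i in range(n):
        for j in range(m):
            for r in range(p):
                for c in range(q):
                    out[r][c] += Y[i][j] * D[j][i][r][c]
    return out


if __name__ == "__main__":
    # Hypothetical F(X) = X X on 2 x 2 matrices.
    X = [[1.0, 2.0], [3.0, 4.0]]
    Y = [[0.0, 1.0], [1.0, 0.0]]
    D = matrix_derivative(lambda Z: matmul(Z, Z), X)
    print(pushforward(D, Y))  # approximately [[5, 5], [5, 5]], i.e. XY + YX
```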

Identities

Note that matrix multiplication is not commutative, so in these identities, the order must not be changed.

Chain rule: if Z is a function of Y, which in turn is a function of X, then

$\frac{\partial \mathbf{Z}} {\partial \mathbf{X}} = \frac{\partial \mathbf{Z}} {\partial \mathbf{Y}} \frac{\partial \mathbf{Y}} {\partial \mathbf{X}}$
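The chain rule can be checked numerically for vector-valued maps, where ∂Z/∂Y and ∂Y/∂X are ordinary Jacobians in the layout of the Vector calculus section. The maps f and g and the helper functions below are hypothetical; note that the order of the factors in the product matters.

```python
def _shift(x, j, d):
    """Copy of the list x with d added to component j."""
    y = list(x)
    y[j] += d
    return y


def jacobian(f, x, h=1e-6):
    """Jacobian as a list of rows: row i holds the partials of f_i."""
    cols = [[(a - b) / (2 * h) for a, b in zip(f(_shift(x, j, h)), f(_shift(x, j, -h)))]
            for j in range(len(x))]
    return [list(row) for row in zip(*cols)]


def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def f(x):
    # Hypothetical Y = f(X), from R^2 to R^3.
    return [x[0] * x[1], x[0] ** 2, x[1]]


def g(y):
    # Hypothetical Z = g(Y), from R^3 to R^2.
    return [y[0] + y[1] * y[2], y[2] ** 2]


if __name__ == "__main__":
    x = [1.5, -0.5]
    lhs = jacobian(lambda t: g(f(t)), x)             # dZ/dX, 2 x 2
    rhs = matmul(jacobian(g, f(x)), jacobian(f, x))  # (dZ/dY)(dY/dX), 2x3 times 3x2
    print(lhs)
    print(rhs)  # agrees with lhs up to finite-difference error
```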
Product rule:

$\frac{\partial (\mathbf{Y}^T\mathbf{Z})}{\partial \mathbf{X}} = (\mathbf{Z}^T)\frac{\partial\mathbf{Y}}{\partial \mathbf{X}} + (\mathbf{Y}^T)\frac{\partial\mathbf{Z}}{\partial \mathbf{X}}$
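The product rule can likewise be checked when Y and Z are column vectors y and z, in which case both sides are row vectors. This sketch covers only that vector case; the functions y and z and the helpers are hypothetical.

```python
def _shift(x, j, d):
    """Copy of the list x with d added to component j."""
    y = list(x)
    y[j] += d
    return y


def jacobian(f, x, h=1e-6):
    """Jacobian as a list of rows: row i holds the partials of f_i."""
    cols = [[(a - b) / (2 * h) for a, b in zip(f(_shift(x, j, h)), f(_shift(x, j, -h)))]
            for j in range(len(x))]
    return [list(row) for row in zip(*cols)]


def vecmat(v, M):
    """Row vector v^T times the matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def y(x):
    # Hypothetical column-vector function y : R^2 -> R^3.
    return [x[0] ** 2, x[0] * x[1], 1.0]


def z(x):
    # Hypothetical column-vector function z : R^2 -> R^3.
    return [x[1], x[0] + x[1], x[0] * x[1]]


def y_dot_z(x):
    """The scalar y(x)^T z(x), wrapped as a 1-vector so jacobian() applies."""
    return [sum(a * b for a, b in zip(y(x), z(x)))]


if __name__ == "__main__":
    x = [0.8, -1.2]
    lhs = jacobian(y_dot_z, x)[0]         # d(y^T z)/dx as a row vector
    term1 = vecmat(z(x), jacobian(y, x))  # z^T dy/dx
    term2 = vecmat(y(x), jacobian(z, x))  # y^T dz/dx
    print(lhs, [p + q for p, q in zip(term1, term2)])  # both approximately [-3.6, 0.16]
```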

Examples

Derivative of linear functions

This section lists some commonly used derivative formulas for linear functions of a vector x.

$\frac{\partial \; \textbf{a}^T\textbf{x}}{\partial \; \textbf{x}} = \frac{\partial \; \textbf{x}^T\textbf{a}}{\partial \; \textbf{x}} = \textbf{a}^T$

$\frac{\partial \; \textbf{A}\textbf{x}}{\partial \; \textbf{x}} = \textbf{A}$

$\frac{\partial \; \textbf{x}^T\textbf{A}}{\partial \; \textbf{x}} = \textbf{A}^T$
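The following sketch computes these derivatives numerically in the layout of the Vector calculus section, where row i of ∂f/∂x holds the partials of f_i. With the hypothetical, non-square A below it returns aᵀ, A, and Aᵀ respectively; the transposed (Hessian) layout used in estimation theory would give the transposes of these results.

```python
def _shift(x, j, d):
    """Copy of the list x with d added to component j."""
    y = list(x)
    y[j] += d
    return y


def jacobian(f, x, h=1e-6):
    """Jacobian as a list of rows: row i holds the partials of f_i."""
    cols = [[(a - b) / (2 * h) for a, b in zip(f(_shift(x, j, h)), f(_shift(x, j, -h)))]
            for j in range(len(x))]
    return [list(row) for row in zip(*cols)]


# Hypothetical data; A is 2 x 3 and non-square, so A and A^T cannot be confused.
a = [2.0, -1.0, 0.5]
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]


def a_dot_x(x):
    """a^T x, returned as a 1-vector."""
    return [sum(ai * xi for ai, xi in zip(a, x))]


def A_x(x):
    """A x, for x in R^3."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def xT_A(x):
    """x^T A, for x in R^2, returned as a flat list."""
    return [sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


if __name__ == "__main__":
    print(jacobian(a_dot_x, [0.3, -0.7, 1.1]))  # approximately [[2.0, -1.0, 0.5]], a^T
    print(jacobian(A_x, [0.3, -0.7, 1.1]))      # approximately A (2 x 3)
    print(jacobian(xT_A, [0.3, -0.7]))          # approximately A^T (3 x 2)
```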

Derivative of quadratic functions

This section lists some commonly used derivative formulas for quadratic functions of a vector x that evaluate to a scalar.

$\frac{\partial \; \textbf{x}^T \textbf{A}\textbf{x}}{\partial \; \textbf{x}} = \textbf{x}^T(\textbf{A}^T + \textbf{A})$

$\frac{\partial \; (\textbf{A}\textbf{x} + \textbf{b})^T \textbf{C} (\textbf{D}\textbf{x} + \textbf{e}) }{\partial \; \textbf{x}} = (\textbf{D}\textbf{x} + \textbf{e})^T \textbf{C}^T \textbf{A} + (\textbf{A}\textbf{x} + \textbf{b})^T \textbf{C} \textbf{D}$
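Both quadratic formulas can be compared against central differences. The matrices and vectors below are hypothetical sample data, and the helpers are a pure-Python sketch rather than a library API; the closed forms are evaluated as row vectors, matching the convention above.

```python
def _shift(x, j, d):
    """Copy of the list x with d added to component j."""
    y = list(x)
    y[j] += d
    return y


def gradient(s, x, h=1e-6):
    """Row-vector gradient of a scalar function, by central differences."""
    return [(s(_shift(x, j, h)) - s(_shift(x, j, -h))) / (2 * h) for j in range(len(x))]


def matvec(M, v):
    """Matrix M times column vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def vecmat(v, M):
    """Row vector v^T times the matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def transpose(M):
    return [list(col) for col in zip(*M)]


# Hypothetical sample data.
A = [[1.0, 2.0], [3.0, 4.0]]
b = [0.5, -1.0]
C = [[2.0, 1.0], [0.0, 3.0]]
D = [[1.0, -1.0], [2.0, 0.5]]
e = [1.0, 1.0]


def quad(t):
    """x^T A x."""
    return sum(ti * ui for ti, ui in zip(t, matvec(A, t)))


def quad_formula(t):
    """x^T (A^T + A)."""
    S = [[A[j][i] + A[i][j] for j in range(len(A))] for i in range(len(A))]
    return vecmat(t, S)


def bilinear(t):
    """(A x + b)^T C (D x + e)."""
    u = [p + q for p, q in zip(matvec(A, t), b)]
    v = [p + q for p, q in zip(matvec(D, t), e)]
    return sum(ui * wi for ui, wi in zip(u, matvec(C, v)))


def bilinear_formula(t):
    """(D x + e)^T C^T A + (A x + b)^T C D."""
    u = [p + q for p, q in zip(matvec(A, t), b)]
    v = [p + q for p, q in zip(matvec(D, t), e)]
    first = vecmat(vecmat(v, transpose(C)), A)
    second = vecmat(vecmat(u, C), D)
    return [p + q for p, q in zip(first, second)]


if __name__ == "__main__":
    x = [1.0, -1.0]
    print(gradient(quad, x), quad_formula(x))  # both approximately [-3.0, -3.0]
    print(gradient(bilinear, x), bilinear_formula(x))
```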

Related to this is the derivative of the Euclidean norm:

$\frac{\partial \; \|\mathbf{x}-\mathbf{a}\|}{\partial \; \textbf{x}} = \frac{(\mathbf{x}-\mathbf{a})^T}{\|\mathbf{x}-\mathbf{a}\|}.$
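A short numerical check of the norm derivative, using hypothetical a and x, is sketched below. The closed form is undefined at x = a, where the norm is not differentiable.

```python
import math


def _shift(x, j, d):
    """Copy of the list x with d added to component j."""
    y = list(x)
    y[j] += d
    return y


def gradient(s, x, h=1e-6):
    """Row-vector gradient of a scalar function, by central differences."""
    return [(s(_shift(x, j, h)) - s(_shift(x, j, -h))) / (2 * h) for j in range(len(x))]


def dist(x, a):
    """Euclidean norm ||x - a||."""
    return math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, a)))


def dist_gradient(x, a):
    """Closed form (x - a)^T / ||x - a||, valid only for x != a."""
    r = dist(x, a)
    return [(xi - ai) / r for xi, ai in zip(x, a)]


if __name__ == "__main__":
    a = [1.0, 1.0, 1.0]
    x = [4.0, 5.0, 1.0]                       # x - a = (3, 4, 0), so ||x - a|| = 5
    print(gradient(lambda t: dist(t, a), x))  # approximately [0.6, 0.8, 0.0]
    print(dist_gradient(x, a))                # approximately [0.6, 0.8, 0.0]
```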

Derivative of matrix traces

This section shows the derivatives of some common trace expressions with respect to a matrix X.

$\frac{\partial \; \operatorname{tr}( \textbf{A} \textbf{X} \textbf{B})}{\partial \; \textbf{X}} = \frac{\partial \; \operatorname{tr}( \textbf{B}^T \textbf{X}^T \textbf{A}^T)}{\partial \; \textbf{X}} = \textbf{B} \textbf{A}$

$\frac{\partial \; \operatorname{tr}( \textbf{A} \textbf{X} \textbf{B} \textbf{X}^T \textbf{C}) }{\partial \; \textbf{X}} = \textbf{B} \textbf{X}^T \textbf{C} \textbf{A} + \textbf{B}^T \textbf{X}^T \textbf{A}^T \textbf{C}^T$
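The trace formulas can be checked numerically with the transposed indexing of the Matrix calculus section, in which the gradient of f with respect to an n×m matrix X is the m×n matrix whose (j, i) entry is ∂f/∂X_{i,j}. In that layout the sketch below compares the numerical gradient of tr(AXB) with BA and that of tr(AXBXᵀC) with BXᵀCA + BᵀXᵀAᵀCᵀ; the untransposed layout used, for example, by The Matrix Cookbook gives the transposes AᵀBᵀ and AᵀCᵀXBᵀ + CAXB. All matrices are hypothetical sample data.

```python
from functools import reduce


def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mm(*Ms):
    """Product of several matrices, left to right."""
    return reduce(matmul, Ms)


def transpose(M):
    return [list(col) for col in zip(*M)]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def _bumped(X, i, j, d):
    """Copy of X with d added to entry (i, j)."""
    Z = [row[:] for row in X]
    Z[i][j] += d
    return Z


def matrix_gradient(f, X, h=1e-6):
    """m x n gradient with transposed indexing: G[j][i] = df/dX[i][j]."""
    n, m = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            G[j][i] = (f(_bumped(X, i, j, h)) - f(_bumped(X, i, j, -h))) / (2 * h)
    return G


# Hypothetical data: X is 2 x 3 (n = 2, m = 3).
X = [[1.0, -2.0, 0.5], [3.0, 1.0, -1.0]]
A = [[1.0, 2.0], [0.0, -1.0], [2.0, 1.0]]                 # 3 x 2
B = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0], [3.0, 0.0, 1.0]]  # 3 x 3
C = [[2.0, 1.0, 0.0], [1.0, 0.0, -1.0]]                   # 2 x 3

# tr(A X B): numerical gradient versus B A (both 3 x 2).
g1 = matrix_gradient(lambda Z: trace(mm(A, Z, B)), X)
f1 = mm(B, A)

# tr(A X B X^T C): numerical gradient versus B X^T C A + B^T X^T A^T C^T.
g2 = matrix_gradient(lambda Z: trace(mm(A, Z, B, transpose(Z), C)), X)
f2 = [[p + q for p, q in zip(r1, r2)]
      for r1, r2 in zip(mm(B, transpose(X), C, A),
                        mm(transpose(B), transpose(X), transpose(A), transpose(C)))]

if __name__ == "__main__":
    print(g1, f1, sep="\n")
    print(g2, f2, sep="\n")
```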

Relation to other derivatives

There are other commonly used definitions of the derivative in multivariable spaces. For normed vector spaces, the most familiar is the Fréchet derivative, which is defined in terms of the norm. Matrix spaces admit several matrix norms, all of which are equivalent because the space is finite-dimensional. However, the matrix derivative defined in this article makes no use of any topology on M(n,m). It is defined solely in terms of partial derivatives, each of which is sensitive to variation along only one coordinate direction at a time, so together they need not reflect the full differentiable structure of the space. For example, a map can have all of its partial derivatives at a point and yet fail to be continuous there. Compare Hartogs' theorem, which shows that for several complex variables, by contrast, a function that is holomorphic in each variable separately is holomorphic. The matrix derivative is therefore not a special case of the Fréchet derivative on matrix spaces, but rather a convenient notation for keeping track of many partial derivatives during calculations; when a function is Fréchet differentiable, the two derivatives agree.
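A standard example of a map whose partial derivatives exist at a point where it is discontinuous is f(x, y) = xy/(x² + y²), with f(0, 0) = 0. The sketch below evaluates the difference quotients for both partial derivatives at the origin and samples f along the diagonal y = x; the step sizes are arbitrary choices.

```python
def f(x, y):
    """f(x, y) = xy / (x^2 + y^2) away from the origin, with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)


h = 1e-8
# f vanishes on both coordinate axes, so both partial derivatives at (0, 0) are 0.
fx = (f(h, 0.0) - f(0.0, 0.0)) / h
fy = (f(0.0, h) - f(0.0, 0.0)) / h
# Yet f(t, t) = 1/2 for every t != 0, so f is not continuous at the origin.
diagonal = [f(t, t) for t in (1e-1, 1e-4, 1e-8)]

if __name__ == "__main__":
    print(fx, fy)    # 0.0 0.0
    print(diagonal)  # [0.5, 0.5, 0.5]
```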

Usages

Matrix calculus is used for deriving optimal stochastic estimators, often involving the use of Lagrange multipliers. This includes the derivation of:

Alternatives

Tensor index notation, with its Einstein summation convention, is very similar to matrix calculus, except that one writes only a single component at a time. It has the advantage that arbitrarily high-rank tensors can be manipulated easily, whereas tensors of rank higher than two are unwieldy in matrix notation. A matrix can be regarded simply as a tensor of rank two.
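For example, with repeated indices summed, the derivative of a trace can be read off component by component. This worked example assumes the transposed indexing of the Matrix calculus section, in which the (k, j) entry of ∂f/∂X is ∂f/∂X_{jk}:

$f = \operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B}) = A_{ij} X_{jk} B_{ki} \quad\Rightarrow\quad \frac{\partial f}{\partial X_{jk}} = A_{ij} B_{ki} = (\mathbf{B}\mathbf{A})_{kj},$

so that ∂f/∂X = BA in this layout.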

See also

Derivative (generalizations)

External links

Matrix Calculus appendix from the Introduction to Finite Element Methods book, University of Colorado at Boulder. Uses the Hessian (transpose to Jacobian) definition of vector and matrix derivatives.
Matrix calculus, Matrix Reference Manual, Imperial College London.
Appendix D of Jon Dattorro, Convex Optimization & Euclidean Distance Geometry. Uses the Hessian definition.
The Matrix Cookbook, with a derivatives chapter. Uses the Hessian definition.
