Chi-square distribution

Parameters $k > 0\,$ degrees of freedom
Support $x \in [0, +\infty)\,$
Probability density function $\frac{(1/2)^{k/2}}{\Gamma(k/2)} x^{k/2 - 1} e^{-x/2}\,$
Cumulative distribution function $\frac{\gamma(k/2,x/2)}{\Gamma(k/2)}\,$
Mean $k\,$
Median approximately $k-2/3\,$
Mode $k-2\,$ if $k\geq 2\,$
Variance $2\,k\,$
Skewness $\sqrt{8/k}\,$
Excess kurtosis $12/k\,$
Entropy $\frac{k}{2}\!+\!\ln(2\Gamma(k/2))\!+\!(1\!-\!k/2)\psi(k/2)$
Moment-generating function $(1-2\,t)^{-k/2}$ for $2\,t<1\,$
Characteristic function $(1-2\,i\,t)^{-k/2}\,$

In probability theory and statistics, the chi-square distribution (also chi-squared or χ² distribution) is one of the most widely used theoretical probability distributions in inferential statistics, e.g., in statistical significance tests.[1][2][3][4] It is useful because, under reasonable assumptions, easily calculated quantities can be proven to have distributions that approximately follow the chi-square distribution if the null hypothesis is true.

The best-known situations in which the chi-square distribution is used are the common chi-square tests for goodness of fit of an observed distribution to a theoretical one, and for the independence of two criteria of classification of qualitative data. Many other statistical tests also use this distribution, such as Friedman's analysis of variance by ranks.

Definition

If Xi are k independent, normally distributed random variables with mean 0 and variance 1, then the random variable

$Q = \sum_{i=1}^k X_i^2$

is distributed according to the chi-square distribution with k degrees of freedom. This is usually written

$Q\sim\chi^2_k.\,$

The chi-square distribution has one parameter: k, a positive integer that specifies the number of degrees of freedom (i.e., the number of the Xi).

The chi-square distribution is a special case of the gamma distribution.
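The defining construction can be checked by simulation. The sketch below (Python standard library only; the seed and sample size are arbitrary choices) draws chi-square variates as sums of squared standard normals and compares the sample mean and variance with the theoretical values k and 2k stated later in the article:

```python
import random

def chi_square_sample(k, rng):
    """One chi-square variate: the sum of k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(42)       # fixed seed, arbitrary choice
k, n = 5, 100_000
samples = [chi_square_sample(k, rng) for _ in range(n)]

mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n
print(mean, var)              # close to k = 5 and 2k = 10
```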

Characteristics

Probability density function

A probability density function of the chi-square distribution is

$f(x;k)= \begin{cases}\displaystyle \frac{1}{2^{k/2}\Gamma(k/2)}\,x^{(k/2) - 1} e^{-x/2}&\text{for }x>0,\\ 0&\text{for }x\le0, \end{cases}$

where Γ denotes the Gamma function, which has closed-form values at the half-integers.
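Because Γ takes closed-form values at the half-integers, the density is straightforward to evaluate directly. A minimal Python sketch (standard library only; the function name is illustrative):

```python
import math

def chi2_pdf(x, k):
    """Chi-square density f(x; k); zero for x <= 0."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

# For k = 2 the density reduces to (1/2) exp(-x/2).
print(chi2_pdf(1.0, 2))    # exp(-0.5) / 2 ≈ 0.3033
```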

Cumulative distribution function

$F(x;k)=\frac{\gamma(k/2,x/2)}{\Gamma(k/2)} = P(k/2, x/2)$

where γ(k,z) is the lower incomplete Gamma function and P(k,z) is the regularized Gamma function.

Tables of this distribution — usually in its cumulative form — are widely available and the function is included in many spreadsheets and all statistical packages.
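Without a statistical package, F(x; k) can still be evaluated from the power series of the lower incomplete gamma function. A Python sketch (standard library only; function names are illustrative):

```python
import math

def reg_lower_gamma(a, x, eps=1e-14):
    """Regularized lower incomplete gamma P(a, x), via the power series
    gamma(a, x) = x^a e^(-x) * sum_n x^n / (a (a+1) ... (a+n))."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > eps * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi2_cdf(x, k):
    """Chi-square CDF: F(x; k) = P(k/2, x/2)."""
    return reg_lower_gamma(k / 2, x / 2)

# For k = 2 the CDF has the closed form 1 - exp(-x/2).
print(chi2_cdf(2.0, 2))    # 1 - exp(-1) ≈ 0.6321
```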

Characteristic function

The characteristic function of the chi-square distribution is[5]

$\chi(t;k)=(1-2it)^{-k/2}.\,$
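This closed form can be checked against a direct numerical evaluation of E[exp(itX)]. A Python sketch (standard library only; the truncation point and step count are ad-hoc choices):

```python
import cmath
import math

def chi2_pdf(x, k):
    """Chi-square density f(x; k); zero for x <= 0."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def chi2_cf_numeric(t, k, upper=80.0, steps=80_000):
    """Approximate E[exp(itX)] by trapezoidal integration over [0, upper]."""
    h = upper / steps
    total = 0j
    for n in range(steps + 1):
        x = n * h
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * chi2_pdf(x, k) * cmath.exp(1j * t * x)
    return total * h

k, t = 4, 0.3
exact = (1 - 2j * t) ** (-k / 2)
print(abs(chi2_cf_numeric(t, k) - exact))   # small discretization error
```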

Expected value and variance

If $X\sim\chi^2_k$ then the mean is given by

$\mathrm{E}(X)=k,$

and the variance is given by

$\mathrm{Var}(X)=2k.$

Median

The median of $X\sim\chi^2_k$ is given approximately by

$k-\frac{2}{3}+\frac{4}{27k}-\frac{8}{729k^2}.$
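A quick Python check of this approximation against the one case with a simple closed form, k = 2, where the CDF is 1 − e^(−x/2) and the exact median is 2 ln 2:

```python
import math

def chi2_median_approx(k):
    """Series approximation to the chi-square median."""
    return k - 2 / 3 + 4 / (27 * k) - 8 / (729 * k ** 2)

# For k = 2 the CDF is 1 - exp(-x/2), so the exact median is 2 ln 2.
exact = 2 * math.log(2)             # ≈ 1.3863
print(chi2_median_approx(2))        # ≈ 1.4047, within 0.02 of the exact value
```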

Information entropy

The information entropy is given by

$H = -\int_{-\infty}^\infty f(x;k)\ln(f(x;k)) \, dx = \frac{k}{2} + \ln \left( 2 \Gamma \left( \frac{k}{2} \right) \right) + \left(1 - \frac{k}{2}\right) \psi(k/2),$

where ψ(x) is the Digamma function.

Noncentral moments

The moments about zero of a chi-square distribution with k degrees of freedom are given by[6][7]

$E(X^m) = k (k+2) (k+4) \cdots (k+2m-2) = 2^m \frac{\Gamma(m+k/2)}{\Gamma(k/2)}.$
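The product form and the gamma-function form agree, as a short Python check confirms (function names are illustrative):

```python
import math

def moment_product(k, m):
    """E[X^m] as the product k (k+2) (k+4) ... (k + 2m - 2)."""
    result = 1.0
    for j in range(m):
        result *= k + 2 * j
    return result

def moment_gamma(k, m):
    """E[X^m] as 2^m Gamma(m + k/2) / Gamma(k/2)."""
    return 2 ** m * math.gamma(m + k / 2) / math.gamma(k / 2)

# Example: k = 4, m = 3 gives 4 * 6 * 8 = 192 from both expressions.
print(moment_product(4, 3), moment_gamma(4, 3))
```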

Derivation of the pdf for one degree of freedom

Let random variable Y be defined as Y = X2 where X has normal distribution with mean 0 and variance 1 (that is X ~ N(0,1)).

Then if $y<0$, $P(Y<y)=0$, and if $y\geq0$, $P(Y<y)=P(X^2<y)=P(-\sqrt{y}<X<\sqrt{y})=F_X(\sqrt{y})-F_X(-\sqrt{y})$. Differentiating with respect to $y$ gives the density

$f_y(y) = f_x(\sqrt{y})\frac{\partial(\sqrt{y})}{\partial y}-f_x(-\sqrt{y})\frac{\partial(-\sqrt{y})}{\partial y}$
$= \frac{1}{\sqrt{2\pi}}e^{\frac{-y}{2}}\frac{1}{2y^{1/2}} + \frac{1}{\sqrt{2\pi}}e^{\frac{-y}{2}}\frac{1}{2y^{1/2}}$
$= \frac{1}{2^{\frac{1}{2}} \Gamma(\frac{1}{2})}y^{\frac{1}{2} -1}e^{\frac{-y}{2}}$

Then $Y = X^2 \sim \chi^2_1$.

Related distributions and properties

The chi-square distribution has numerous applications in inferential statistics, for instance in chi-square tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student's t-distribution. It enters all analysis of variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent chi-squared random variables divided by their respective degrees of freedom.

• If $X\sim\chi^2_k$, then as k tends to infinity, the distribution of $(X-k)/\sqrt{2k}$ tends to a standard normal distribution: see asymptotic distribution. This follows directly from the definition of the chi-squared distribution, the central limit theorem, and the fact that the mean and variance of $\chi^2_1$ are 1 and 2 respectively. However, convergence is slow as the skewness is $\sqrt{8/k}$ and the excess kurtosis is 12 / k.
• If $X\sim\chi^2_k$ then $\sqrt{2X}$ is approximately normally distributed with mean $\sqrt{2k-1}$ and unit variance (result credited to R. A. Fisher).
• If $X\sim\chi^2_k$ then $\sqrt[3]{X/k}$ is approximately normally distributed with mean 1 − 2 / (9k) and variance 2 / (9k) (Wilson and Hilferty, 1931).
• If $X \sim \chi_2^2$ (chi-square with 2 degrees of freedom), then $X \sim \mathrm{Exponential}(\lambda = \tfrac{1}{2})$, i.e., X follows an exponential distribution with rate 1/2.
• $Y \sim \chi_{\nu}^2$ is a chi-square distribution if $Y = \sum_{m=1}^{\nu} X_m^2$ for independent, standard normally distributed $X_m \sim N(0,1)$.
• If $\boldsymbol{z}'=[Z_1,Z_2,\cdots,Z_n]$, where the $Z_i$ are independent Normal(0,σ²) random variables (equivalently, $\boldsymbol{z}\sim N_n(\boldsymbol{0},\sigma^2 \mathrm{I})$), and $\boldsymbol{A}$ is an $n\times n$ idempotent matrix with rank $n-k$, then the quadratic form $\frac{\boldsymbol{z}'\boldsymbol{A}\boldsymbol{z}}{\sigma^2}\sim \chi^2_{n-k}$.
• If the $X_i\sim N(\mu_i,1)$ have nonzero means, then $Y = \sum_{i=1}^k X_i^2$ is drawn from a noncentral chi-square distribution.
• The chi-square distribution $X\sim\chi^2_\nu$ is a special case of the gamma distribution, in that $X \sim {\Gamma}(\frac{\nu}{2}, \theta=2)$.
• $Y \sim \mathrm{F}(\nu_1, \nu_2)$ is an F-distribution if $Y = \frac{X_1 / \nu_1}{X_2 / \nu_2}$ where $X_1 \sim \chi_{\nu_1}^2$ and $X_2 \sim \chi_{\nu_2}^2$ are independent with their respective degrees of freedom.
• $Y \sim \chi^2(\bar{\nu})$ is a chi-square distribution if $Y = \sum_{m=1}^N X_m$ where $X_m \sim \chi^2(\nu_m)$ are independent and $\bar{\nu} = \sum_{m=1}^N \nu_m$.
• If X is chi-square distributed, then $\sqrt{X}$ is chi distributed.
• In particular, if $X \sim \chi_2^2$ (chi-square with 2 degrees of freedom), then $\sqrt{X}$ is Rayleigh distributed.
• If $X_1, \dots, X_n$ are i.i.d. $N(\mu,\sigma^2)$ random variables, then $\sum_{i=1}^n(X_i - \bar X)^2 \sim \sigma^2 \chi^2_{n-1}$, where $\bar X = \frac{1}{n} \sum_{i=1}^n X_i$.
• If $X \sim \mathrm{SkewLogistic}(\tfrac{1}{2})\,$, then $\log(1 + e^{-X}) \sim \chi_2^2\,$.
• The table below shows probability distributions whose names begin with "chi", for statistics based on $X_i\sim \mathrm{Normal}(\mu_i,\sigma^2_i),i=1,\cdots,k,$ independent random variables:
Name Statistic
chi-square distribution $\sum_{i=1}^k \left(\frac{X_i-\mu_i}{\sigma_i}\right)^2$
noncentral chi-square distribution $\sum_{i=1}^k \left(\frac{X_i}{\sigma_i}\right)^2$
chi distribution $\sqrt{\sum_{i=1}^k \left(\frac{X_i-\mu_i}{\sigma_i}\right)^2}$
noncentral chi distribution $\sqrt{\sum_{i=1}^k \left(\frac{X_i}{\sigma_i}\right)^2}$
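The gamma special case noted above is easy to verify numerically: the chi-square density with ν degrees of freedom coincides with the gamma density with shape ν/2 and scale 2. A Python sketch (standard library only; function names are illustrative):

```python
import math

def chi2_pdf(x, k):
    """Chi-square density f(x; k) for x > 0."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def gamma_pdf(x, shape, scale):
    """Gamma density with shape alpha and scale theta, for x > 0."""
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

# Chi-square with nu degrees of freedom equals Gamma(shape = nu/2, scale = 2).
nu = 5
for x in (0.5, 1.0, 3.0, 7.5):
    assert math.isclose(chi2_pdf(x, nu), gamma_pdf(x, nu / 2, 2.0))
print("densities agree")
```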

References

1. ^ Abramowitz, Milton; Stegun, Irene A., eds. (1965), "Chapter 26", Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York: Dover, ISBN 0-486-61272-4 .
2. ^ NIST (2006). Engineering Statistics Handbook - Chi-Square Distribution
3. ^ Johnson, N. L.; Kotz, S.; Balakrishnan, N. (1994). Continuous Univariate Distributions (2nd ed., Vol. 1, Chapter 18). John Wiley and Sons. ISBN 0-471-58495-9.
4. ^ Mood, Alexander; Graybill, Franklin A.; Boes, Duane C. (1974). Introduction to the Theory of Statistics (3rd ed., pp. 241–246). McGraw-Hill. ISBN 0-07-042864-6.
5. ^ M.A. Sanders. "Characteristic function of the central chi-square distribution". Retrieved on 2009-03-06.
6. ^ Chi-square distribution, from MathWorld, retrieved Feb. 11, 2009
7. ^ M. K. Simon, Probability Distributions Involving Gaussian Random Variables, New York: Springer, 2002, eq. (2.35), ISBN 978-0-387-34657-1