# Hotelling's T-square distribution

In statistics, Hotelling's T-square statistic, named for Harold Hotelling, is a generalization of Student's t statistic that is used in multivariate hypothesis testing.

Hotelling's T-square statistic is defined as

$t^{2}=n({\mathbf {x} }-{\mathbf {\mu } })'{\mathbf {W} }^{-1}({\mathbf {x} }-{\mathbf {\mu } })$ where $n$ is the number of points (see below), ${\mathbf {x} }$ is a column vector of $p$ elements and ${\mathbf {W} }$ is a $p\times p$ matrix.

If $x\sim N_{p}(\mu ,{\mathbf {V} }/n)$ is a random variable with a multivariate Gaussian distribution and ${\mathbf {W} }$ (independent of $x$) is such that $m{\mathbf {W} }\sim W_{p}(m,{\mathbf {V} })$ has a Wishart distribution with the same non-singular variance matrix $\mathbf {V}$ and with $m=n-1$, then the distribution of $t^{2}$ is $T^{2}(p,m)$, Hotelling's T-square distribution with parameters $p$ and $m$. It can be shown that

${\frac {m-p+1}{pm}}T^{2}\sim F_{p,m-p+1}$ where $F$ is the F-distribution.
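This rescaling is what makes the statistic usable with ordinary F tables. A minimal sketch in Python of the conversion and the resulting upper-tail p-value (the function name is illustrative, and SciPy's `scipy.stats.f` is assumed to be available):

```python
from scipy.stats import f

def t2_to_f_pvalue(t2, p, m):
    """Convert a T^2(p, m) value to its F scale and return the upper-tail
    p-value, using (m - p + 1) / (p * m) * T^2 ~ F(p, m - p + 1)."""
    f_stat = (m - p + 1) / (p * m) * t2
    return f_stat, f.sf(f_stat, p, m - p + 1)
```

Note that for $p=1$ the scaling factor is 1 and $T^{2}(1,m)$ is simply $F_{1,m}$, the square of a Student's $t_m$ variable.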

Now suppose that

${\mathbf {x} }_{1},\dots ,{\mathbf {x} }_{n}$ are p×1 column vectors whose entries are real numbers. Let

${\overline {\mathbf {x} }}=(\mathbf {x} _{1}+\cdots +\mathbf {x} _{n})/n$ be their mean. Let the p×p positive-definite matrix

${\mathbf {W} }=\sum _{i=1}^{n}(\mathbf {x} _{i}-{\overline {\mathbf {x} }})(\mathbf {x} _{i}-{\overline {\mathbf {x} }})'/(n-1)$ be their "sample variance" matrix. (The transpose of any matrix M is denoted above by M′). Let μ be some known p×1 column vector (in applications a hypothesized value of a population mean). Then Hotelling's T-square statistic is

$t^{2}=n({\overline {\mathbf {x} }}-{\mathbf {\mu } })'{\mathbf {W} }^{-1}({\overline {\mathbf {x} }}-{\mathbf {\mu } }).$ Note that $t^{2}$ is closely related to the squared Mahalanobis distance.
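As a concrete illustration, the one-sample statistic above can be computed directly from data. The following sketch assumes NumPy and SciPy are available; the function name `hotelling_one_sample` is made up for the example:

```python
import numpy as np
from scipy.stats import f

def hotelling_one_sample(X, mu):
    """One-sample Hotelling T^2 test of H0: E[x] = mu.

    X is an (n, p) array of n observations; mu is a length-p vector.
    Returns the T^2 statistic, its F-scale version, and the p-value.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    W = np.cov(X, rowvar=False)                # sample covariance, divisor n - 1
    diff = xbar - np.asarray(mu, dtype=float)
    t2 = n * diff @ np.linalg.solve(W, diff)   # n (xbar - mu)' W^{-1} (xbar - mu)
    m = n - 1
    f_stat = (m - p + 1) / (p * m) * t2        # ~ F(p, m - p + 1) under H0
    return t2, f_stat, f.sf(f_stat, p, m - p + 1)
```

The quadratic form here is exactly $n$ times the squared Mahalanobis distance of ${\overline {\mathbf {x} }}$ from $\mu$ with respect to ${\mathbf {W} }$.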

In particular, it can be shown that if ${\mathbf {x} }_{1},\dots ,{\mathbf {x} }_{n}\sim N_{p}(\mu ,{\mathbf {V} })$ are independent, and ${\overline {\mathbf {x} }}$ and ${\mathbf {W} }$ are as defined above, then $(n-1){\mathbf {W} }$ has a Wishart distribution with n − 1 degrees of freedom

$(n-1)\mathbf {W} \sim W_{p}(n-1,\mathbf {V} ),$

and is independent of ${\overline {\mathbf {x} }}$, and

${\overline {\mathbf {x} }}\sim N_{p}(\mu ,V/n).$ This implies that:

$t^{2}=n({\overline {\mathbf {x} }}-{\mathbf {\mu } })'{\mathbf {W} }^{-1}({\overline {\mathbf {x} }}-{\mathbf {\mu } })\sim T^{2}(p,n-1).$

## Hotelling's two-sample T-square statistic

If ${\mathbf {x} }_{1},\dots ,{\mathbf {x} }_{n_{x}}\sim N_{p}(\mu ,{\mathbf {V} })$ and ${\mathbf {y} }_{1},\dots ,{\mathbf {y} }_{n_{y}}\sim N_{p}(\mu ,{\mathbf {V} })$ , with the samples independently drawn from two independent multivariate normal distributions with the same mean and covariance, and we define

${\overline {\mathbf {x} }}={\frac {1}{n_{x}}}\sum _{i=1}^{n_{x}}\mathbf {x} _{i}\qquad {\overline {\mathbf {y} }}={\frac {1}{n_{y}}}\sum _{i=1}^{n_{y}}\mathbf {y} _{i}$ as the sample means, and

${\mathbf {W} }={\frac {\sum _{i=1}^{n_{x}}(\mathbf {x} _{i}-{\overline {\mathbf {x} }})(\mathbf {x} _{i}-{\overline {\mathbf {x} }})'+\sum _{i=1}^{n_{y}}(\mathbf {y} _{i}-{\overline {\mathbf {y} }})(\mathbf {y} _{i}-{\overline {\mathbf {y} }})'}{n_{x}+n_{y}-2}}$ as the unbiased pooled covariance matrix estimate, then Hotelling's two-sample T-square statistic is

$t^{2}={\frac {n_{x}n_{y}}{n_{x}+n_{y}}}({\overline {\mathbf {x} }}-{\overline {\mathbf {y} }})'{\mathbf {W} }^{-1}({\overline {\mathbf {x} }}-{\overline {\mathbf {y} }})\sim T^{2}(p,n_{x}+n_{y}-2)$ and it can be related to the F-distribution by

${\frac {n_{x}+n_{y}-p-1}{(n_{x}+n_{y}-2)p}}t^{2}\sim F(p,n_{x}+n_{y}-1-p).$

## See also

- Wilks' lambda distribution (in multivariate statistics Wilks' $\Lambda$ is to Hotelling's $T^{2}$ as Snedecor's $F$ is to Student's $t$ in univariate statistics).
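The two-sample statistic can be computed in the same way as the one-sample version. This sketch again assumes NumPy and SciPy; the function name `hotelling_two_sample` is an illustrative choice:

```python
import numpy as np
from scipy.stats import f

def hotelling_two_sample(X, Y):
    """Two-sample Hotelling T^2 test of equal mean vectors.

    X is an (n_x, p) array and Y an (n_y, p) array of observations.
    Returns the T^2 statistic, its F-scale version, and the p-value.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    nx, p = X.shape
    ny = Y.shape[0]
    dx = X - X.mean(axis=0)
    dy = Y - Y.mean(axis=0)
    W = (dx.T @ dx + dy.T @ dy) / (nx + ny - 2)    # pooled covariance estimate
    diff = X.mean(axis=0) - Y.mean(axis=0)
    t2 = (nx * ny) / (nx + ny) * diff @ np.linalg.solve(W, diff)
    df2 = nx + ny - 1 - p
    f_stat = df2 / ((nx + ny - 2) * p) * t2        # ~ F(p, nx + ny - 1 - p)
    return t2, f_stat, f.sf(f_stat, p, df2)
```

Because the quadratic form is symmetric in the mean difference, swapping the roles of the two samples leaves the statistic unchanged.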