Anderson-Darling test


Editor-In-Chief: C. Michael Gibson, M.S., M.D.


Overview

The Anderson-Darling test, named after Theodore Wilbur Anderson, Jr. (1918–2016) and Donald A. Darling (1915–2014), who invented it in 1952[1], is one of the most powerful statistics for detecting most departures from normality. It may be used with sample sizes as small as n ≤ 25. Very large samples may reject the assumption of normality given only slight imperfections, but industrial data with sample sizes of 200 and more have passed the Anderson-Darling test.[citation needed]

The Anderson-Darling test assesses whether a sample comes from a specified distribution. The formula for the test statistic <math>A</math>, used to assess whether the ordered data <math>\{Y_1<\cdots <Y_N\}</math> (note that the data must be put in order) come from a distribution with cumulative distribution function (CDF) <math>F</math>, is

<math>A^2 = -N-S</math>

where

<math>S=\sum_{k=1}^N \frac{2k-1}{N}\left[\ln F(Y_k) + \ln\left(1-F(Y_{N+1-k})\right)\right].</math>

The test statistic can then be compared against the critical values of the theoretical distribution (dependent on which <math>F</math> is used) to determine the P-value.
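As an illustration (not part of the original article), the statistic above can be computed directly from its definition. This is a minimal sketch: the function name `anderson_darling_statistic` is ours, and SciPy's `norm.cdf` stands in for a fully specified <math>F</math>.

```python
import numpy as np
from scipy.stats import norm

def anderson_darling_statistic(y, cdf):
    """Compute A^2 = -N - S for a sample y against a fully specified CDF F,
    where S = sum_k (2k-1)/N [ln F(Y_k) + ln(1 - F(Y_{N+1-k}))]."""
    y = np.sort(np.asarray(y, dtype=float))   # the data must be put in order
    n = len(y)
    f = cdf(y)                                # F(Y_1), ..., F(Y_N), ascending
    k = np.arange(1, n + 1)
    s = np.sum((2 * k - 1) / n * (np.log(f) + np.log(1 - f[::-1])))
    return -n - s

# Example: a near-perfect standard-normal sample yields a small A^2.
sample = norm.ppf((np.arange(1, 101) - 0.5) / 100)
print(anderson_darling_statistic(sample, norm.cdf))
```

Note that `f[::-1]` supplies <math>F(Y_{N+1-k})</math>: reversing the ascending CDF values pairs each <math>k</math> with its mirror-image order statistic.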

The Anderson-Darling test for normality is a distance or empirical distribution function (EDF) test. It is based on the concept that, given a hypothesized underlying distribution, the data can be transformed to a uniform distribution. The transformed sample can then be tested for uniformity with a distance test (Shapiro 1980).

In comparisons of power, Stephens (1974) found <math>A^2</math> to be one of the best EDF statistics for detecting most departures from normality.[2] The only statistic that came close was the Cramér-von Mises statistic <math>W^2</math>.

Procedure

(If testing for normal distribution of the variable X)

1) The observations of the variable X to be tested are sorted from low to high.

2) The mean, <math>\bar{X}</math>, and standard deviation, <math>s</math>, are calculated from the sample of X.

3) The values of X are standardized as follows:

<math>Y_i=\frac{X_i-\bar{X}}{s}</math>

4) With the standard normal CDF <math>\Phi</math>, <math>A^2</math> is calculated using:

<math>A^2 = -n -\frac{1}{n} \sum_{i=1}^n (2i-1)(\ln \Phi(Y_i)+ \ln(1-\Phi(Y_{n+1-i}))).</math>

5) <math>A^{2*}</math>, an approximate adjustment for sample size, is calculated using:

<math>A^{2*}=A^2\left(1+\frac{0.75}{n}+\frac{2.25}{n^2}\right)</math>

6) If <math>A^{2*}</math> exceeds 0.752 then the hypothesis of normality is rejected for a 5% level test.
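The six steps above can be sketched in Python (a minimal illustration assuming SciPy is available; the helper name `ad_normality_test` is ours):

```python
import numpy as np
from scipy.stats import norm

def ad_normality_test(x):
    """Anderson-Darling test for normality with mean and sd estimated
    from the sample. Returns (A2_star, reject_at_5_percent)."""
    x = np.sort(np.asarray(x, dtype=float))          # step 1: sort
    n = len(x)
    y = (x - x.mean()) / x.std(ddof=1)               # steps 2-3: standardize
    phi = norm.cdf(y)                                # step 4: standard normal CDF
    i = np.arange(1, n + 1)
    a2 = -n - np.sum((2 * i - 1) * (np.log(phi) + np.log(1 - phi[::-1]))) / n
    a2_star = a2 * (1 + 0.75 / n + 2.25 / n ** 2)    # step 5: sample-size adjustment
    return a2_star, a2_star > 0.752                  # step 6: 5% level decision

# Near-perfect normal data should not be rejected; uniform data should be.
good = 5.0 + 2.0 * norm.ppf((np.arange(1, 201) - 0.5) / 200)
print(ad_normality_test(good))
print(ad_normality_test(np.linspace(0.0, 1.0, 200)))
```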

Note:

1. If <math>s = 0</math>, or if any <math>\Phi(Y_i)</math> equals 0 or 1, then <math>A^2</math> cannot be calculated and is undefined.

2. Above, it was assumed that the variable <math>X_i</math> was being tested for normal distribution. Any other theoretical distribution can be assumed by using its CDF. Each theoretical distribution has its own critical values, and some examples are: lognormal, exponential, Weibull, extreme value type I and logistic distribution.

3. The null hypothesis is that the data follow the hypothesized distribution (here, after standardization, N(0, 1)).
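As note 2 says, each theoretical distribution has its own critical values, and these are tabulated in statistical software. For example, SciPy's `scipy.stats.anderson` supports several families, including the exponential (the data below are simulated for illustration):

```python
import numpy as np
from scipy.stats import anderson

# Simulated exponential data (scale parameter chosen arbitrarily).
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=500)

# Test against the exponential family; parameters are estimated from the data.
result = anderson(x, dist='expon')
print(result.statistic)            # the A^2 statistic
print(result.critical_values)      # family-specific critical values
print(result.significance_level)   # the corresponding significance levels (%)
```

The returned critical values are specific to the chosen family, matching the point made in note 2.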


References

  1. Anderson, T. W.; Darling, D. A. (1952). "Asymptotic theory of certain 'goodness of fit' criteria based on stochastic processes". Annals of Mathematical Statistics. 23: 193–212.
  2. Stephens, M. A. (1974). "EDF Statistics for Goodness of Fit and Some Comparisons". Journal of the American Statistical Association. 69: 730–737.

