scipy.stats.anderson

scipy.stats.anderson(x, dist='norm')

Anderson-Darling test for data coming from a particular distribution.

The Anderson-Darling test tests the null hypothesis that a sample is drawn from a population that follows a particular distribution. For the Anderson-Darling test, the critical values depend on which distribution is being tested against. This function works for normal, exponential, logistic, weibull_min, or Gumbel (Extreme Value Type I) distributions.

Parameters:

x (array_like): Array of sample data.
dist ({‘norm’, ‘expon’, ‘logistic’, ‘gumbel’, ‘gumbel_l’, ‘gumbel_r’, ‘extreme1’, ‘weibull_min’}, optional): The type of distribution to test against. The default is ‘norm’. The names ‘extreme1’, ‘gumbel_l’ and ‘gumbel’ are synonyms for the same distribution.

Returns:

result (AndersonResult)

An object with the following attributes:

statistic (float): The Anderson-Darling test statistic.
critical_values (list): The critical values for this distribution.
significance_level (list): The significance levels, in percent, for the corresponding critical values. The set of significance levels for which critical values are returned depends on the distribution being tested against.
fit_result (FitResult): An object containing the results of fitting the distribution to the data.

See also

kstest: The Kolmogorov-Smirnov test for goodness-of-fit.

Notes

Critical values provided are for the following significance levels:

normal/exponential: 15%, 10%, 5%, 2.5%, 1%
logistic: 25%, 10%, 5%, 2.5%, 1%, 0.5%
gumbel_l / gumbel_r: 25%, 10%, 5%, 2.5%, 1%
weibull_min: 50%, 25%, 15%, 10%, 5%, 2.5%, 1%, 0.5%

If the returned statistic is larger than the critical value for a given significance level, the null hypothesis that the data come from the chosen distribution can be rejected at that significance level. The returned statistic is referred to as ‘A2’ in the references.
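The decision rule above can be sketched in a few lines. This is a minimal illustration rather than part of the function's documentation: the seed, the sample, and the `decisions` dictionary are assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import anderson

# Assumed example data: a seeded normal sample (seed and size are arbitrary).
rng = np.random.default_rng(12345)
sample = rng.normal(loc=5.0, scale=2.0, size=200)

res = anderson(sample, dist='norm')

# Map each significance level (in percent) to whether the null hypothesis
# is rejected at that level, i.e. whether the statistic exceeds the
# corresponding critical value.
decisions = {
    float(level): bool(res.statistic > crit)
    for level, crit in zip(res.significance_level, res.critical_values)
}

for level, reject in decisions.items():
    verdict = "reject" if reject else "fail to reject"
    print(f"{level:4.1f}%: {verdict} (A2 = {res.statistic:.3f})")
```

Because the critical values grow as the significance level shrinks, a rejection at a small level (say 1%) implies a rejection at every larger level as well.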

For weibull_min, maximum likelihood estimation is known to be challenging. If the test returns successfully, then the first order conditions for a maximum likelihood estimate have been verified and the critical values correspond relatively well to the significance levels, provided that the sample is sufficiently large (>10 observations [7]). However, for some data, especially data with no left tail, anderson is likely to result in an error message. In this case, consider performing a custom goodness-of-fit test using scipy.stats.monte_carlo_test.
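One way such a custom test might look is sketched below. It is an illustration under stated assumptions, not the procedure used by anderson: the helper names `a2_weibull` and `null_rvs` are hypothetical, the location parameter is fixed at zero (a two-parameter Weibull, which simplifies the three-parameter setting of [7]), and the number of resamples is kept small only so the example runs quickly.

```python
import numpy as np
from scipy import stats

def a2_weibull(sample):
    """A^2 statistic for a two-parameter Weibull fitted to `sample`.

    Hypothetical helper: the location is fixed at 0 (an assumption);
    shape and scale are re-estimated by maximum likelihood on every call.
    """
    y = np.sort(sample)
    n = y.size
    c, loc, scale = stats.weibull_min.fit(y, floc=0)
    fitted = stats.weibull_min(c, loc=loc, scale=scale)
    i = np.arange(1, n + 1)
    # A^2 = -n - (1/n) * sum_i (2i - 1) * [ln F(y_i) + ln(1 - F(y_{n+1-i}))]
    return -n - np.mean((2 * i - 1) * (fitted.logcdf(y) + fitted.logsf(y[::-1])))

# Assumed example data (seed, shape, scale, and size are arbitrary).
rng = np.random.default_rng(0)
data = stats.weibull_min.rvs(1.5, scale=2.0, size=40, random_state=rng)

# Draw null samples from the Weibull distribution fitted to the data.
c_hat, _, scale_hat = stats.weibull_min.fit(data, floc=0)

def null_rvs(size):
    return stats.weibull_min.rvs(c_hat, scale=scale_hat, size=size,
                                 random_state=rng)

# Large values of A^2 indicate poor fit, hence alternative='greater'.
# n_resamples is deliberately small here; a real analysis would use more.
res = stats.monte_carlo_test(data, null_rvs, a2_weibull, vectorized=False,
                             n_resamples=99, alternative='greater')
print(res.statistic, res.pvalue)
```

Refitting the parameters inside the statistic matters: it makes each simulated statistic subject to the same estimation step as the observed one, so the Monte Carlo null distribution reflects the fact that the parameters were not known in advance.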

References

[1]

https://www.itl.nist.gov/div898/handbook/prc/section2/prc213.htm

[2]

Stephens, M. A. (1974). EDF Statistics for Goodness of Fit and Some Comparisons, Journal of the American Statistical Association, Vol. 69, pp. 730-737.

[3]

Stephens, M. A. (1976). Asymptotic Results for Goodness-of-Fit Statistics with Unknown Parameters, Annals of Statistics, Vol. 4, pp. 357-369.

[4]

Stephens, M. A. (1977). Goodness of Fit for the Extreme Value Distribution, Biometrika, Vol. 64, pp. 583-588.

[5]

Stephens, M. A. (1977). Goodness of Fit with Special Reference to Tests for Exponentiality, Technical Report No. 262, Department of Statistics, Stanford University, Stanford, CA.

[6]

Stephens, M. A. (1979). Tests of Fit for the Logistic Distribution Based on the Empirical Distribution Function, Biometrika, Vol. 66, pp. 591-595.

[7]

Lockhart, R. A. and Stephens, M. A. (1994). Estimation and Tests of Fit for the Three-Parameter Weibull Distribution, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 56, No. 3, pp. 491-500, Table 0.

Examples

Test the null hypothesis that a random sample was drawn from a normal distribution (with unspecified mean and standard deviation).

>>> import numpy as np
>>> from scipy.stats import anderson
>>> rng = np.random.default_rng()
>>> data = rng.random(size=35)
>>> res = anderson(data)
>>> res.statistic
0.8398018749744764
>>> res.critical_values
array([0.527, 0.6  , 0.719, 0.839, 0.998])
>>> res.significance_level
array([15. , 10. ,  5. ,  2.5,  1. ])

The value of the statistic (barely) exceeds the critical value associated with a significance level of 2.5%, so the null hypothesis may be rejected at a significance level of 2.5%, but not at a significance level of 1%.
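Testing against a distribution other than the default follows the same pattern. The sketch below uses dist='expon' and also inspects the fitted parameters. The seed and sample are arbitrary, and the example assumes that the fitted parameters are exposed as `fit_result.params` with `loc` and `scale` fields, as in recent SciPy versions.

```python
import numpy as np
from scipy.stats import anderson

# Assumed example data: a seeded exponential sample (seed and size arbitrary).
rng = np.random.default_rng(2024)
sample = rng.exponential(scale=3.0, size=100)

res = anderson(sample, dist='expon')

# Parameters of the exponential distribution fitted under the null hypothesis.
params = res.fit_result.params
print("fitted loc, scale:", params.loc, params.scale)

# Critical values are reported for the levels listed for 'expon' in the Notes.
for level, crit in zip(res.significance_level, res.critical_values):
    print(f"{level:4.1f}%: critical value {crit:.3f}")
print("statistic:", res.statistic)
```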