scipy.stats.anderson
- scipy.stats.anderson(x, dist='norm')
Anderson-Darling test for data coming from a particular distribution.
The Anderson-Darling test tests the null hypothesis that a sample is drawn from a population that follows a particular distribution. The critical values of the Anderson-Darling test depend on which distribution is being tested against. This function works for normal, exponential, logistic, Weibull (weibull_min), or Gumbel (Extreme Value Type I) distributions.
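To make the statistic concrete, the sketch below computes A2 for the normal case directly from the standard formula and compares it with anderson. It assumes, as we understand SciPy's implementation, that the sample mean and the ddof=1 sample standard deviation are plugged in for the unknown parameters; anderson_norm_statistic is a hypothetical helper, not part of SciPy.

```python
import numpy as np
from scipy import stats

def anderson_norm_statistic(x):
    """Hypothetical helper: A2 for normality with estimated mean and std."""
    y = np.sort(np.asarray(x, dtype=float))
    n = y.size
    # Standardize with the sample mean and the ddof=1 standard deviation
    w = (y - y.mean()) / y.std(ddof=1)
    logcdf = stats.norm.logcdf(w)  # ln F(y_i)
    logsf = stats.norm.logsf(w)    # ln(1 - F(y_i))
    i = np.arange(1, n + 1)
    # A2 = -n - sum_i (2i - 1)/n * [ln F(y_i) + ln(1 - F(y_{n+1-i}))]
    return -n - np.sum((2 * i - 1) / n * (logcdf + logsf[::-1]))

rng = np.random.default_rng(12345)
data = rng.normal(loc=3.0, scale=2.0, size=50)
manual = anderson_norm_statistic(data)
print(manual, stats.anderson(data).statistic)
```

Using logcdf and logsf rather than taking the logarithm of cdf values avoids losing precision in the tails.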
- Parameters:
- x : array_like
Array of sample data.
- dist : {‘norm’, ‘expon’, ‘logistic’, ‘gumbel’, ‘gumbel_l’, ‘gumbel_r’, ‘extreme1’, ‘weibull_min’}, optional
The type of distribution to test against. The default is ‘norm’. The names ‘extreme1’, ‘gumbel_l’ and ‘gumbel’ are synonyms for the same distribution.
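As a quick check of the synonym rule, the sketch below runs the test on one sample under each Gumbel name. The sample is simulated with NumPy's Generator.gumbel (a right-skewed distribution, i.e. gumbel_r in SciPy's naming); the parameters are arbitrary and chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.gumbel(loc=1.0, scale=0.5, size=60)  # right-skewed sample

# 'gumbel', 'gumbel_l' and 'extreme1' should give identical statistics;
# 'gumbel_r' is a different (mirror-image) distribution
stats_by_name = {name: stats.anderson(data, dist=name).statistic
                 for name in ('gumbel', 'gumbel_l', 'extreme1', 'gumbel_r')}
for name, a2 in stats_by_name.items():
    print(f"{name:>9}: A2 = {a2:.4f}")
```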
- Returns:
- result : AndersonResult
An object with the following attributes:
- statistic : float
The Anderson-Darling test statistic.
- critical_values : list
The critical values for this distribution.
- significance_level : list
The significance levels, in percent, corresponding to the critical values. The set of significance levels for which critical values are returned differs depending on the distribution being tested against.
- fit_result : FitResult
An object containing the results of fitting the distribution to the data.
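The pairing between critical_values and significance_level is positional. The sketch below prints the pairs for several distributions on one simulated, strictly positive sample (an arbitrary choice for illustration). weibull_min is left out because, as the Notes explain, its fit can fail for some data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=100)  # strictly positive sample

levels = {}
for name in ('norm', 'expon', 'logistic', 'gumbel_l', 'gumbel_r'):
    res = stats.anderson(data, dist=name)
    levels[name] = list(res.significance_level)
    # critical_values[k] belongs to significance_level[k] (in percent)
    pairs = ", ".join(f"{lvl}%: {cv:.3f}" for lvl, cv in
                      zip(res.significance_level, res.critical_values))
    print(f"{name:>8}  A2 = {res.statistic:.3f}  ({pairs})")
```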
See also
kstest
The Kolmogorov-Smirnov test for goodness-of-fit.
Notes
Critical values provided are for the following significance levels:
- normal/exponential
15%, 10%, 5%, 2.5%, 1%
- logistic
25%, 10%, 5%, 2.5%, 1%, 0.5%
- gumbel_l / gumbel_r
25%, 10%, 5%, 2.5%, 1%
- weibull_min
50%, 25%, 15%, 10%, 5%, 2.5%, 1%, 0.5%
If the returned statistic is larger than the critical value for a given significance level, the null hypothesis that the data come from the chosen distribution can be rejected at that significance level. The returned statistic is referred to as ‘A2’ in the references.
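The decision rule above can be written as a small function. This is a minimal sketch; rejected_levels is a hypothetical helper, not part of SciPy, and it simply compares the statistic with each critical value.

```python
import numpy as np
from scipy import stats

def rejected_levels(result):
    """Hypothetical helper: significance levels (in percent) at which H0 is rejected.

    Applies the rule 'reject when statistic > critical value' to each pair.
    """
    return [float(lvl) for lvl, cv in
            zip(result.significance_level, result.critical_values)
            if result.statistic > cv]

rng = np.random.default_rng(2)
skewed = rng.exponential(size=500)  # clearly non-normal sample
res = stats.anderson(skewed, dist='norm')
print(res.statistic, rejected_levels(res))
```

With a strongly skewed sample of this size, the statistic should exceed every tabulated critical value, so the null hypothesis of normality is rejected at all listed levels.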
For weibull_min, maximum likelihood estimation is known to be challenging. If the test returns successfully, then the first-order conditions for a maximum likelihood estimate have been verified, and the critical values correspond relatively well to the significance levels, provided that the sample is sufficiently large (>10 observations [7]). However, for some data, especially data with no left tail, anderson is likely to result in an error message. In this case, consider performing a custom goodness-of-fit test using scipy.stats.monte_carlo_test.
References
[2] Stephens, M. A. (1974). EDF Statistics for Goodness of Fit and Some Comparisons, Journal of the American Statistical Association, Vol. 69, pp. 730-737.
[3] Stephens, M. A. (1976). Asymptotic Results for Goodness-of-Fit Statistics with Unknown Parameters, Annals of Statistics, Vol. 4, pp. 357-369.
[4] Stephens, M. A. (1977). Goodness of Fit for the Extreme Value Distribution, Biometrika, Vol. 64, pp. 583-588.
[5] Stephens, M. A. (1977). Goodness of Fit with Special Reference to Tests for Exponentiality, Technical Report No. 262, Department of Statistics, Stanford University, Stanford, CA.
[6] Stephens, M. A. (1979). Tests of Fit for the Logistic Distribution Based on the Empirical Distribution Function, Biometrika, Vol. 66, pp. 591-595.
[7] Lockhart, R. A. and Stephens, M. A. (1994). Estimation and Tests of Fit for the Three-Parameter Weibull Distribution, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 56, No. 3, pp. 491-500, Table 0.
Examples
Test the null hypothesis that a random sample was drawn from a normal distribution (with unspecified mean and standard deviation).
>>> import numpy as np
>>> from scipy.stats import anderson
>>> rng = np.random.default_rng()  # unseeded: the values below vary from run to run
>>> data = rng.random(size=35)
>>> res = anderson(data)
>>> res.statistic
0.8398018749744764
>>> res.critical_values
array([0.527, 0.6 , 0.719, 0.839, 0.998])
>>> res.significance_level
array([15. , 10. , 5. , 2.5, 1. ])
The value of the statistic (barely) exceeds the critical value associated with a significance level of 2.5%, so the null hypothesis may be rejected at a significance level of 2.5%, but not at a significance level of 1%.
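When anderson cannot fit weibull_min, a custom goodness-of-fit test can be assembled with scipy.stats.monte_carlo_test, as the Notes suggest. The sketch below assumes the null distribution is fully specified (a hypothetical weibull_min with c=1.5 and scale=2.0), so no parameters are estimated and the Monte Carlo null distribution of A2 is valid as simulated. a2_statistic and rvs are illustrative helpers, not SciPy functions. If the parameters are estimated from the data, the statistic would need to refit on every simulated sample; scipy.stats.goodness_of_fit, available in recent SciPy versions, performs that refitting.

```python
import numpy as np
from scipy import stats

# Hypothetical, fully specified null distribution
null = stats.weibull_min(c=1.5, scale=2.0)

def a2_statistic(x, axis=-1):
    """A2 against the fixed `null` distribution, vectorized over `axis`."""
    x = np.moveaxis(np.sort(x, axis=axis), axis, -1)
    n = x.shape[-1]
    i = np.arange(1, n + 1)
    terms = (2 * i - 1) / n * (null.logcdf(x) + null.logsf(x)[..., ::-1])
    return -n - np.sum(terms, axis=-1)

rng = np.random.default_rng(3)
data = null.rvs(size=40, random_state=rng)

def rvs(size):
    # Draw simulated samples from the null using the same Generator
    return null.rvs(size=size, random_state=rng)

# Large A2 indicates poor fit, so the default one-sided 'greater' alternative applies
res = stats.monte_carlo_test(data, rvs, a2_statistic, n_resamples=999)
print(res.statistic, res.pvalue)
```

Because a2_statistic accepts an axis argument, monte_carlo_test can evaluate all simulated samples in a single vectorized call.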