scipy.stats.anderson_ksamp

scipy.stats.anderson_ksamp(samples, midrank=True)
The Anderson-Darling test for k-samples.
The k-sample Anderson-Darling test is a modification of the one-sample Anderson-Darling test. It tests the null hypothesis that k-samples are drawn from the same population without having to specify the distribution function of that population. The critical values depend on the number of samples.
Parameters:  samples : sequence of 1-D array_like
Sequence of arrays, each holding the data of one sample.
 midrank : bool, optional
Type of Anderson-Darling test which is computed. Default (True) is the midrank test applicable to continuous and discrete populations. If False, the right side empirical distribution is used.
Returns:  statistic : float
Normalized k-sample Anderson-Darling test statistic.
 critical_values : array
The critical values for significance levels 25%, 10%, 5%, 2.5%, 1%.
 significance_level : float
An approximate significance level at which the null hypothesis for the provided samples can be rejected.
Raises:  ValueError
If fewer than 2 samples are provided, a sample is empty, or no distinct observations are in the samples.
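Two of the error conditions listed above can be checked with a minimal sketch like the following. The helper name `raises_value_error` is hypothetical and exists only for this illustration; it is not part of SciPy.

```python
import numpy as np
from scipy import stats


def raises_value_error(samples):
    """Return True if anderson_ksamp raises ValueError for `samples`.

    Hypothetical helper for illustration only; not part of SciPy.
    """
    try:
        stats.anderson_ksamp(samples)
    except ValueError:
        return True
    return False


# Only one sample is provided: the test needs at least two.
single_sample = raises_value_error([np.array([1.0, 2.0, 3.0])])

# Two samples, but every observation has the same value,
# so there are no distinct observations to compare.
no_distinct = raises_value_error([np.array([1.0, 1.0]),
                                  np.array([1.0, 1.0])])

print(single_sample, no_distinct)
```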
Notes
[1] defines three versions of the k-sample Anderson-Darling test: one for continuous distributions and two for discrete distributions, in which ties between samples may occur. The default of this routine is to compute the version based on the midrank empirical distribution function. This test is applicable to continuous and discrete data. If midrank is set to False, the right side empirical distribution is used for a test for discrete data. According to [1], the two discrete test statistics differ only slightly if a few collisions due to round-off errors occur in the test not adjusted for ties between samples.
New in version 0.14.0.
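As a rough illustration of the two discrete variants described in the notes, the sketch below runs the test with midrank=True and midrank=False on data rounded to one decimal so that ties occur. The variable names, the seed, and the rounding step are assumptions chosen for this example. Warnings are silenced because some SciPy releases may warn about the reported significance level or about the arguments used here.

```python
import warnings

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Round to one decimal so that ties within and between samples occur
# (an assumption for this sketch, to mimic discrete or rounded data).
a = np.round(rng.normal(size=40), 1)
b = np.round(rng.normal(loc=0.3, size=40), 1)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    res_mid = stats.anderson_ksamp([a, b], midrank=True)     # midrank version
    res_right = stats.anderson_ksamp([a, b], midrank=False)  # right side version

# The result can be indexed like a tuple: statistic first, critical values second.
stat_mid, crit_mid = res_mid[0], res_mid[1]
stat_right, crit_right = res_right[0], res_right[1]

print("midrank statistic:   ", stat_mid)
print("right side statistic:", stat_right)
```

Both calls return the same critical values, since these depend on the number of samples rather than on the chosen variant; only the statistic (and hence the approximate significance level) changes.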
References
[1] Scholz, F. W. and Stephens, M. A. (1987), K-Sample Anderson-Darling Tests, Journal of the American Statistical Association, Vol. 82, pp. 918-924.
Examples
>>> import numpy as np
>>> from scipy import stats
>>> np.random.seed(314159)
The null hypothesis that the two random samples come from the same distribution can be rejected at the 5% level because the returned test value is greater than the critical value for 5% (1.961) but not at the 2.5% level. The interpolation gives an approximate significance level of 3.1%:
>>> stats.anderson_ksamp([np.random.normal(size=50),
... np.random.normal(loc=0.5, size=30)])
(2.4615796189876105,
  array([ 0.325,  1.226,  1.961,  2.718,  3.752]),
  0.03134990135800783)
The null hypothesis cannot be rejected for three samples from an identical distribution. The approximate p-value (about 88%) has to be computed by extrapolation and may not be very accurate:
>>> stats.anderson_ksamp([np.random.normal(size=50),
... np.random.normal(size=30), np.random.normal(size=20)])
(0.73091722665244196,
  array([ 0.44925884,  1.3052767 ,  1.9434184 ,  2.57696569,  3.41634856]),
  0.8789283903979661)
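The comparison of the statistic against the critical values, as done by hand in the examples above, can be sketched programmatically. The `levels` list mirrors the significance levels documented under Returns; the large location shift (loc=2.0), the seed, and the variable names are assumptions chosen so that the outcome is unambiguous. Some SciPy releases may return more than five critical values, in which case `zip` simply stops after the five levels listed here.

```python
import warnings

import numpy as np
from scipy import stats

# Significance levels (in percent) matching the documented critical values.
levels = [25, 10, 5, 2.5, 1]

rng = np.random.default_rng(314159)
x = rng.normal(size=50)
y = rng.normal(loc=2.0, size=30)  # strongly shifted sample (assumption)

with warnings.catch_warnings():
    # Some releases may warn when the reported significance level
    # falls outside the tabulated range.
    warnings.simplefilter("ignore")
    result = stats.anderson_ksamp([x, y])

statistic, critical_values = result[0], result[1]

# The null hypothesis is rejected at a given level when the statistic
# exceeds the corresponding critical value.
rejected = [lvl for lvl, cv in zip(levels, critical_values) if statistic > cv]
print("rejected at levels (%):", rejected)
```

With a shift this large the statistic exceeds every listed critical value, so the null hypothesis is rejected at all five levels.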