scipy.stats.cramervonmises_2samp

scipy.stats.cramervonmises_2samp(x, y, method='auto')

Perform the two-sample Cramér-von Mises test for goodness of fit.

This is the two-sample version of the Cramér-von Mises test ([1]): for two independent samples \(X_1, ..., X_n\) and \(Y_1, ..., Y_m\), the null hypothesis is that the samples come from the same (unspecified) continuous distribution.

Parameters
x : array_like

A 1-D array of observed values of the random variables \(X_i\).

y : array_like

A 1-D array of observed values of the random variables \(Y_i\).

method : {‘auto’, ‘asymptotic’, ‘exact’}, optional

The method used to compute the p-value; see Notes for details. The default is ‘auto’.

Returns
res : object with attributes

statistic : float

Cramér-von Mises statistic.

pvalue : float

The p-value.

Notes

New in version 1.7.0.

The statistic is computed according to equation 9 in [2]. The calculation of the p-value depends on the keyword method:

  • asymptotic: The p-value is approximated by using the limiting distribution of the test statistic.

  • exact: The exact p-value is computed by enumerating all possible combinations of the test statistic; see [2].

The exact calculation becomes very slow even for moderate sample sizes, because the number of combinations increases rapidly with the size of the samples. If method='auto', the exact approach is used if both samples contain fewer than 10 observations; otherwise, the asymptotic distribution is used.
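To make the cost of the exact approach concrete, the following is a minimal brute-force sketch, not SciPy's actual implementation. It assumes there are no ties, so the pooled ranks are 1, ..., n + m, and that under the null hypothesis every assignment of n of those ranks to the first sample is equally likely. The helper names `u_from_ranks` and `exact_pvalue_bruteforce` are hypothetical. The quantity U is the rank-based sum from [2]; for fixed sample sizes the statistic is an increasing function of U, so comparing U values is equivalent to comparing statistics.

```python
from itertools import combinations
from math import comb


def u_from_ranks(rx, ry):
    """Rank-based sum U; rx and ry are the sorted pooled ranks of each sample."""
    n, m = len(rx), len(ry)
    u = n * sum((r - i) ** 2 for i, r in enumerate(rx, start=1))
    u += m * sum((r - j) ** 2 for j, r in enumerate(ry, start=1))
    return u


def exact_pvalue_bruteforce(rx, n_total):
    """Fraction of rank assignments whose U is at least the observed U.

    rx holds the pooled ranks of the first sample (assumption: no ties).
    """
    all_ranks = range(1, n_total + 1)
    ry = [r for r in all_ranks if r not in rx]
    u_obs = u_from_ranks(sorted(rx), ry)
    count = 0
    # combinations() yields sorted tuples, so each subset is already ordered.
    for subset in combinations(all_ranks, len(rx)):
        rest = [r for r in all_ranks if r not in subset]
        if u_from_ranks(subset, rest) >= u_obs:
            count += 1
    return count / comb(n_total, len(rx))


# Number of assignments the loop above has to visit for equal sample sizes.
for size in (5, 10, 20):
    print(size, comb(2 * size, size))
```

The loop visits `comb(n + m, n)` assignments, which is why a direct enumeration is only practical for small samples.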

If the underlying distribution is not continuous, the p-value is likely to be conservative (Section 6.2 in [3]). When ranking the data to compute the test statistic, midranks are used if there are ties.
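The rank-based computation of the statistic can be sketched in pure Python as follows. This is an illustrative reading of equation 9 in [2] with midranks for ties, not a copy of the library code. The function names `midranks` and `cvm_2samp_statistic` are hypothetical, and the formula T = U / (nm(n + m)) - (4nm - 1) / (6(n + m)) is stated here as an assumption about that equation.

```python
def midranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def cvm_2samp_statistic(x, y):
    """Two-sample Cramér-von Mises statistic computed from pooled midranks."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    r = midranks(xs + ys)
    rx, ry = r[:n], r[n:]
    # U compares each pooled rank with the rank inside its own sample.
    u = n * sum((ri - i) ** 2 for i, ri in enumerate(rx, start=1))
    u += m * sum((rj - j) ** 2 for j, rj in enumerate(ry, start=1))
    k, total = n * m, n + m
    return u / (k * total) - (4 * k - 1) / (6 * total)


print(cvm_2samp_statistic([1, 3], [2, 4]))
print(cvm_2samp_statistic([1, 2], [1, 2]))
```

For two identical samples the midranks make the statistic vanish, which matches the intuition that the two empirical distribution functions coincide.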

References

[1] https://en.wikipedia.org/wiki/Cramer-von_Mises_criterion

[2] Anderson, T.W. (1962). On the distribution of the two-sample Cramer-von-Mises criterion. The Annals of Mathematical Statistics, pp. 1148-1159.

[3] Conover, W.J. (1971). Practical Nonparametric Statistics.

Examples

Suppose we wish to test whether two samples generated by scipy.stats.norm.rvs have the same distribution. We choose a significance level of alpha=0.05.

>>> import numpy as np
>>> from scipy import stats
>>> rng = np.random.default_rng()
>>> x = stats.norm.rvs(size=100, random_state=rng)
>>> y = stats.norm.rvs(size=70, random_state=rng)
>>> res = stats.cramervonmises_2samp(x, y)
>>> res.statistic, res.pvalue
(0.29376470588235293, 0.1412873014573014)

The p-value exceeds our chosen significance level, so we do not reject the null hypothesis that the observed samples are drawn from the same distribution.

For small sample sizes, the exact p-value can be computed:

>>> x = stats.norm.rvs(size=7, random_state=rng)
>>> y = stats.t.rvs(df=2, size=6, random_state=rng)
>>> res = stats.cramervonmises_2samp(x, y, method='exact')
>>> res.statistic, res.pvalue
(0.197802197802198, 0.31643356643356646)

The p-value based on the asymptotic distribution is a good approximation even though the sample size is small.

>>> res = stats.cramervonmises_2samp(x, y, method='asymptotic')
>>> res.statistic, res.pvalue
(0.197802197802198, 0.2966041181527128)

Regardless of the method, one would not reject the null hypothesis at the chosen significance level in this example.