scipy.stats.ks_2samp

scipy.stats.ks_2samp(data1, data2, alternative='two-sided', mode='auto')
Compute the Kolmogorov-Smirnov statistic on 2 samples.
This is a two-sided test for the null hypothesis that 2 independent samples are drawn from the same continuous distribution. The alternative hypothesis can be either ‘two-sided’ (default), ‘less’ or ‘greater’.
 Parameters
 data1, data2 : sequence of 1-D ndarrays
Two arrays of sample observations assumed to be drawn from a continuous distribution. The sample sizes can be different.
 alternative : {‘two-sided’, ‘less’, ‘greater’}, optional
Defines the alternative hypothesis (see explanation above). Default is ‘two-sided’.
 mode : {‘auto’, ‘exact’, ‘asymp’}, optional
Defines the method used for calculating the p-value. Default is ‘auto’.
‘exact’ : use approximation to exact distribution of test statistic
‘asymp’ : use asymptotic distribution of test statistic
‘auto’ : use ‘exact’ for small size arrays, ‘asymp’ for large.
 Returns
 statistic : float
KS statistic
 pvalue : float
two-tailed p-value
Notes
This tests whether 2 samples are drawn from the same distribution. Note that, like in the case of the one-sample KS test, the distribution is assumed to be continuous.
In the one-sided test, the alternative is that the empirical cumulative distribution function F(x) of the data1 variable is “less” or “greater” than the empirical cumulative distribution function G(x) of the data2 variable, F(x) <= G(x), resp. F(x) >= G(x).
If the KS statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same.
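To make the comparison of F(x) and G(x) concrete, the following is a minimal pure-Python sketch of the directed and two-sided distances between two empirical CDFs. The helper name `ks_2samp_statistics` is hypothetical and this is not SciPy's implementation. It only illustrates the quantities discussed above and does not compute a p-value.

```python
from bisect import bisect_right


def ks_2samp_statistics(data1, data2):
    """Return (d_plus, d_minus, d) for two 1-D samples.

    d_plus  = max over x of F(x) - G(x)
    d_minus = max over x of G(x) - F(x)
    d       = max(d_plus, d_minus), the two-sided distance
    where F and G are the empirical CDFs of data1 and data2.
    """
    s1 = sorted(data1)
    s2 = sorted(data2)
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")

    d_plus = 0.0
    d_minus = 0.0
    # Both ECDFs are right-continuous step functions that only jump at
    # observed values, so checking the pooled observations is sufficient.
    for x in s1 + s2:
        f = bisect_right(s1, x) / n1  # F(x): fraction of data1 <= x
        g = bisect_right(s2, x) / n2  # G(x): fraction of data2 <= x
        d_plus = max(d_plus, f - g)
        d_minus = max(d_minus, g - f)
    return d_plus, d_minus, max(d_plus, d_minus)
```

When data1 tends to take smaller values than data2, F(x) rises earlier than G(x), so `d_plus` is large and `d_minus` stays near zero.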
If the mode is ‘auto’, the computation is exact if the sample sizes are less than 10000. For larger sizes, the computation uses the Kolmogorov-Smirnov distributions to compute an approximate value.
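As a rough illustration of what an asymptotic p-value looks like, the sketch below evaluates the classical limiting Kolmogorov survival function Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * lam^2) at lam = D * sqrt(n1 * n2 / (n1 + n2)). This is the textbook large-sample approximation for the two-sided statistic. Whether SciPy applies exactly this formula, or adds small-sample corrections, is not stated here, so treat the function names and the cutoff values as assumptions.

```python
import math


def kolmogorov_sf(lam, max_terms=100):
    """Survival function of the limiting Kolmogorov distribution.

    Q(lam) = 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * lam**2)
    """
    # For very small lam the alternating series converges slowly, but the
    # true value is indistinguishable from 1 there, so return 1 directly.
    # The 0.2 cutoff is a choice made for this sketch.
    if lam < 0.2:
        return 1.0
    total = 0.0
    for k in range(1, max_terms + 1):
        term = (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-16:  # remaining terms are negligible
            break
    # Clamp to [0, 1] to guard against rounding error.
    return min(1.0, max(0.0, 2.0 * total))


def asymptotic_two_sided_pvalue(d, n1, n2):
    """Approximate two-sided p-value for KS distance d and sample sizes n1, n2."""
    en = math.sqrt(n1 * n2 / (n1 + n2))  # effective sample size factor
    return kolmogorov_sf(en * d)
```

The approximation improves as both sample sizes grow, which is consistent with reserving it for large arrays.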
We generally follow Hodges’ treatment of Drion/Gnedenko/Korolyuk [1].
References
 [1] Hodges, J.L. Jr., “The Significance Probability of the Smirnov Two-Sample Test,” Arkiv för Matematik, 3, No. 43 (1958), 469-86.
Examples
>>> import numpy as np
>>> from scipy import stats
>>> np.random.seed(12345678)  # fix random seed to get the same result
>>> n1 = 200  # size of first sample
>>> n2 = 300  # size of second sample
For a different distribution, we can reject the null hypothesis since the p-value is below 1%:
>>> rvs1 = stats.norm.rvs(size=n1, loc=0., scale=1)
>>> rvs2 = stats.norm.rvs(size=n2, loc=0.5, scale=1.5)
>>> stats.ks_2samp(rvs1, rvs2)
(0.20833333333333334, 5.129279597781977e-05)
For a slightly different distribution, we cannot reject the null hypothesis at a 10% or lower alpha since the p-value of about 0.147 is higher than 10%:
>>> rvs3 = stats.norm.rvs(size=n2, loc=0.01, scale=1.0)
>>> stats.ks_2samp(rvs1, rvs3)
(0.10333333333333333, 0.14691437867433876)
For an identical distribution, we cannot reject the null hypothesis since the p-value is high, 41%:
>>> rvs4 = stats.norm.rvs(size=n2, loc=0.0, scale=1.0)
>>> stats.ks_2samp(rvs1, rvs4)
(0.07999999999999996, 0.41126949729859719)