scipy.stats.ks_2samp

scipy.stats.ks_2samp(data1, data2)
Computes the Kolmogorov-Smirnov statistic on 2 samples.
This is a two-sided test for the null hypothesis that 2 independent samples are drawn from the same continuous distribution.
Parameters: data1, data2 : sequence of 1-D ndarrays
Two arrays of sample observations assumed to be drawn from a continuous distribution. The sample sizes can be different.
Returns: statistic : float
KS statistic
pvalue : float
two-tailed p-value
Notes
This tests whether 2 samples are drawn from the same distribution. Note that, as with the one-sample KS test, the distribution is assumed to be continuous.
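To make the comparison concrete, here is a minimal sketch of how the two-sample KS statistic can be computed by hand: it is the largest absolute difference between the two empirical distribution functions, evaluated at every pooled observation. The helper name `ks_statistic_sketch` is hypothetical and is not part of scipy; it illustrates the definition and is not presented as the library's implementation.

```python
import numpy as np
from scipy import stats


def ks_statistic_sketch(data1, data2):
    """Largest gap between the two empirical CDFs (hypothetical helper)."""
    data1 = np.sort(np.asarray(data1, dtype=float))
    data2 = np.sort(np.asarray(data2, dtype=float))
    pooled = np.concatenate([data1, data2])
    # Fraction of each sample that is <= every pooled observation.
    cdf1 = np.searchsorted(data1, pooled, side="right") / data1.size
    cdf2 = np.searchsorted(data2, pooled, side="right") / data2.size
    return float(np.max(np.abs(cdf1 - cdf2)))


# Compare against the library on samples of different sizes.
rng = np.random.RandomState(0)
sample_a = rng.normal(loc=0.0, scale=1.0, size=50)
sample_b = rng.normal(loc=0.5, scale=1.5, size=80)
print(ks_statistic_sketch(sample_a, sample_b))
print(stats.ks_2samp(sample_a, sample_b)[0])
```

The two printed values should agree up to floating-point rounding.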
This is the two-sided test; one-sided tests are not implemented. The test uses the two-sided asymptotic Kolmogorov-Smirnov distribution.
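As a rough illustration of what an asymptotic two-sided p-value looks like, the sketch below scales the statistic by the effective sample size `sqrt(n1 * n2 / (n1 + n2))` and evaluates the survival function of `scipy.stats.kstwobign`, the limiting Kolmogorov distribution. The helper name `asymptotic_pvalue_sketch` and the optional small-sample correction `en + 0.12 + 0.11 / en` are assumptions made for illustration; this page does not state which correction, if any, the function applies.

```python
import numpy as np
from scipy import stats


def asymptotic_pvalue_sketch(statistic, n1, n2, corrected=True):
    """Two-sided asymptotic p-value for a two-sample KS statistic.

    Hypothetical helper. The `corrected` branch uses a small-sample
    correction factor that is an assumption, not documented behaviour.
    """
    en = np.sqrt(n1 * n2 / float(n1 + n2))  # effective sample size
    scale = en + 0.12 + 0.11 / en if corrected else en
    # kstwobign is the limiting distribution of sqrt(n) * D.
    return float(stats.kstwobign.sf(scale * statistic))


# Statistic and sample sizes taken from the first example on this page.
print(asymptotic_pvalue_sketch(0.20833333333333337, 200, 300))
print(asymptotic_pvalue_sketch(0.20833333333333337, 200, 300, corrected=False))
```

Newer scipy releases may compute the p-value differently (for instance with an exact method for small samples), so the sketch is not guaranteed to reproduce what `ks_2samp` returns in any given version.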
If the KS statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same.
Examples
>>> import numpy as np
>>> from scipy import stats
>>> np.random.seed(12345678)  # fix random seed to get the same result
>>> n1 = 200  # size of first sample
>>> n2 = 300  # size of second sample
For a different distribution, we can reject the null hypothesis since the p-value is below 1%:
>>> rvs1 = stats.norm.rvs(size=n1, loc=0., scale=1)
>>> rvs2 = stats.norm.rvs(size=n2, loc=0.5, scale=1.5)
>>> stats.ks_2samp(rvs1, rvs2)
(0.20833333333333337, 4.6674975515806989e-005)
For a slightly different distribution, we cannot reject the null hypothesis at a 10% or lower alpha since the p-value at 0.144 is higher than 10%:
>>> rvs3 = stats.norm.rvs(size=n2, loc=0.01, scale=1.0)
>>> stats.ks_2samp(rvs1, rvs3)
(0.10333333333333333, 0.14498781825751686)
For an identical distribution, we cannot reject the null hypothesis since the p-value is high, 41%:
>>> rvs4 = stats.norm.rvs(size=n2, loc=0.0, scale=1.0)
>>> stats.ks_2samp(rvs1, rvs4)
(0.07999999999999996, 0.41126949729859719)
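The reject / cannot-reject reasoning used in the examples above can be wrapped in a small helper that compares the p-value against a chosen significance level. This is only a sketch: the function name `rejects_same_distribution` and the default `alpha=0.01` are illustrative choices, not part of scipy.

```python
import numpy as np
from scipy import stats


def rejects_same_distribution(sample1, sample2, alpha=0.01):
    """Return True when the two-sample KS test rejects the null
    hypothesis of a common distribution at level `alpha`.

    Hypothetical convenience wrapper around stats.ks_2samp.
    """
    statistic, pvalue = stats.ks_2samp(sample1, sample2)
    return bool(pvalue < alpha)


# Samples with the same parameters as the first example on this page.
rng = np.random.RandomState(12345678)
first = rng.normal(loc=0.0, scale=1.0, size=200)
second = rng.normal(loc=0.5, scale=1.5, size=300)
print(rejects_same_distribution(first, second, alpha=0.01))
```

A return value of `False` means only that the test found no evidence against the null hypothesis at that level; it does not show that the two distributions are the same.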