SciPy

scipy.special.smirnov

scipy.special.smirnov(n, d) = <ufunc 'smirnov'>

Kolmogorov-Smirnov complementary cumulative distribution function

Returns the exact Kolmogorov-Smirnov complementary cumulative distribution function,(aka the Survival Function) of Dn+ (or Dn-) for a one-sided test of equality between an empirical and a theoretical distribution. It is equal to the probability that the maximum difference between a theoretical distribution and an empirical one based on n samples is greater than d.

Parameters:
n : int

Number of samples

d : float array_like

Deviation between the Empirical CDF (ECDF) and the target CDF.

Returns:
float

The value(s) of smirnov(n, d), Prob(Dn+ >= d) (Also Prob(Dn- >= d))

See also

smirnovi
The Inverse Survival Function for the distribution
scipy.stats.ksone
Provides the functionality as a continuous distribution

kolmogorov, kolmogi

Notes

smirnov is used by stats.kstest in the application of the Kolmogorov-Smirnov Goodness of Fit test. For historial reasons this function is exposed in scpy.special, but the recommended way to achieve the most accurate CDF/SF/PDF/PPF/ISF computations is to use the stats.ksone distrubution.

Examples

>>> from scipy.special import smirnov

Show the probability of a gap at least as big as 0, 0.5 and 1.0 for a sample of size 5

>>> smirnov(5, [0, 0.5, 1.0])
array([ 1.   ,  0.056,  0.   ])

Compare a sample of size 5 drawn from a source N(0.5, 1) distribution against a target N(0, 1) CDF.

>>> from scipy.stats import norm
>>> n = 5
>>> gendist = norm(0.5, 1)       # Normal distribution, mean 0.5, stddev 1
>>> np.random.seed(seed=233423)  # Set the seed for reproducibility
>>> x = np.sort(gendist.rvs(size=n))
>>> x
array([-0.20946287,  0.71688765,  0.95164151,  1.44590852,  3.08880533])
>>> target = norm(0, 1)
>>> cdfs = target.cdf(x)
>>> cdfs
array([ 0.41704346,  0.76327829,  0.82936059,  0.92589857,  0.99899518])
# Construct the Empirical CDF and the K-S statistics (Dn+, Dn-, Dn)
>>> ecdfs = np.arange(n+1, dtype=float)/n
>>> cols = np.column_stack([x, ecdfs[1:], cdfs, cdfs - ecdfs[:n], ecdfs[1:] - cdfs])
>>> np.set_printoptions(precision=3)
>>> cols
array([[ -2.095e-01,   2.000e-01,   4.170e-01,   4.170e-01,  -2.170e-01],
       [  7.169e-01,   4.000e-01,   7.633e-01,   5.633e-01,  -3.633e-01],
       [  9.516e-01,   6.000e-01,   8.294e-01,   4.294e-01,  -2.294e-01],
       [  1.446e+00,   8.000e-01,   9.259e-01,   3.259e-01,  -1.259e-01],
       [  3.089e+00,   1.000e+00,   9.990e-01,   1.990e-01,   1.005e-03]])
>>> gaps = cols[:, -2:]
>>> Dnpm = np.max(gaps, axis=0)
>>> print('Dn-=%f, Dn+=%f' % (Dnpm[0], Dnpm[1]))
Dn-=0.563278, Dn+=0.001005
>>> probs = smirnov(n, Dnpm)
>>> print(chr(10).join(['For a sample of size %d drawn from a N(0, 1) distribution:' % n,
...      ' Smirnov n=%d: Prob(Dn- >= %f) = %.4f' % (n, Dnpm[0], probs[0]),
...      ' Smirnov n=%d: Prob(Dn+ >= %f) = %.4f' % (n, Dnpm[1], probs[1])]))
For a sample of size 5 drawn from a N(0, 1) distribution:
 Smirnov n=5: Prob(Dn- >= 0.563278) = 0.0250
 Smirnov n=5: Prob(Dn+ >= 0.001005) = 0.9990

Plot the Empirical CDF against the target N(0, 1) CDF

>>> import matplotlib.pyplot as plt
>>> plt.step(np.concatenate([[-3], x]), ecdfs, where='post', label='Empirical CDF')
>>> x3 = np.linspace(-3, 3, 100)
>>> plt.plot(x3, target.cdf(x3), label='CDF for N(0, 1)')
>>> plt.ylim([0, 1]); plt.grid(True); plt.legend();
# Add vertical lines marking Dn+ and Dn-
>>> iminus, iplus = np.argmax(gaps, axis=0)
>>> plt.vlines([x[iminus]], ecdfs[iminus], cdfs[iminus], color='r', linestyle='dashed', lw=4)
>>> plt.vlines([x[iplus]], cdfs[iplus], ecdfs[iplus+1], color='m', linestyle='dashed', lw=4)
>>> plt.show()
../_images/scipy-special-smirnov-1.png

Previous topic

scipy.special.chndtrix

Next topic

scipy.special.smirnovi