SciPy

KStwobign Distribution

This is the limiting distribution of the normalized maximum absolute differences between an empirical distribution function, computed from \(n\) samples or observations, and a comparison (or target) cumulative distribution function. (ksone is the distribution of the unnormalized positive differences, \(D_n^+\).)

Writing \(D_n = \sup_t \left|F_{empirical,n}(t) - F_{target}(t)\right|\), the normalization factor is \(\sqrt{n}\), and kstwobign is the limiting distribution of the \(\sqrt{n} D_n\) values as \(n\rightarrow\infty\).

Note that \(D_n=\max(D_n^+, D_n^-)\), but \(D_n^+\) and \(D_n^-\) are not independent.
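As a rough illustration of these definitions, the sketch below computes \(D_n^+\), \(D_n^-\), \(D_n\) and the normalized value \(\sqrt{n} D_n\) for a small sample. It assumes a continuous target CDF; the helper names `ks_statistics` and `uniform_cdf` and the sample values are hypothetical and not part of SciPy.

```python
import math


def ks_statistics(sample, target_cdf):
    """Return (D_plus, D_minus, D) for a sample against a continuous target CDF."""
    xs = sorted(sample)
    n = len(xs)
    # The empirical CDF jumps from (i-1)/n to i/n at the i-th order statistic,
    # so the supremum is attained at, or just before, a sample point.
    d_plus = max(i / n - target_cdf(x) for i, x in enumerate(xs, start=1))
    d_minus = max(target_cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return d_plus, d_minus, max(d_plus, d_minus)


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used as an illustrative target."""
    return min(max(x, 0.0), 1.0)


sample = [0.1, 0.4, 0.7]  # made-up observations
d_plus, d_minus, d_n = ks_statistics(sample, uniform_cdf)
normalized = math.sqrt(len(sample)) * d_n  # the quantity whose limit law is kstwobign
print(d_plus, d_minus, d_n, normalized)
```

Both one-sided statistics are computed from the same order statistics, which is one way to see why they are not independent.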

kstwobign can also be used with the differences between two empirical distribution functions, for sets of observations with \(m\) and \(n\) samples respectively, where \(m\) and \(n\) are “big”. Writing \(D_{m,n} = \sup_t \left|F_{1,m}(t)-F_{2,n}(t)\right|\), where \(F_{1,m}\) and \(F_{2,n}\) are the two empirical distribution functions, kstwobign is also the limiting distribution of the \(\sqrt{\frac{mn}{m+n}}\, D_{m,n}\) values as \(m,n\rightarrow\infty\).
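The two-sample case can be sketched in the same way. The function below evaluates both empirical distribution functions at every pooled data point and applies the \(\sqrt{mn/(m+n)}\) scaling. The name `two_sample_statistic` and the inputs are hypothetical, and the code is only a direct transcription of the definition, not SciPy's implementation.

```python
import math
from bisect import bisect_right


def two_sample_statistic(sample1, sample2):
    """Return (D_mn, sqrt(m*n/(m+n)) * D_mn) for two samples."""
    xs, ys = sorted(sample1), sorted(sample2)
    m, n = len(xs), len(ys)
    # Both empirical CDFs are right-continuous step functions, so the supremum
    # of their absolute difference is attained at one of the pooled data points.
    d_mn = max(
        abs(bisect_right(xs, t) / m - bisect_right(ys, t) / n)
        for t in xs + ys
    )
    scale = math.sqrt(m * n / (m + n))
    return d_mn, scale * d_mn


d_mn, scaled = two_sample_statistic([1, 3, 5], [2, 4, 6])  # made-up data
print(d_mn, scaled)
```

With samples this small the limiting distribution is only a crude approximation; the scaled value becomes meaningful for kstwobign when both \(m\) and \(n\) are large.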

There are no shape parameters, and the support is \(x\in\left[0,\infty\right)\).

\begin{eqnarray*} F\left(x\right) & = & 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2 x^2}\\ & = & \frac{\sqrt{2\pi}}{x} \sum_{k=1}^{\infty} e^{-(2k-1)^2 \pi^2/(8x^2)}\\ & = & 1 - \textrm{scipy.special.kolmogorov}(x) \\ f\left(x\right) & = & 8x \sum_{k=1}^{\infty} (-1)^{k-1} k^2 e^{-2k^2 x^2} \end{eqnarray*}
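To make the series concrete, here is a minimal pure-Python sketch that evaluates truncated versions of both CDF series and of the density. The function names and the truncation point `N_TERMS` are illustrative assumptions and do not reflect how SciPy computes these values. The alternating series converges quickly for large \(x\), while the second form is better suited to small \(x\).

```python
import math

N_TERMS = 100  # truncation point; an illustrative choice, not SciPy's


def cdf_alternating(x, terms=N_TERMS):
    """F(x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2), truncated."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s


def cdf_theta(x, terms=N_TERMS):
    """F(x) = sqrt(2 pi)/x * sum_{k>=1} exp(-(2k-1)^2 pi^2 / (8 x^2)), truncated."""
    if x <= 0:
        return 0.0
    s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * x * x))
            for k in range(1, terms + 1))
    return math.sqrt(2 * math.pi) / x * s


def pdf(x, terms=N_TERMS):
    """f(x) = 8 x * sum_{k>=1} (-1)^(k-1) * k^2 * exp(-2 k^2 x^2), truncated."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * k * k * math.exp(-2 * k * k * x * x)
            for k in range(1, terms + 1))
    return 8.0 * x * s


# The two CDF series should agree up to rounding error.
print(cdf_alternating(1.0), cdf_theta(1.0))
```

If SciPy is installed, the same values should match `scipy.stats.kstwobign.cdf(x)` and `1 - scipy.special.kolmogorov(x)` closely, which gives a quick sanity check of the formulas.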

References

  • “Kolmogorov-Smirnov test”, Wikipedia https://en.wikipedia.org/wiki/Kolmogorov-Smirnov_test

  • Kolmogoroff, A. “Confidence Limits for an Unknown Distribution Function.” Ann. Math. Statist. 12 (1941), no. 4, 461–463.

  • Feller, W. “On the Kolmogorov-Smirnov Limit Theorems for Empirical Distributions.” Ann. Math. Statist. 19 (1948), no. 2, 177–189. and “Errata” Ann. Math. Statist. 21 (1950), no. 2, 301–302.

Implementation: scipy.stats.kstwobign