SciPy

scipy.stats.kstest

scipy.stats.kstest(rvs, cdf, args=(), N=20, alternative='two-sided', mode='auto')[source]

Performs the (one sample or two samples) Kolmogorov-Smirnov test for goodness of fit.

The one-sample test performs a test of the distribution F(x) of an observed random variable against a given distribution G(x). Under the null hypothesis, the two distributions are identical, F(x)=G(x). The alternative hypothesis can be either ‘two-sided’ (default), ‘less’ or ‘greater’. The KS test is only valid for continuous distributions. The two-sample test tests whether the two independent samples are drawn from the same continuous distribution.

Parameters
rvsstr, array_like, or callable

If an array, it should be a 1-D array of observations of random variables. If a callable, it should be a function to generate random variables; it is required to have a keyword argument size. If a string, it should be the name of a distribution in scipy.stats, which will be used to generate random variables.

cdfstr, array_like or callable

If array_like, it should be a 1-D array of observations of random variables, and the two-sample test is performed (and rvs must be array_like) If a callable, that callable is used to calculate the cdf. If a string, it should be the name of a distribution in scipy.stats, which will be used as the cdf function.

argstuple, sequence, optional

Distribution parameters, used if rvs or cdf are strings or callables.

Nint, optional

Sample size if rvs is string or callable. Default is 20.

alternative{‘two-sided’, ‘less’, ‘greater’}, optional

Defines the alternative hypothesis. The following options are available (default is ‘two-sided’):

  • ‘two-sided’

  • ‘less’: one-sided, see explanation in Notes

  • ‘greater’: one-sided, see explanation in Notes

mode{‘auto’, ‘exact’, ‘approx’, ‘asymp’}, optional

Defines the distribution used for calculating the p-value. The following options are available (default is ‘auto’):

  • ‘auto’ : selects one of the other options.

  • ‘exact’ : uses the exact distribution of test statistic.

  • ‘approx’ : approximates the two-sided probability with twice the one-sided probability

  • ‘asymp’: uses asymptotic distribution of test statistic

Returns
statisticfloat

KS test statistic, either D, D+ or D-.

pvaluefloat

One-tailed or two-tailed p-value.

See also

ks_2samp

Notes

In the one-sided test, the alternative is that the empirical cumulative distribution function of the random variable is “less” or “greater” than the cumulative distribution function G(x) of the hypothesis, F(x)<=G(x), resp. F(x)>=G(x).

Examples

>>> from scipy import stats
>>> x = np.linspace(-15, 15, 9)
>>> stats.kstest(x, 'norm')
(0.44435602715924361, 0.038850142705171065)
>>> np.random.seed(987654321) # set random seed to get the same result
>>> stats.kstest(stats.norm.rvs(size=100), stats.norm.cdf)
(0.058352892479417884, 0.8653960860778898)

The above lines are equivalent to:

>>> np.random.seed(987654321)
>>> stats.kstest(stats.norm.rvs, 'norm', N=100)
(0.058352892479417884, 0.8653960860778898)

Test against one-sided alternative hypothesis

Shift distribution to larger values, so that CDF(x) < norm.cdf(x):

>>> np.random.seed(987654321)
>>> x = stats.norm.rvs(loc=0.2, size=100)
>>> stats.kstest(x, 'norm', alternative='less')
(0.12464329735846891, 0.040989164077641749)

Reject equal distribution against alternative hypothesis: less

>>> stats.kstest(x, 'norm', alternative='greater')
(0.0072115233216311081, 0.98531158590396395)

Don’t reject equal distribution against alternative hypothesis: greater

>>> stats.kstest(x, 'norm')
(0.12464329735846891, 0.08197335233541582)

Testing t distributed random variables against normal distribution

With 100 degrees of freedom the t distribution looks close to the normal distribution, and the K-S test does not reject the hypothesis that the sample came from the normal distribution:

>>> np.random.seed(987654321)
>>> stats.kstest(stats.t.rvs(100, size=100), 'norm')
(0.072018929165471257, 0.6505883498379312)

With 3 degrees of freedom the t distribution looks sufficiently different from the normal distribution, that we can reject the hypothesis that the sample came from the normal distribution at the 10% level:

>>> np.random.seed(987654321)
>>> stats.kstest(stats.t.rvs(3, size=100), 'norm')
(0.131016895759829, 0.058826222555312224)

Previous topic

scipy.stats.power_divergence

Next topic

scipy.stats.ks_1samp