scipy.stats.kstest

scipy.stats.kstest(rvs, cdf, args=(), N=20, alternative='two-sided', mode='auto')

Performs the (one-sample or two-sample) Kolmogorov-Smirnov test for goodness of fit.

The one-sample test compares the underlying distribution F(x) of a sample against a given distribution G(x). The two-sample test compares the underlying distributions of two independent samples. Both tests are valid only for continuous distributions.

Parameters
rvs : str, array_like, or callable

If an array, it should be a 1-D array of observations of random variables. If a callable, it should be a function to generate random variables; it is required to have a keyword argument size. If a string, it should be the name of a distribution in scipy.stats, which will be used to generate random variables.

cdf : str, array_like, or callable

If array_like, it should be a 1-D array of observations of random variables, and the two-sample test is performed (and rvs must be array_like). If a callable, that callable is used to calculate the cdf. If a string, it should be the name of a distribution in scipy.stats, which will be used as the cdf function.

args : tuple, sequence, optional

Distribution parameters, used if rvs or cdf are strings or callables.

N : int, optional

Sample size if rvs is string or callable. Default is 20.

alternative : {‘two-sided’, ‘less’, ‘greater’}, optional

Defines the null and alternative hypotheses. Default is ‘two-sided’. Please see explanations in the Notes below.

mode : {‘auto’, ‘exact’, ‘approx’, ‘asymp’}, optional

Defines the distribution used for calculating the p-value. The following options are available (default is ‘auto’):

  • ‘auto’ : selects one of the other options.

  • ‘exact’ : uses the exact distribution of the test statistic.

  • ‘approx’ : approximates the two-sided probability with twice the one-sided probability.

  • ‘asymp’ : uses the asymptotic distribution of the test statistic.
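
As a minimal sketch of how cdf and args combine (the seed, sample size, and parameter values are illustrative assumptions, not part of this reference), a distribution name with args, a callable with args, and a callable with the parameters already bound should all describe the same one-sample test:

```python
import numpy as np
from scipy import stats

# Illustrative data: the seed, location, and scale are arbitrary choices.
rng = np.random.default_rng(12345)
loc, scale = 2.0, 3.0
x = stats.norm.rvs(loc=loc, scale=scale, size=200, random_state=rng)

# cdf given as a distribution name; args supplies its parameters (loc, scale).
res_name = stats.kstest(x, 'norm', args=(loc, scale))

# cdf given as a callable; args is passed on to the callable after the data.
res_callable = stats.kstest(x, stats.norm.cdf, args=(loc, scale))

# cdf given as a callable with the parameters already bound, so args is unused.
res_bound = stats.kstest(x, lambda v: stats.norm.cdf(v, loc=loc, scale=scale))

print(res_name.statistic, res_callable.statistic, res_bound.statistic)
```

Omitting args in the first two calls would instead test against the standard normal distribution.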

Returns
statistic : float

KS test statistic, either D, D+ or D-.

pvalue : float

One-tailed or two-tailed p-value.
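
A short sketch of reading the two returned values, either by attribute or by unpacking. The seed, sample size, and the significance level alpha are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
x = stats.norm.rvs(size=50, random_state=rng)

res = stats.kstest(x, 'norm')

# The result can be read by attribute name ...
print(res.statistic, res.pvalue)

# ... or unpacked like a two-element tuple.
statistic, pvalue = res

# alpha is a hypothetical significance level chosen for this sketch.
alpha = 0.05
reject_null = pvalue < alpha
print("reject null hypothesis:", reject_null)
```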

See also

ks_2samp

Notes

There are three options for the null and corresponding alternative hypothesis that can be selected using the alternative parameter.

  • two-sided: The null hypothesis is that the two distributions are identical, F(x)=G(x) for all x; the alternative is that they are not identical.

  • less: The null hypothesis is that F(x) >= G(x) for all x; the alternative is that F(x) < G(x) for at least one x.

  • greater: The null hypothesis is that F(x) <= G(x) for all x; the alternative is that F(x) > G(x) for at least one x.

Note that the alternative hypotheses describe the CDFs of the underlying distributions, not the observed values. For example, suppose x1 ~ F and x2 ~ G. If F(x) > G(x) for all x, the values in x1 tend to be less than those in x2.
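
To make the direction of the alternatives concrete, the sketch below computes the one-sample statistics by hand from the empirical CDF and compares them with kstest. It assumes the usual definitions D+ = max(F_n(x) - G(x)) and D- = max(G(x) - F_n(x)), with F_n the empirical CDF of the sample; the seed and the shift of 0.2 are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed
x = np.sort(stats.norm.rvs(loc=0.2, size=100, random_state=rng))
n = len(x)

# Hypothesized CDF G evaluated at the sorted observations.
g = stats.norm.cdf(x)

# Empirical CDF F_n just after and just before each sorted observation.
ecdf_after = np.arange(1, n + 1) / n
ecdf_before = np.arange(0, n) / n

d_plus = np.max(ecdf_after - g)    # largest amount by which F_n exceeds G
d_minus = np.max(g - ecdf_before)  # largest amount by which G exceeds F_n
d = max(d_plus, d_minus)           # two-sided statistic

res_greater = stats.kstest(x, 'norm', alternative='greater')
res_less = stats.kstest(x, 'norm', alternative='less')
res_two_sided = stats.kstest(x, 'norm')

print(d_plus, res_greater.statistic)
print(d_minus, res_less.statistic)
print(d, res_two_sided.statistic)
```

Because the sample is shifted toward larger values, its empirical CDF tends to lie below the standard normal CDF, so D- is expected to be the larger of the two one-sided statistics.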

Examples

>>> import numpy as np
>>> from scipy import stats
>>> rng = np.random.default_rng()
>>> x = np.linspace(-15, 15, 9)
>>> stats.kstest(x, 'norm')
KstestResult(statistic=0.444356027159..., pvalue=0.038850140086...)
>>> stats.kstest(stats.norm.rvs(size=100, random_state=rng), stats.norm.cdf)
KstestResult(statistic=0.165471391799..., pvalue=0.007331283245...)

The second call above is equivalent to:

>>> stats.kstest(stats.norm.rvs, 'norm', N=100)
KstestResult(statistic=0.113810164200..., pvalue=0.138690052319...)  # may vary

Test against one-sided alternative hypothesis

Shift distribution to larger values, so that CDF(x) < norm.cdf(x):

>>> x = stats.norm.rvs(loc=0.2, size=100, random_state=rng)
>>> stats.kstest(x, 'norm', alternative='less')
KstestResult(statistic=0.1002033514..., pvalue=0.1255446444...)

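With a p-value of about 0.126, this sample does not reject the null hypothesis in favor of the alternative hypothesis 'less' at the 10% level, although the shift is in that direction.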

>>> stats.kstest(x, 'norm', alternative='greater')
KstestResult(statistic=0.018749806388..., pvalue=0.920581859791...)

Don’t reject null hypothesis in favor of alternative hypothesis: greater

>>> stats.kstest(x, 'norm')
KstestResult(statistic=0.100203351482..., pvalue=0.250616879765...)

Testing t distributed random variables against normal distribution

With 100 degrees of freedom the t distribution looks close to the normal distribution, and the K-S test does not reject the hypothesis that the sample came from the normal distribution:

>>> stats.kstest(stats.t.rvs(100, size=100, random_state=rng), 'norm')
KstestResult(statistic=0.064273776544..., pvalue=0.778737758305...)

With 3 degrees of freedom the t distribution differs enough from the normal distribution that we can reject the hypothesis that the sample came from the normal distribution at the 10% level:

>>> stats.kstest(stats.t.rvs(3, size=100, random_state=rng), 'norm')
KstestResult(statistic=0.128678487493..., pvalue=0.066569081515...)
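
Two-sample test

When cdf is an array of observations, the two-sample test is performed. A minimal sketch, with arbitrary seed, sample sizes, and shift, that also checks the result against ks_2samp (listed under See also):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # arbitrary seed
sample1 = stats.norm.rvs(size=200, random_state=rng)
sample2 = stats.norm.rvs(loc=0.5, size=300, random_state=rng)

# Passing a second array as cdf requests the two-sample test.
res = stats.kstest(sample1, sample2)

# The same comparison made by calling ks_2samp directly.
res_direct = stats.ks_2samp(sample1, sample2)

print(res.statistic, res.pvalue)
print(res_direct.statistic, res_direct.pvalue)
```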