scipy.stats.ks_1samp

scipy.stats.ks_1samp(x, cdf, args=(), alternative='two-sided', mode='auto')

Performs the one-sample Kolmogorov-Smirnov test for goodness of fit.

This test compares the underlying distribution F(x) of a sample against a given continuous distribution G(x). See Notes for a description of the available null and alternative hypotheses.

Parameters
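x : array_like

A 1-D array of observations of iid random variables.

cdf : callable

Callable used to calculate the CDF of the hypothesized distribution; it is called as cdf(x, *args).

args : tuple, sequence, optional

Distribution parameters, passed to cdf as additional positional arguments.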

alternative : {‘two-sided’, ‘less’, ‘greater’}, optional

Defines the null and alternative hypotheses. Default is ‘two-sided’. Please see explanations in the Notes below.

mode : {‘auto’, ‘exact’, ‘approx’, ‘asymp’}, optional

Defines the distribution used for calculating the p-value. The following options are available (default is ‘auto’):

  • ‘auto’ : selects one of the other options.

  • ‘exact’ : uses the exact distribution of the test statistic.

  • ‘approx’ : approximates the two-sided probability with twice the one-sided probability.

  • ‘asymp’ : uses the asymptotic distribution of the test statistic.
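To illustrate how args is forwarded to cdf, the following minimal sketch tests a sample against a normal distribution with non-default location and scale. The seed and the values loc=5.0, scale=2.0 are arbitrary choices made for this illustration; the frozen-distribution call is shown only as an equivalent way to supply the same parameters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)  # arbitrary seed, only for reproducibility
x = stats.norm.rvs(loc=5.0, scale=2.0, size=200, random_state=rng)

# args is unpacked after x, so this evaluates stats.norm.cdf(x, 5.0, 2.0),
# i.e. loc=5.0 and scale=2.0.
res_args = stats.ks_1samp(x, stats.norm.cdf, args=(5.0, 2.0))

# Equivalent: freeze the distribution and pass its cdf method, with no args.
res_frozen = stats.ks_1samp(x, stats.norm(loc=5.0, scale=2.0).cdf)

print(res_args.statistic, res_frozen.statistic)
```

Both calls evaluate the same hypothesized CDF, so they should report the same statistic.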

Returns
statistic : float

KS test statistic: D for ‘two-sided’, D+ for ‘greater’, or D- for ‘less’.

pvalue : float

One-tailed or two-tailed p-value.
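As a small sketch of how the returned values can be read, the result can be accessed by attribute or unpacked like a two-element tuple. The seed and sample size here are arbitrary and chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # arbitrary seed
x = stats.norm.rvs(size=100, random_state=rng)

res = stats.ks_1samp(x, stats.norm.cdf)

# Attribute access
print(res.statistic, res.pvalue)

# Tuple-style unpacking yields the same two values
statistic, pvalue = res
```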

See also

ks_2samp, kstest

Notes

There are three options for the null and corresponding alternative hypotheses; they are selected with the alternative parameter.

  • two-sided: The null hypothesis is that the two distributions are identical, F(x)=G(x) for all x; the alternative is that they are not identical.

  • less: The null hypothesis is that F(x) >= G(x) for all x; the alternative is that F(x) < G(x) for at least one x.

  • greater: The null hypothesis is that F(x) <= G(x) for all x; the alternative is that F(x) > G(x) for at least one x.

Note that the alternative hypotheses describe the CDFs of the underlying distributions, not the observed values. For example, suppose x1 ~ F and x2 ~ G. If F(x) > G(x) for all x, the values in x1 tend to be less than those in x2.
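The relationship between the alternative parameter and the statistic can be sketched by computing the one-sided distances directly from the empirical CDF of the sample. This follows the usual textbook definitions of D+ and D- and is meant as an illustration, not as a description of SciPy's internal implementation; the seed, shift, and sample size are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = np.sort(stats.norm.rvs(loc=0.3, size=50, random_state=rng))
n = len(x)
g = stats.norm.cdf(x)  # hypothesized CDF G at the sorted observations

# The empirical CDF F_n jumps from (i-1)/n to i/n at the i-th sorted value.
d_plus = np.max(np.arange(1, n + 1) / n - g)  # largest F_n(x) - G(x)
d_minus = np.max(g - np.arange(0, n) / n)     # largest G(x) - F_n(x)
d = max(d_plus, d_minus)                      # two-sided distance

res_greater = stats.ks_1samp(x, stats.norm.cdf, alternative='greater')
res_less = stats.ks_1samp(x, stats.norm.cdf, alternative='less')
res_two = stats.ks_1samp(x, stats.norm.cdf)

print(d_plus, res_greater.statistic)
print(d_minus, res_less.statistic)
print(d, res_two.statistic)
```

Each printed pair should agree: ‘greater’ looks for points where the empirical CDF lies above G, ‘less’ for points where it lies below.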

Examples

>>> import numpy as np
>>> from scipy import stats
>>> rng = np.random.default_rng()
>>> x = np.linspace(-15, 15, 9)
>>> stats.ks_1samp(x, stats.norm.cdf)
KstestResult(statistic=0.444356027159..., pvalue=0.038850142705...)
>>> stats.ks_1samp(stats.norm.rvs(size=100, random_state=rng),
...                stats.norm.cdf)
KstestResult(statistic=0.165471391799..., pvalue=0.007331283245...)

Test against one-sided alternative hypothesis

Shift the distribution to larger values, so that ``CDF(x) < norm.cdf(x)``:

>>> x = stats.norm.rvs(loc=0.2, size=100, random_state=rng)
>>> stats.ks_1samp(x, stats.norm.cdf, alternative='less')
KstestResult(statistic=0.100203351482..., pvalue=0.125544644447...)

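Don’t reject the null hypothesis in favor of the alternative hypothesis: less (here p ≈ 0.126, above the usual 0.05 and 0.10 levels; the sample is random, so other runs may give a smaller p-value)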

>>> stats.ks_1samp(x, stats.norm.cdf, alternative='greater')
KstestResult(statistic=0.018749806388..., pvalue=0.920581859791...)

Don’t reject the null hypothesis in favor of the alternative hypothesis: greater

>>> stats.ks_1samp(x, stats.norm.cdf)
KstestResult(statistic=0.100203351482..., pvalue=0.250616879765...)

Don’t reject null hypothesis in favor of alternative hypothesis: two-sided
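Because rng in the examples above is unseeded, the printed numbers change from run to run. The following sketch uses a fixed seed, a larger shift, and a larger sample (all arbitrary choices for illustration) so that the contrast between the two one-sided alternatives is easier to see. The 0.05 threshold is likewise just a conventional assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # arbitrary seed
# Sample shifted to larger values, so its CDF lies below norm.cdf.
x = stats.norm.rvs(loc=0.5, size=500, random_state=rng)

res_less = stats.ks_1samp(x, stats.norm.cdf, alternative='less')
res_greater = stats.ks_1samp(x, stats.norm.cdf, alternative='greater')

alpha = 0.05  # conventional significance level, assumed for this sketch
for name, res in [('less', res_less), ('greater', res_greater)]:
    decision = 'reject' if res.pvalue < alpha else 'do not reject'
    print(f"{name}: statistic={res.statistic:.3f}, "
          f"pvalue={res.pvalue:.3g} -> {decision}")
```

With a shift this large, the ‘less’ test should reject the null hypothesis while the ‘greater’ test should not.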

Testing t distributed random variables against normal distribution

With 100 degrees of freedom the t distribution looks close to the normal distribution, and the K-S test does not reject the hypothesis that the sample came from the normal distribution:

>>> stats.ks_1samp(stats.t.rvs(100, size=100, random_state=rng),
...                stats.norm.cdf)
KstestResult(statistic=0.064273776544..., pvalue=0.778737758305...)

With 3 degrees of freedom, the t distribution differs enough from the normal distribution that we can reject the hypothesis that the sample came from the normal distribution at the 10% level:

>>> stats.ks_1samp(stats.t.rvs(3, size=100, random_state=rng),
...                stats.norm.cdf)
KstestResult(statistic=0.128678487493..., pvalue=0.066569081515...)