scipy.stats.ks_1samp#
- scipy.stats.ks_1samp(x, cdf, args=(), alternative='two-sided', method='auto')[source]#
Performs the one-sample Kolmogorov-Smirnov test for goodness of fit.
This test compares the underlying distribution F(x) of a sample against a given continuous distribution G(x). See Notes for a description of the available null and alternative hypotheses.
- Parameters
- xarray_like
a 1-D array of observations of iid random variables.
- cdfcallable
callable used to calculate the cdf.
- argstuple, sequence, optional
Distribution parameters, used with cdf.
- alternative{‘two-sided’, ‘less’, ‘greater’}, optional
Defines the null and alternative hypotheses. Default is ‘two-sided’. Please see explanations in the Notes below.
- method{‘auto’, ‘exact’, ‘approx’, ‘asymp’}, optional
Defines the distribution used for calculating the p-value. The following options are available (default is ‘auto’):
‘auto’ : selects one of the other options.
‘exact’ : uses the exact distribution of test statistic.
‘approx’ : approximates the two-sided probability with twice the one-sided probability
‘asymp’: uses asymptotic distribution of test statistic
- Returns
- statisticfloat
KS test statistic, either D, D+ or D- (depending on the value of ‘alternative’)
- pvaluefloat
One-tailed or two-tailed p-value.
Notes
There are three options for the null and corresponding alternative hypothesis that can be selected using the alternative parameter.
two-sided: The null hypothesis is that the two distributions are identical, F(x)=G(x) for all x; the alternative is that they are not identical.
less: The null hypothesis is that F(x) >= G(x) for all x; the alternative is that F(x) < G(x) for at least one x.
greater: The null hypothesis is that F(x) <= G(x) for all x; the alternative is that F(x) > G(x) for at least one x.
Note that the alternative hypotheses describe the CDFs of the underlying distributions, not the observed values. For example, suppose x1 ~ F and x2 ~ G. If F(x) > G(x) for all x, the values in x1 tend to be less than those in x2.
Examples
Suppose we wish to test the null hypothesis that a sample is distributed according to the standard normal. We choose a confidence level of 95%; that is, we will reject the null hypothesis in favor of the alternative if the p-value is less than 0.05.
When testing uniformly distributed data, we would expect the null hypothesis to be rejected.
>>> from scipy import stats >>> rng = np.random.default_rng() >>> stats.ks_1samp(stats.uniform.rvs(size=100, random_state=rng), ... stats.norm.cdf) KstestResult(statistic=0.5001899973268688, pvalue=1.1616392184763533e-23)
Indeed, the p-value is lower than our threshold of 0.05, so we reject the null hypothesis in favor of the default “two-sided” alternative: the data are not distributed according to the standard normal.
When testing random variates from the standard normal distribution, we expect the data to be consistent with the null hypothesis most of the time.
>>> x = stats.norm.rvs(size=100, random_state=rng) >>> stats.ks_1samp(x, stats.norm.cdf) KstestResult(statistic=0.05345882212970396, pvalue=0.9227159037744717)
As expected, the p-value of 0.92 is not below our threshold of 0.05, so we cannot reject the null hypothesis.
Suppose, however, that the random variates are distributed according to a normal distribution that is shifted toward greater values. In this case, the cumulative density function (CDF) of the underlying distribution tends to be less than the CDF of the standard normal. Therefore, we would expect the null hypothesis to be rejected with
alternative='less'
:>>> x = stats.norm.rvs(size=100, loc=0.5, random_state=rng) >>> stats.ks_1samp(x, stats.norm.cdf, alternative='less') KstestResult(statistic=0.17482387821055168, pvalue=0.001913921057766743)
and indeed, with p-value smaller than our threshold, we reject the null hypothesis in favor of the alternative.