scipy.stats.ks_1samp
scipy.stats.ks_1samp(x, cdf, args=(), alternative='two-sided', mode='auto')
Performs the one-sample Kolmogorov-Smirnov test for goodness of fit.
This test compares the underlying distribution F(x) of a sample against a given continuous distribution G(x). See Notes for a description of the available null and alternative hypotheses.
- Parameters
- x : array_like
A 1-D array of observations of iid random variables.
- cdf : callable
Callable used to calculate the CDF of the reference distribution G(x).
- args : tuple, sequence, optional
Distribution parameters, passed to cdf.
- alternative : {‘two-sided’, ‘less’, ‘greater’}, optional
Defines the null and alternative hypotheses. Default is ‘two-sided’. See the explanations in the Notes below.
- mode : {‘auto’, ‘exact’, ‘approx’, ‘asymp’}, optional
Defines the distribution used for calculating the p-value. The following options are available (default is ‘auto’):
‘auto’ : selects one of the other options.
‘exact’ : uses the exact distribution of the test statistic.
‘approx’ : approximates the two-sided probability with twice the one-sided probability.
‘asymp’ : uses the asymptotic distribution of the test statistic.
- Returns
- statistic : float
KS test statistic, either D, D+ or D- (depending on the value of ‘alternative’).
- pvalue : float
One-tailed or two-tailed p-value.
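As a minimal sketch of how args is forwarded to cdf, the call below tests a sample against a normal distribution with a non-default location and scale. The seed, the sample size, and the values of loc and scale are arbitrary assumptions chosen for illustration; the lambda form is shown only to make the equivalence explicit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)  # arbitrary fixed seed for repeatability
loc, scale = 2.0, 3.0               # illustrative parameters (assumed)
x = stats.norm.rvs(loc=loc, scale=scale, size=200, random_state=rng)

# Extra positional arguments for the CDF are supplied through `args`.
res_args = stats.ks_1samp(x, stats.norm.cdf, args=(loc, scale))

# Equivalent call with the parameters bound inside the callable.
res_bound = stats.ks_1samp(x, lambda v: stats.norm.cdf(v, loc, scale))

print(res_args.statistic, res_args.pvalue)
print(res_bound.statistic, res_bound.pvalue)
```

Both results expose the statistic and pvalue fields described under Returns.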
Notes
There are three options for the null and corresponding alternative hypothesis that can be selected using the alternative parameter.
two-sided: The null hypothesis is that the two distributions are identical, F(x)=G(x) for all x; the alternative is that they are not identical.
less: The null hypothesis is that F(x) >= G(x) for all x; the alternative is that F(x) < G(x) for at least one x.
greater: The null hypothesis is that F(x) <= G(x) for all x; the alternative is that F(x) > G(x) for at least one x.
Note that the alternative hypotheses describe the CDFs of the underlying distributions, not the observed values. For example, suppose x1 ~ F and x2 ~ G. If F(x) > G(x) for all x, the values in x1 tend to be less than those in x2.
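To make the direction of the one-sided statistics concrete, the sketch below computes D+ and D- by hand and compares them with the values returned for each alternative. It assumes the usual definitions in terms of the empirical CDF (these formulas are not spelled out on this page), and the seed, shift, and sample size are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # arbitrary fixed seed
x = np.sort(stats.norm.rvs(loc=0.2, size=50, random_state=rng))
n = x.size
g = stats.norm.cdf(x)            # reference CDF G at the sorted observations

# The empirical CDF jumps from (i - 1)/n to i/n at the i-th sorted value.
d_plus = np.max(np.arange(1, n + 1) / n - g)   # largest excess of empirical CDF over G
d_minus = np.max(g - np.arange(0, n) / n)      # largest excess of G over empirical CDF

res_greater = stats.ks_1samp(x, stats.norm.cdf, alternative='greater')
res_less = stats.ks_1samp(x, stats.norm.cdf, alternative='less')
res_two = stats.ks_1samp(x, stats.norm.cdf)

print(d_plus, res_greater.statistic)
print(d_minus, res_less.statistic)
print(max(d_plus, d_minus), res_two.statistic)
```

Under these assumptions, ‘greater’ reports D+, ‘less’ reports D-, and ‘two-sided’ reports the larger of the two.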
Examples
>>> import numpy as np
>>> from scipy import stats
>>> rng = np.random.default_rng()
>>> x = np.linspace(-15, 15, 9)
>>> stats.ks_1samp(x, stats.norm.cdf)
(0.44435602715924361, 0.038850142705171065)
>>> stats.ks_1samp(stats.norm.rvs(size=100, random_state=rng),
...                stats.norm.cdf)
KstestResult(statistic=0.165471391799..., pvalue=0.007331283245...)
Test against one-sided alternative hypothesis
Shift the distribution to larger values, so that CDF(x) < norm.cdf(x):
>>> x = stats.norm.rvs(loc=0.2, size=100, random_state=rng)
>>> stats.ks_1samp(x, stats.norm.cdf, alternative='less')
KstestResult(statistic=0.100203351482..., pvalue=0.125544644447...)
With p ≈ 0.126, this sample does not reject the null hypothesis at the 10% level, even though ‘less’ is the alternative that the shift favors.
>>> stats.ks_1samp(x, stats.norm.cdf, alternative='greater')
KstestResult(statistic=0.018749806388..., pvalue=0.920581859791...)
With p ≈ 0.92, there is no evidence against the null hypothesis in favor of the ‘greater’ alternative.
>>> stats.ks_1samp(x, stats.norm.cdf)
KstestResult(statistic=0.100203351482..., pvalue=0.250616879765...)
With p ≈ 0.25, the null hypothesis is not rejected in favor of the two-sided alternative.
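Because the sample above is drawn from an unseeded generator, the exact numbers vary from run to run. A minimal sketch with a fixed seed, a larger shift, and a larger sample (all three are assumptions chosen for illustration) makes the direction of the one-sided tests easier to see:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)   # arbitrary fixed seed
# Sample shifted to larger values, so its CDF lies below norm.cdf.
x = stats.norm.rvs(loc=0.5, size=1000, random_state=rng)

p_less = stats.ks_1samp(x, stats.norm.cdf, alternative='less').pvalue
p_greater = stats.ks_1samp(x, stats.norm.cdf, alternative='greater').pvalue
p_two = stats.ks_1samp(x, stats.norm.cdf).pvalue

print(p_less, p_greater, p_two)
```

With a shift this pronounced, the ‘less’ and two-sided p-values should be very small, while the ‘greater’ p-value should stay large.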
Testing t distributed random variables against normal distribution
With 100 degrees of freedom the t distribution looks close to the normal distribution, and the K-S test does not reject the hypothesis that the sample came from the normal distribution:
>>> stats.ks_1samp(stats.t.rvs(100, size=100, random_state=rng),
...                stats.norm.cdf)
KstestResult(statistic=0.064273776544..., pvalue=0.778737758305...)
With 3 degrees of freedom the t distribution differs enough from the normal distribution that, for this sample, we can reject the hypothesis that it came from the normal distribution at the 10% level:
>>> stats.ks_1samp(stats.t.rvs(3, size=100, random_state=rng),
...                stats.norm.cdf)
KstestResult(statistic=0.128678487493..., pvalue=0.066569081515...)
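At a sample size of 100, whether the 3-degrees-of-freedom case is rejected depends on the particular draw. The sketch below repeats both comparisons with a fixed seed and a much larger sample; the seed and the sample size of 10000 are assumptions for illustration, not part of the original examples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)   # arbitrary fixed seed
n = 10_000                       # larger sample than in the examples above (assumed)

# t distribution with 100 degrees of freedom: very close to the normal.
res_t100 = stats.ks_1samp(stats.t.rvs(100, size=n, random_state=rng),
                          stats.norm.cdf)

# t distribution with 3 degrees of freedom: noticeably heavier tails.
res_t3 = stats.ks_1samp(stats.t.rvs(3, size=n, random_state=rng),
                        stats.norm.cdf)

print(res_t100)
print(res_t3)
```

With this many observations, the heavy tails of the 3-degrees-of-freedom sample should produce a very small p-value, while the 100-degrees-of-freedom sample remains hard to distinguish from the normal distribution.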