scipy.stats.kstest

scipy.stats.kstest(rvs, cdf, args=(), N=20, alternative='two-sided', mode='auto')

Performs the (one-sample or two-sample) Kolmogorov-Smirnov test for goodness of fit.
The one-sample test compares the underlying distribution F(x) of a sample against a given distribution G(x). Under the null hypothesis, the two distributions are identical, F(x) = G(x). The alternative hypothesis can be either 'two-sided' (default), 'less' or 'greater'. The KS test is only valid for continuous distributions. The two-sample test compares two independent samples; under the null hypothesis, both are drawn from the same continuous distribution.
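As a quick sketch of the one-sample calling conventions described above (the seed and sample size here are arbitrary choices for illustration, not part of the API), the distribution under test can be given either by name or as a callable CDF:

```python
import numpy as np
from scipy import stats

np.random.seed(12345)  # arbitrary seed, for reproducibility only
sample = stats.norm.rvs(size=100)

# One-sample test against a distribution named as a string
stat, p = stats.kstest(sample, 'norm')

# The same test with a callable CDF instead of a name
stat_cb, p_cb = stats.kstest(sample, stats.norm.cdf)
```

Both calls evaluate the same CDF on the sorted sample, so the statistic and p-value are identical.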
Parameters

rvs : str, array_like, or callable
    If an array, it should be a 1-D array of observations of random variables. If a callable, it should be a function to generate random variables; it is required to have a keyword argument size. If a string, it should be the name of a distribution in scipy.stats, which will be used to generate random variables.

cdf : str, array_like or callable
    If array_like, it should be a 1-D array of observations of random variables, and the two-sample test is performed (and rvs must be array_like). If a callable, that callable is used to calculate the cdf. If a string, it should be the name of a distribution in scipy.stats, which will be used as the cdf function.

args : tuple, sequence, optional
    Distribution parameters, used if rvs or cdf are strings or callables.

N : int, optional
    Sample size if rvs is a string or callable. Default is 20.
alternative : {'two-sided', 'less', 'greater'}, optional
    Defines the alternative hypothesis. The following options are available (default is 'two-sided'):

    - 'two-sided'
    - 'less': one-sided, see explanation in Notes
    - 'greater': one-sided, see explanation in Notes

mode : {'auto', 'exact', 'approx', 'asymp'}, optional
    Defines the distribution used for calculating the p-value. The following options are available (default is 'auto'):

    - 'auto': selects one of the other options
    - 'exact': uses the exact distribution of the test statistic
    - 'approx': approximates the two-sided probability with twice the one-sided probability
    - 'asymp': uses the asymptotic distribution of the test statistic
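To illustrate what mode changes, the sketch below computes the same test under the exact and asymptotic null distributions. Note an assumption: this page documents the parameter as mode, but newer SciPy releases (1.8 and later) accept it as method instead, so the snippet tries that name first.

```python
import numpy as np
from scipy import stats

np.random.seed(987654321)
x = stats.norm.rvs(size=100)

# Newer SciPy releases call this parameter `method`; fall back to the
# `mode` name documented on this page for older releases.
try:
    res_exact = stats.kstest(x, 'norm', method='exact')
    res_asymp = stats.kstest(x, 'norm', method='asymp')
except TypeError:
    res_exact = stats.kstest(x, 'norm', mode='exact')
    res_asymp = stats.kstest(x, 'norm', mode='asymp')

# The statistic is the same either way; only the null distribution used
# to convert it into a p-value differs.
```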
Returns

statistic : float
    KS test statistic, either D, D+ or D-.

pvalue : float
    One-tailed or two-tailed p-value.
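A minimal sketch of interpreting the returned values, assuming a conventional 5% significance level (the uniform sample is just an arbitrary example that clearly violates the null hypothesis of standard normality):

```python
import numpy as np
from scipy import stats

np.random.seed(987654321)
# A uniform sample is clearly not drawn from the standard normal
result = stats.kstest(stats.uniform.rvs(size=100), 'norm')

alpha = 0.05  # hypothetical significance level, not part of the API
reject = result.pvalue < alpha  # small p-value: reject the null hypothesis
```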
Notes
In the one-sided test, the alternative is that the empirical cumulative distribution function F(x) of the random variable is "less" or "greater" than the cumulative distribution function G(x) of the hypothesis, i.e. F(x) <= G(x) or F(x) >= G(x), respectively.

Examples
>>> import numpy as np
>>> from scipy import stats

>>> x = np.linspace(-15, 15, 9)
>>> stats.kstest(x, 'norm')
(0.44435602715924361, 0.038850142705171065)

>>> np.random.seed(987654321)  # set random seed to get the same result
>>> stats.kstest(stats.norm.rvs(size=100), stats.norm.cdf)
(0.058352892479417884, 0.8653960860778898)

The above lines are equivalent to:

>>> np.random.seed(987654321)
>>> stats.kstest(stats.norm.rvs, 'norm', N=100)
(0.058352892479417884, 0.8653960860778898)
Test against a one-sided alternative hypothesis:

Shift the distribution to larger values, so that CDF(x) < norm.cdf(x):

>>> np.random.seed(987654321)
>>> x = stats.norm.rvs(loc=0.2, size=100)
>>> stats.kstest(x, 'norm', alternative='less')
(0.12464329735846891, 0.040989164077641749)
Reject equal distribution against the alternative hypothesis: less

>>> stats.kstest(x, 'norm', alternative='greater')
(0.0072115233216311081, 0.98531158590396395)

Don't reject equal distribution against the alternative hypothesis: greater

>>> stats.kstest(x, 'norm')
(0.12464329735846891, 0.08197335233541582)
Testing t distributed random variables against normal distribution
With 100 degrees of freedom the t distribution looks close to the normal distribution, and the K-S test does not reject the hypothesis that the sample came from the normal distribution:
>>> np.random.seed(987654321)
>>> stats.kstest(stats.t.rvs(100, size=100), 'norm')
(0.072018929165471257, 0.6505883498379312)
With 3 degrees of freedom the t distribution looks sufficiently different from the normal distribution that we can reject the hypothesis that the sample came from the normal distribution at the 10% level:
>>> np.random.seed(987654321)
>>> stats.kstest(stats.t.rvs(3, size=100), 'norm')
(0.131016895759829, 0.058826222555312224)
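The two-sample form mentioned under the cdf parameter above is not demonstrated in the examples. A minimal sketch, assuming a SciPy version in which kstest accepts a second sample and delegates to scipy.stats.ks_2samp (the seed and sample sizes are arbitrary):

```python
import numpy as np
from scipy import stats

np.random.seed(987654321)
x = stats.norm.rvs(size=100)
y = stats.norm.rvs(size=100)

# Passing a second sample in place of a cdf runs the two-sample test
stat, p = stats.kstest(x, y)
```

Since both samples come from the same distribution, the p-value is typically large and the null hypothesis is not rejected.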