Statistical functions (`scipy.stats`)#

This module contains a large number of probability distributions, summary and frequency statistics, correlation functions and statistical tests, masked statistics, kernel density estimation, quasi-Monte Carlo functionality, and more.

Statistics is a very large area, and there are topics that are out of scope for SciPy and are covered by other packages. Some of the most important ones are:

statsmodels: regression, linear models, time series analysis, extensions to topics also covered by scipy.stats.
Pandas: tabular data, time series functionality, interfaces to other statistical languages.
PyMC: Bayesian statistical modeling, probabilistic machine learning.
scikit-learn: classification, regression, model selection.
Seaborn: statistical data visualization.
rpy2: Python to R bridge.

Probability distributions#

Each univariate distribution is an instance of a subclass of `rv_continuous` (`rv_discrete` for discrete distributions):

`rv_continuous`([momtype, a, b, xtol, ...])	A generic continuous random variable class meant for subclassing.
`rv_discrete`([a, b, name, badvalue, ...])	A generic discrete random variable class meant for subclassing.
`rv_histogram`(histogram, *args[, density])	Generates a distribution given by a histogram.
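Because every univariate distribution shares the same base classes, they expose the same methods. The following is a minimal sketch of that shared interface, using `norm` as an arbitrary example; the parameter values and the seed are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Call methods directly on the distribution object, passing loc/scale each time.
p = stats.norm.cdf(1.96, loc=0.0, scale=1.0)   # P(X <= 1.96) for a standard normal

# Or "freeze" the parameters once and reuse the resulting object.
frozen = stats.norm(loc=10.0, scale=2.0)
median = frozen.ppf(0.5)                 # inverse CDF at 0.5 is the median
density = frozen.pdf(10.0)               # density evaluated at the mean
mean, var = frozen.stats(moments="mv")   # mean and variance

# Draw reproducible samples by passing a NumPy Generator.
rng = np.random.default_rng(12345)
samples = frozen.rvs(size=1000, random_state=rng)
```

Freezing is mostly a convenience: it avoids repeating the shape, `loc` and `scale` arguments in every call.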

Continuous distributions#

`alpha`	An alpha continuous random variable.
`anglit`	An anglit continuous random variable.
`arcsine`	An arcsine continuous random variable.
`argus`	Argus distribution.
`beta`	A beta continuous random variable.
`betaprime`	A beta prime continuous random variable.
`bradford`	A Bradford continuous random variable.
`burr`	A Burr (Type III) continuous random variable.
`burr12`	A Burr (Type XII) continuous random variable.
`cauchy`	A Cauchy continuous random variable.
`chi`	A chi continuous random variable.
`chi2`	A chi-squared continuous random variable.
`cosine`	A cosine continuous random variable.
`crystalball`	Crystalball distribution.
`dgamma`	A double gamma continuous random variable.
`dweibull`	A double Weibull continuous random variable.
`erlang`	An Erlang continuous random variable.
`expon`	An exponential continuous random variable.
`exponnorm`	An exponentially modified Normal continuous random variable.
`exponweib`	An exponentiated Weibull continuous random variable.
`exponpow`	An exponential power continuous random variable.
`f`	An F continuous random variable.
`fatiguelife`	A fatigue-life (Birnbaum-Saunders) continuous random variable.
`fisk`	A Fisk continuous random variable.
`foldcauchy`	A folded Cauchy continuous random variable.
`foldnorm`	A folded normal continuous random variable.
`genlogistic`	A generalized logistic continuous random variable.
`gennorm`	A generalized normal continuous random variable.
`genpareto`	A generalized Pareto continuous random variable.
`genexpon`	A generalized exponential continuous random variable.
`genextreme`	A generalized extreme value continuous random variable.
`gausshyper`	A Gauss hypergeometric continuous random variable.
`gamma`	A gamma continuous random variable.
`gengamma`	A generalized gamma continuous random variable.
`genhalflogistic`	A generalized half-logistic continuous random variable.
`genhyperbolic`	A generalized hyperbolic continuous random variable.
`geninvgauss`	A Generalized Inverse Gaussian continuous random variable.
`gibrat`	A Gibrat continuous random variable.
`gompertz`	A Gompertz (or truncated Gumbel) continuous random variable.
`gumbel_r`	A right-skewed Gumbel continuous random variable.
`gumbel_l`	A left-skewed Gumbel continuous random variable.
`halfcauchy`	A Half-Cauchy continuous random variable.
`halflogistic`	A half-logistic continuous random variable.
`halfnorm`	A half-normal continuous random variable.
`halfgennorm`	The upper half of a generalized normal continuous random variable.
`hypsecant`	A hyperbolic secant continuous random variable.
`invgamma`	An inverted gamma continuous random variable.
`invgauss`	An inverse Gaussian continuous random variable.
`invweibull`	An inverted Weibull continuous random variable.
`johnsonsb`	A Johnson SB continuous random variable.
`johnsonsu`	A Johnson SU continuous random variable.
`kappa4`	Kappa 4 parameter distribution.
`kappa3`	Kappa 3 parameter distribution.
`ksone`	Kolmogorov-Smirnov one-sided test statistic distribution.
`kstwo`	Kolmogorov-Smirnov two-sided test statistic distribution.
`kstwobign`	Limiting distribution of scaled Kolmogorov-Smirnov two-sided test statistic.
`laplace`	A Laplace continuous random variable.
`laplace_asymmetric`	An asymmetric Laplace continuous random variable.
`levy`	A Levy continuous random variable.
`levy_l`	A left-skewed Levy continuous random variable.
`levy_stable`	A Levy-stable continuous random variable.
`logistic`	A logistic (or Sech-squared) continuous random variable.
`loggamma`	A log gamma continuous random variable.
`loglaplace`	A log-Laplace continuous random variable.
`lognorm`	A lognormal continuous random variable.
`loguniform`	A loguniform or reciprocal continuous random variable.
`lomax`	A Lomax (Pareto of the second kind) continuous random variable.
`maxwell`	A Maxwell continuous random variable.
`mielke`	A Mielke Beta-Kappa / Dagum continuous random variable.
`moyal`	A Moyal continuous random variable.
`nakagami`	A Nakagami continuous random variable.
`ncx2`	A non-central chi-squared continuous random variable.
`ncf`	A non-central F distribution continuous random variable.
`nct`	A non-central Student's t continuous random variable.
`norm`	A normal continuous random variable.
`norminvgauss`	A Normal Inverse Gaussian continuous random variable.
`pareto`	A Pareto continuous random variable.
`pearson3`	A Pearson type III continuous random variable.
`powerlaw`	A power-function continuous random variable.
`powerlognorm`	A power log-normal continuous random variable.
`powernorm`	A power normal continuous random variable.
`rdist`	An R-distributed (symmetric beta) continuous random variable.
`rayleigh`	A Rayleigh continuous random variable.
`rice`	A Rice continuous random variable.
`recipinvgauss`	A reciprocal inverse Gaussian continuous random variable.
`semicircular`	A semicircular continuous random variable.
`skewcauchy`	A skewed Cauchy random variable.
`skewnorm`	A skew-normal random variable.
`studentized_range`	A studentized range continuous random variable.
`t`	A Student's t continuous random variable.
`trapezoid`	A trapezoidal continuous random variable.
`triang`	A triangular continuous random variable.
`truncexpon`	A truncated exponential continuous random variable.
`truncnorm`	A truncated normal continuous random variable.
`truncweibull_min`	A doubly truncated Weibull minimum continuous random variable.
`tukeylambda`	A Tukey-Lambda continuous random variable.
`uniform`	A uniform continuous random variable.
`vonmises`	A Von Mises continuous random variable.
`vonmises_line`	A Von Mises continuous random variable defined on the interval [-pi, pi] of the real line.
`wald`	A Wald continuous random variable.
`weibull_min`	A Weibull minimum continuous random variable.
`weibull_max`	A Weibull maximum continuous random variable.
`wrapcauchy`	A wrapped Cauchy continuous random variable.

Multivariate distributions#

`multivariate_normal`	A multivariate normal random variable.
`matrix_normal`	A matrix normal random variable.
`dirichlet`	A Dirichlet random variable.
`wishart`	A Wishart random variable.
`invwishart`	An inverse Wishart random variable.
`multinomial`	A multinomial random variable.
`special_ortho_group`	A Special Orthogonal matrix (SO(N)) random variable.
`ortho_group`	An Orthogonal matrix (O(N)) random variable.
`unitary_group`	A matrix-valued U(N) random variable.
`random_correlation`	A random correlation matrix.
`multivariate_t`	A multivariate t-distributed random variable.
`multivariate_hypergeom`	A multivariate hypergeometric random variable.
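Multivariate distributions can be frozen and sampled much like univariate ones. A small sketch with `multivariate_normal` follows; the mean vector, covariance matrix and seed are made-up values for illustration.

```python
import numpy as np
from scipy import stats

mean = np.array([0.0, 1.0])
cov = np.array([[2.0, 0.3],
                [0.3, 0.5]])   # symmetric, positive definite

mvn = stats.multivariate_normal(mean=mean, cov=cov)
peak = mvn.pdf(mean)                          # the density is highest at the mean

rng = np.random.default_rng(0)
draws = mvn.rvs(size=500, random_state=rng)   # array of shape (500, 2)
```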

Discrete distributions#

`bernoulli`	A Bernoulli discrete random variable.
`betabinom`	A beta-binomial discrete random variable.
`binom`	A binomial discrete random variable.
`boltzmann`	A Boltzmann (Truncated Discrete Exponential) random variable.
`dlaplace`	A Laplacian discrete random variable.
`geom`	A geometric discrete random variable.
`hypergeom`	A hypergeometric discrete random variable.
`logser`	A Logarithmic (Log-Series, Series) discrete random variable.
`nbinom`	A negative binomial discrete random variable.
`nchypergeom_fisher`	A Fisher's noncentral hypergeometric discrete random variable.
`nchypergeom_wallenius`	A Wallenius' noncentral hypergeometric discrete random variable.
`nhypergeom`	A negative hypergeometric discrete random variable.
`planck`	A Planck discrete exponential random variable.
`poisson`	A Poisson discrete random variable.
`randint`	A uniform discrete random variable.
`skellam`	A Skellam discrete random variable.
`yulesimon`	A Yule-Simon discrete random variable.
`zipf`	A Zipf (Zeta) discrete random variable.
`zipfian`	A Zipfian discrete random variable.
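Discrete distributions expose a probability mass function (`pmf`) in place of `pdf`. As a brief sketch, assuming ten fair coin flips:

```python
from scipy import stats

n, p = 10, 0.5
prob_exactly_3 = stats.binom.pmf(3, n, p)     # C(10, 3) / 2**10
prob_at_most_3 = stats.binom.cdf(3, n, p)     # sum of the pmf for k = 0..3
prob_more_than_3 = stats.binom.sf(3, n, p)    # survival function, 1 - cdf
```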

An overview of statistical functions is given below. Many of these functions have a similar version in scipy.stats.mstats that works with masked arrays.

Summary statistics#

`describe`(a[, axis, ddof, bias, nan_policy])	Compute several descriptive statistics of the passed array.
`gmean`(a[, axis, dtype, weights, nan_policy, ...])	Compute the weighted geometric mean along the specified axis.
`hmean`(a[, axis, dtype, weights, nan_policy, ...])	Calculate the weighted harmonic mean along the specified axis.
`pmean`(a, p, *[, axis, dtype, weights, ...])	Calculate the weighted power mean along the specified axis.
`kurtosis`(a[, axis, fisher, bias, ...])	Compute the kurtosis (Fisher or Pearson) of a dataset.
`mode`(a[, axis, nan_policy, keepdims])	Return an array of the modal (most common) value in the passed array.
`moment`(a[, moment, axis, nan_policy, keepdims])	Calculate the nth moment about the mean for a sample.
`skew`(a[, axis, bias, nan_policy, keepdims])	Compute the sample skewness of a data set.
`kstat`(data[, n, axis, nan_policy, keepdims])	Return the nth k-statistic (1<=n<=4 so far).
`kstatvar`(data[, n, axis, nan_policy, keepdims])	Return an unbiased estimator of the variance of the k-statistic.
`tmean`(a[, limits, inclusive, axis])	Compute the trimmed mean.
`tvar`(a[, limits, inclusive, axis, ddof])	Compute the trimmed variance.
`tmin`(a[, lowerlimit, axis, inclusive, ...])	Compute the trimmed minimum.
`tmax`(a[, upperlimit, axis, inclusive, ...])	Compute the trimmed maximum.
`tstd`(a[, limits, inclusive, axis, ddof])	Compute the trimmed sample standard deviation.
`tsem`(a[, limits, inclusive, axis, ddof])	Compute the trimmed standard error of the mean.
`variation`(a[, axis, nan_policy, ddof, keepdims])	Compute the coefficient of variation.
`find_repeats`(arr)	Find repeats and repeat counts.
`trim_mean`(a, proportiontocut[, axis])	Return mean of array after trimming distribution from both tails.
`gstd`(a[, axis, ddof])	Calculate the geometric standard deviation of an array.
`iqr`(x[, axis, rng, scale, nan_policy, ...])	Compute the interquartile range of the data along the specified axis.
`sem`(a[, axis, ddof, nan_policy])	Compute standard error of the mean.
`bayes_mvs`(data[, alpha])	Bayesian confidence intervals for the mean, var, and std.
`mvsdist`(data)	'Frozen' distributions for mean, variance, and standard deviation of data.
`entropy`(pk[, qk, base, axis])	Calculate the entropy of a distribution for given probability values.
`differential_entropy`(values, *[, ...])	Given a sample of a distribution, estimate the differential entropy.
`median_abs_deviation`(x[, axis, center, ...])	Compute the median absolute deviation of the data along the given axis.
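A short sketch of a few of the summary functions listed above, applied to a small made-up sample with one large outlier to show how the robust measures differ from the plain mean:

```python
import numpy as np
from scipy import stats

data = np.array([1.0, 2.0, 4.0, 8.0, 100.0])   # one large outlier

summary = stats.describe(data)    # nobs, minmax, mean, variance, skewness, kurtosis
geo_mean = stats.gmean(data)      # geometric mean, less sensitive to the outlier
trimmed = stats.trim_mean(data, proportiontocut=0.2)   # drops 20% from each tail
mad = stats.median_abs_deviation(data)                 # robust measure of spread
```

Here the arithmetic mean in `summary.mean` is pulled far toward the outlier, while `trimmed` averages only the three middle values.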

Frequency statistics#

`cumfreq`(a[, numbins, defaultreallimits, weights])	Return a cumulative frequency histogram, using the histogram function.
`percentileofscore`(a, score[, kind, nan_policy])	Compute the percentile rank of a score relative to a list of scores.
`scoreatpercentile`(a, per[, limit, ...])	Calculate the score at a given percentile of the input sequence.
`relfreq`(a[, numbins, defaultreallimits, weights])	Return a relative frequency histogram, using the histogram function.

`binned_statistic`(x, values[, statistic, ...])	Compute a binned statistic for one or more sets of data.
`binned_statistic_2d`(x, y, values[, ...])	Compute a bidimensional binned statistic for one or more sets of data.
`binned_statistic_dd`(sample, values[, ...])	Compute a multidimensional binned statistic for a set of data.
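The binned statistics generalize a histogram: instead of counting points per bin, they apply a statistic to a second array grouped by bin. A minimal sketch with invented data and three equal-width bins:

```python
import numpy as np
from scipy import stats

x = np.array([0.5, 1.5, 1.7, 2.5, 2.9])
values = np.array([10.0, 20.0, 30.0, 40.0, 60.0])

result = stats.binned_statistic(x, values, statistic="mean", bins=3, range=(0.0, 3.0))
bin_means = result.statistic     # mean of `values` within each bin
edges = result.bin_edges         # the 4 edges delimiting the 3 bins
membership = result.binnumber    # 1-based bin index for each element of x
```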

Correlation functions#

`f_oneway`(*samples[, axis])	Perform one-way ANOVA.
`alexandergovern`(*samples[, nan_policy])	Performs the Alexander Govern test.
`pearsonr`(x, y, *[, alternative])	Pearson correlation coefficient and p-value for testing non-correlation.
`spearmanr`(a[, b, axis, nan_policy, alternative])	Calculate a Spearman correlation coefficient with associated p-value.
`pointbiserialr`(x, y)	Calculate a point biserial correlation coefficient and its p-value.
`kendalltau`(x, y[, initial_lexsort, ...])	Calculate Kendall's tau, a correlation measure for ordinal data.
`weightedtau`(x, y[, rank, weigher, additive])	Compute a weighted version of Kendall's \(\tau\).
`somersd`(x[, y, alternative])	Calculates Somers' D, an asymmetric measure of ordinal association.
`linregress`(x[, y, alternative])	Calculate a linear least-squares regression for two sets of measurements.
`siegelslopes`(y[, x, method])	Computes the Siegel estimator for a set of points (x, y).
`theilslopes`(y[, x, alpha, method])	Computes the Theil-Sen estimator for a set of points (x, y).
`multiscale_graphcorr`(x, y[, ...])	Computes the Multiscale Graph Correlation (MGC) test statistic.
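As an illustration of the correlation and regression helpers above, the sketch below uses a synthetic, nearly linear data set; the noise values are hand-picked for the example.

```python
import numpy as np
from scipy import stats

x = np.arange(10, dtype=float)
noise = np.array([0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05, 0.1, 0.0, -0.1])
y = 2.0 * x + 1.0 + noise

r, p_value = stats.pearsonr(x, y)    # strength of the linear association
rho, _ = stats.spearmanr(x, y)       # rank-based (monotonic) association
fit = stats.linregress(x, y)         # slope, intercept, rvalue, pvalue, stderr
```

Since `y` is strictly increasing in `x`, the Spearman coefficient is 1 even though the points do not lie exactly on a line.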

Statistical tests#

`ttest_1samp`(a, popmean[, axis, nan_policy, ...])	Calculate the T-test for the mean of ONE group of scores.
`ttest_ind`(a, b[, axis, equal_var, ...])	Calculate the T-test for the means of two independent samples of scores.
`ttest_ind_from_stats`(mean1, std1, nobs1, ...)	T-test for means of two independent samples from descriptive statistics.
`ttest_rel`(a, b[, axis, nan_policy, alternative])	Calculate the t-test on TWO RELATED samples of scores, a and b.
`chisquare`(f_obs[, f_exp, ddof, axis])	Calculate a one-way chi-square test.
`cramervonmises`(rvs, cdf[, args])	Perform the one-sample Cramér-von Mises test for goodness of fit.
`cramervonmises_2samp`(x, y[, method])	Perform the two-sample Cramér-von Mises test for goodness of fit.
`power_divergence`(f_obs[, f_exp, ddof, axis, ...])	Cressie-Read power divergence statistic and goodness of fit test.
`kstest`(rvs, cdf[, args, N, alternative, method])	Performs the (one-sample or two-sample) Kolmogorov-Smirnov test for goodness of fit.
`ks_1samp`(x, cdf[, args, alternative, method])	Performs the one-sample Kolmogorov-Smirnov test for goodness of fit.
`ks_2samp`(data1, data2[, alternative, method])	Performs the two-sample Kolmogorov-Smirnov test for goodness of fit.
`epps_singleton_2samp`(x, y[, t])	Compute the Epps-Singleton (ES) test statistic.
`mannwhitneyu`(x, y[, use_continuity, ...])	Perform the Mann-Whitney U rank test on two independent samples.
`tiecorrect`(rankvals)	Tie correction factor for Mann-Whitney U and Kruskal-Wallis H tests.
`rankdata`(a[, method, axis])	Assign ranks to data, dealing with ties appropriately.
`ranksums`(x, y[, alternative, axis, ...])	Compute the Wilcoxon rank-sum statistic for two samples.
`wilcoxon`(x[, y, zero_method, correction, ...])	Calculate the Wilcoxon signed-rank test.
`kruskal`(*samples[, nan_policy, axis, keepdims])	Compute the Kruskal-Wallis H-test for independent samples.
`friedmanchisquare`(*samples)	Compute the Friedman test for repeated samples.
`brunnermunzel`(x, y[, alternative, ...])	Compute the Brunner-Munzel test on samples x and y.
`combine_pvalues`(pvalues[, method, weights])	Combine p-values from independent tests that bear upon the same hypothesis.
`jarque_bera`(x)	Perform the Jarque-Bera goodness of fit test on sample data.
`page_trend_test`(data[, ranked, ...])	Perform Page's Test, a measure of trend in observations between treatments.
`tukey_hsd`(*args)	Perform Tukey's HSD test for equality of means over multiple treatments.

`ansari`(x, y[, alternative])	Perform the Ansari-Bradley test for equal scale parameters.
`bartlett`(*samples)	Perform Bartlett's test for equal variances.
`levene`(*samples[, center, proportiontocut])	Perform Levene test for equal variances.
`shapiro`(x)	Perform the Shapiro-Wilk test for normality.
`anderson`(x[, dist])	Anderson-Darling test for data coming from a particular distribution.
`anderson_ksamp`(samples[, midrank])	The Anderson-Darling test for k-samples.
`binom_test`(x[, n, p, alternative])	Perform a test that the probability of success is p.
`binomtest`(k, n[, p, alternative])	Perform a test that the probability of success is p.
`fligner`(*samples[, center, proportiontocut])	Perform Fligner-Killeen test for equality of variance.
`median_test`(*samples[, ties, correction, ...])	Perform a Mood's median test.
`mood`(x, y[, axis, alternative])	Perform Mood's test for equal scale parameters.
`skewtest`(a[, axis, nan_policy, alternative])	Test whether the skew is different from the normal distribution.
`kurtosistest`(a[, axis, nan_policy, alternative])	Test whether a dataset has normal kurtosis.
`normaltest`(a[, axis, nan_policy])	Test whether a sample differs from a normal distribution.
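Most of the tests above return an object with `statistic` and `pvalue` attributes. A hedged sketch comparing two simulated samples whose means differ by one standard deviation (the seed and sample sizes are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=1.0, scale=1.0, size=200)   # shifted by one standard deviation

t_res = stats.ttest_ind(a, b, equal_var=False)              # Welch's t-test
u_res = stats.mannwhitneyu(a, b, alternative="two-sided")   # rank-based alternative
sw_res = stats.shapiro(a)                                   # normality check on one sample
```

Which test is appropriate depends on the assumptions one is willing to make about the data; the parametric and rank-based tests are shown side by side only for comparison.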

Quasi-Monte Carlo#

Quasi-Monte Carlo submodule (scipy.stats.qmc)
- Quasi-Monte Carlo
  - Engines
  - Helpers
- Introduction to Quasi-Monte Carlo
  - References
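A minimal sketch of the `scipy.stats.qmc` workflow, assuming a two-dimensional problem: create an engine, draw points in the unit hypercube, then rescale them to the bounds of interest. The bounds below are arbitrary.

```python
from scipy.stats import qmc

sampler = qmc.Sobol(d=2)                  # scrambled by default, so output varies run to run
unit_points = sampler.random_base2(m=7)   # 2**7 = 128 points in [0, 1)^2

# Rescale from the unit square to the box [-1, 1] x [0, 10].
points = qmc.scale(unit_points, l_bounds=[-1.0, 0.0], u_bounds=[1.0, 10.0])

# Lower discrepancy indicates more even coverage of the unit hypercube.
disc = qmc.discrepancy(unit_points)
```

Sobol' sequences keep their balance properties when the number of points is a power of two, which is why `random_base2` is used here.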

Resampling Methods#

`bootstrap`(data, statistic, *[, n_resamples, ...])	Compute a two-sided bootstrap confidence interval of a statistic.
`permutation_test`(data, statistic, *[, ...])	Performs a permutation test of a given statistic on provided data.
`monte_carlo_test`(sample, rvs, statistic, *)	Monte Carlo test that a sample is drawn from a given distribution.
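A short sketch of `bootstrap` estimating a confidence interval for the mean of a skewed sample. The data are simulated, and the number of resamples is an arbitrary choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.exponential(scale=2.0, size=100)

# `data` is passed as a sequence of samples, hence the one-element tuple.
res = stats.bootstrap((data,), np.mean, n_resamples=2000, confidence_level=0.95)
ci_low, ci_high = res.confidence_interval
std_err = res.standard_error
```

No random state is passed to `bootstrap` here, so the interval endpoints change slightly between runs.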

Masked statistics functions#

Statistical functions for masked arrays (scipy.stats.mstats)

Other statistical functionality#

Transformations#

`boxcox`(x[, lmbda, alpha, optimizer])	Return a dataset transformed by a Box-Cox power transformation.
`boxcox_normmax`(x[, brack, method, optimizer])	Compute optimal Box-Cox transform parameter for input data.
`boxcox_llf`(lmb, data)	The boxcox log-likelihood function.
`yeojohnson`(x[, lmbda])	Return a dataset transformed by a Yeo-Johnson power transformation.
`yeojohnson_normmax`(x[, brack])	Compute optimal Yeo-Johnson transform parameter.
`yeojohnson_llf`(lmb, data)	The yeojohnson log-likelihood function.
`obrientransform`(*samples)	Compute the O'Brien transform on input data (any number of arrays).
`sigmaclip`(a[, low, high])	Perform iterative sigma-clipping of array elements.
`trimboth`(a, proportiontocut[, axis])	Slice off a proportion of items from both ends of an array.
`trim1`(a, proportiontocut[, tail, axis])	Slice off a proportion from ONE end of the passed array distribution.
`zmap`(scores, compare[, axis, ddof, nan_policy])	Calculate the relative z-scores.
`zscore`(a[, axis, ddof, nan_policy])	Compute the z score.
`gzscore`(a, *[, axis, ddof, nan_policy])	Compute the geometric standard score.
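Two of the transformations above in a brief sketch with made-up data. Note that `boxcox` returns a different shape of result depending on whether `lmbda` is supplied.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = stats.zscore(x)    # uses ddof=0 (population standard deviation) by default

# Box-Cox requires strictly positive data.
skewed = np.array([1.0, 1.5, 2.0, 3.0, 5.0, 9.0, 20.0])
transformed, lmbda = stats.boxcox(skewed)        # lmbda chosen by maximum likelihood
log_version = stats.boxcox(skewed, lmbda=0.0)    # lmbda=0 reduces to the natural log
```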

Statistical distances#

`wasserstein_distance`(u_values, v_values[, ...])	Compute the first Wasserstein distance between two 1D distributions.
`energy_distance`(u_values, v_values[, ...])	Compute the energy distance between two 1D distributions.
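Both distances take raw sample values rather than fitted distributions. In the sketch below the second sample is the first shifted by 5, a case where the first Wasserstein distance equals the size of the shift.

```python
from scipy import stats

u = [0.0, 1.0, 3.0]
v = [5.0, 6.0, 8.0]    # the same sample shifted by 5

w = stats.wasserstein_distance(u, v)     # equals the shift for a pure translation
e = stats.energy_distance(u, v)
same = stats.wasserstein_distance(u, u)  # identical samples are at distance zero
```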

Sampling#

Random Number Generators (scipy.stats.sampling)
- Generators Wrapped

Random variate generation / CDF Inversion#

`rvs_ratio_uniforms`(pdf, umax, vmin, vmax[, ...])	Generate random samples from a probability density function using the ratio-of-uniforms method.

Distribution Fitting#

`fit`(dist, data[, bounds, guess, optimizer])	Fit a discrete or continuous distribution to data.
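A hedged sketch of `fit` applied to simulated normal data. The bounds are illustrative guesses, and the attribute names on `res.params` are assumed to follow the distribution's parameter names (`loc` and `scale` for `norm`).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = stats.norm.rvs(loc=5.0, scale=2.0, size=500, random_state=rng)

# Bounds are given per parameter name; these ranges are illustrative guesses.
bounds = {"loc": (0.0, 10.0), "scale": (0.1, 10.0)}
res = stats.fit(stats.norm, data, bounds)

loc_hat, scale_hat = res.params.loc, res.params.scale
```

The estimates should land near the true values of 5 and 2, though not exactly, since they depend on the simulated sample and on the optimizer.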

Circular statistical functions#

`circmean`(samples[, high, low, axis, nan_policy])	Compute the circular mean for samples in a range.
`circvar`(samples[, high, low, axis, nan_policy])	Compute the circular variance for samples assumed to be in a range.
`circstd`(samples[, high, low, axis, ...])	Compute the circular standard deviation for samples assumed to be in the range [low to high].
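Circular statistics matter when values wrap around, as angles do. The sketch below, using three made-up compass headings near north, shows why the arithmetic mean is misleading in that case.

```python
import numpy as np
from scipy import stats

angles_deg = np.array([355.0, 5.0, 15.0])   # headings clustered around north

naive = angles_deg.mean()                                  # 125.0, misleading
circ = stats.circmean(angles_deg, high=360.0, low=0.0)     # close to 5.0
spread = stats.circstd(angles_deg, high=360.0, low=0.0)
```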

Contingency table functions#

`chi2_contingency`(observed[, correction, lambda_])	Chi-square test of independence of variables in a contingency table.
`contingency.crosstab`(*args[, levels, sparse])	Return table of counts for each possible unique combination in `*args`.
`contingency.expected_freq`(observed)	Compute the expected frequencies from a contingency table.
`contingency.margins`(a)	Return a list of the marginal sums of the array a.
`contingency.relative_risk`(exposed_cases, ...)	Compute the relative risk (also known as the risk ratio).
`contingency.association`(observed[, method, ...])	Calculates degree of association between two nominal variables.
`fisher_exact`(table[, alternative])	Perform a Fisher exact test on a 2x2 contingency table.
`barnard_exact`(table[, alternative, pooled, n])	Perform a Barnard exact test on a 2x2 contingency table.
`boschloo_exact`(table[, alternative, n])	Perform Boschloo's exact test on a 2x2 contingency table.
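A small sketch of the contingency-table helpers on an invented 2x2 table of counts:

```python
import numpy as np
from scipy import stats

# Rows: group A / group B; columns: outcome yes / no (made-up counts).
table = np.array([[12, 5],
                  [3, 10]])

# Yates' continuity correction is applied to 2x2 tables by default.
chi2, p, dof, expected = stats.chi2_contingency(table)

# Exact alternative that is often preferred for small counts.
odds_ratio, p_exact = stats.fisher_exact(table)

row_sums, col_sums = stats.contingency.margins(table)
```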

Plot-tests#

`ppcc_max`(x[, brack, dist])	Calculate the shape parameter that maximizes the PPCC.
`ppcc_plot`(x, a, b[, dist, plot, N])	Calculate and optionally plot probability plot correlation coefficient.
`probplot`(x[, sparams, dist, fit, plot, rvalue])	Calculate quantiles for a probability plot, and optionally show the plot.
`boxcox_normplot`(x, la, lb[, plot, N])	Compute parameters for a Box-Cox normality plot, optionally show it.
`yeojohnson_normplot`(x, la, lb[, plot, N])	Compute parameters for a Yeo-Johnson normality plot, optionally show it.

Univariate and multivariate kernel density estimation#

`gaussian_kde`(dataset[, bw_method, weights])	Representation of a kernel-density estimate using Gaussian kernels.
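A minimal sketch of `gaussian_kde` on a simulated bimodal sample; the mixture components, seed and evaluation grid are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(2.0, 0.5, 300)])

kde = stats.gaussian_kde(sample)                # bandwidth defaults to Scott's rule
grid = np.linspace(-5.0, 5.0, 201)
density = kde(grid)                             # estimated PDF evaluated on the grid
mass = kde.integrate_box_1d(-np.inf, np.inf)    # total probability mass
```

The default bandwidth can oversmooth multimodal data; `bw_method` accepts a scalar or callable if a narrower kernel is wanted.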

Warnings / Errors used in `scipy.stats`#

`DegenerateDataWarning`([msg])	Warns when data is degenerate and results may not be reliable.
`ConstantInputWarning`([msg])	Warns when all values in data are exactly equal.
`NearConstantInputWarning`([msg])	Warns when all values in data are nearly equal.
`FitError`([msg])	Represents an error condition when fitting a distribution to data.
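These warning classes can be caught or filtered with the standard `warnings` module. The sketch below assumes that `pearsonr` emits `ConstantInputWarning` when one input is constant, which may vary between SciPy versions.

```python
import warnings
import numpy as np
from scipy import stats

x = np.array([1.0, 1.0, 1.0, 1.0])   # constant input: the correlation is undefined
y = np.array([1.0, 2.0, 3.0, 4.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # make sure the warning is not suppressed
    r, p = stats.pearsonr(x, y)

constant_warnings = [
    w for w in caught if issubclass(w.category, stats.ConstantInputWarning)
]
```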
