Statistical functions (scipy.stats)
This module contains a large number of probability distributions, summary and frequency statistics, correlation functions and statistical tests, masked statistics, kernel density estimation, quasi-Monte Carlo functionality, and more.
Statistics is a very large area, and there are topics that are out of scope for SciPy and are covered by other packages. Some of the most important ones are:
statsmodels: regression, linear models, time series analysis, extensions to topics also covered by scipy.stats.
Pandas: tabular data, time series functionality, interfaces to other statistical languages.
PyMC: Bayesian statistical modeling, probabilistic machine learning.
scikit-learn: classification, regression, model selection.
Seaborn: statistical data visualization.
rpy2: Python to R bridge.
Probability distributions
Each univariate distribution is an instance of a subclass of rv_continuous (rv_discrete for discrete distributions):
- A generic continuous random variable class meant for subclassing.
- A generic discrete random variable class meant for subclassing.
- Generates a distribution given by a histogram.
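As a minimal sketch of the subclassing pattern described above (assuming a reasonably recent SciPy and NumPy), the snippet defines a custom continuous distribution by overriding only the private `_pdf` method; `rv_continuous` then derives the CDF and related methods generically by numerical integration, which is slower than a built-in distribution. The class name `gaussian_gen` and instance name `gaussian` are illustrative choices, not part of SciPy. The second part builds a distribution from a histogram with `stats.rv_histogram`.

```python
import numpy as np
from scipy import stats


class gaussian_gen(stats.rv_continuous):
    """Standard normal distribution, defined only through its density."""

    def _pdf(self, x):
        # rv_continuous integrates _pdf numerically to provide cdf, ppf, etc.
        return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)


# The instance, not the class, is what users call.
gaussian = gaussian_gen(name="gaussian")
print(gaussian.cdf(0.0))   # close to 0.5

# rv_histogram builds a distribution from the (counts, bin_edges) pair
# returned by np.histogram.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)
hist_dist = stats.rv_histogram(np.histogram(data, bins=50))
print(hist_dist.cdf(data.max()))   # the whole histogram lies below its last edge
```

Overriding `_cdf` or `_ppf` as well (when closed forms exist) avoids the numerical integration and is usually much faster.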
Continuous distributions
- An alpha continuous random variable.
- An anglit continuous random variable.
- An arcsine continuous random variable.
- Argus distribution.
- A beta continuous random variable.
- A beta prime continuous random variable.
- A Bradford continuous random variable.
- A Burr (Type III) continuous random variable.
- A Burr (Type XII) continuous random variable.
- A Cauchy continuous random variable.
- A chi continuous random variable.
- A chi-squared continuous random variable.
- A cosine continuous random variable.
- Crystalball distribution.
- A double gamma continuous random variable.
- A double Weibull continuous random variable.
- An Erlang continuous random variable.
- An exponential continuous random variable.
- An exponentially modified Normal continuous random variable.
- An exponentiated Weibull continuous random variable.
- An exponential power continuous random variable.
- An F continuous random variable.
- A fatigue-life (Birnbaum-Saunders) continuous random variable.
- A Fisk continuous random variable.
- A folded Cauchy continuous random variable.
- A folded normal continuous random variable.
- A generalized logistic continuous random variable.
- A generalized normal continuous random variable.
- A generalized Pareto continuous random variable.
- A generalized exponential continuous random variable.
- A generalized extreme value continuous random variable.
- A Gauss hypergeometric continuous random variable.
- A gamma continuous random variable.
- A generalized gamma continuous random variable.
- A generalized half-logistic continuous random variable.
- A generalized hyperbolic continuous random variable.
- A Generalized Inverse Gaussian continuous random variable.
- A Gilbrat continuous random variable.
- A Gompertz (or truncated Gumbel) continuous random variable.
- A right-skewed Gumbel continuous random variable.
- A left-skewed Gumbel continuous random variable.
- A Half-Cauchy continuous random variable.
- A half-logistic continuous random variable.
- A half-normal continuous random variable.
- The upper half of a generalized normal continuous random variable.
- A hyperbolic secant continuous random variable.
- An inverted gamma continuous random variable.
- An inverse Gaussian continuous random variable.
- An inverted Weibull continuous random variable.
- A Johnson SB continuous random variable.
- A Johnson SU continuous random variable.
- Kappa 4 parameter distribution.
- Kappa 3 parameter distribution.
- Kolmogorov-Smirnov one-sided test statistic distribution.
- Kolmogorov-Smirnov two-sided test statistic distribution.
- Limiting distribution of scaled Kolmogorov-Smirnov two-sided test statistic.
- A Laplace continuous random variable.
- An asymmetric Laplace continuous random variable.
- A Levy continuous random variable.
- A left-skewed Levy continuous random variable.
- A Levy-stable continuous random variable.
- A logistic (or Sech-squared) continuous random variable.
- A log gamma continuous random variable.
- A log-Laplace continuous random variable.
- A lognormal continuous random variable.
- A loguniform or reciprocal continuous random variable.
- A Lomax (Pareto of the second kind) continuous random variable.
- A Maxwell continuous random variable.
- A Mielke Beta-Kappa / Dagum continuous random variable.
- A Moyal continuous random variable.
- A Nakagami continuous random variable.
- A non-central chi-squared continuous random variable.
- A non-central F distribution continuous random variable.
- A non-central Student's t continuous random variable.
- A normal continuous random variable.
- A Normal Inverse Gaussian continuous random variable.
- A Pareto continuous random variable.
- A Pearson type III continuous random variable.
- A power-function continuous random variable.
- A power log-normal continuous random variable.
- A power normal continuous random variable.
- An R-distributed (symmetric beta) continuous random variable.
- A Rayleigh continuous random variable.
- A Rice continuous random variable.
- A reciprocal inverse Gaussian continuous random variable.
- A semicircular continuous random variable.
- A skewed Cauchy random variable.
- A skew-normal random variable.
- A studentized range continuous random variable.
- A Student's t continuous random variable.
- A trapezoidal continuous random variable.
- A triangular continuous random variable.
- A truncated exponential continuous random variable.
- A truncated normal continuous random variable.
- A Tukey-Lambda continuous random variable.
- A uniform continuous random variable.
- A Von Mises continuous random variable.
- A Von Mises continuous random variable.
- A Wald continuous random variable.
- Weibull minimum continuous random variable.
- Weibull maximum continuous random variable.
- A wrapped Cauchy continuous random variable.
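All the continuous distributions listed above share the same `rv_continuous` interface. A minimal sketch using the normal distribution (exposed in SciPy as `stats.norm`, a name the list above does not spell out): calling the distribution with `loc` and `scale` "freezes" it, after which `cdf`, `ppf`, `mean` and friends need no further parameters, and the class-level `fit` method returns maximum-likelihood estimates. The sample data here is synthetic.

```python
import numpy as np
from scipy import stats

# Freeze a normal distribution with mean 1 and standard deviation 2.
rv = stats.norm(loc=1.0, scale=2.0)

print(rv.mean(), rv.std())
print(rv.cdf(1.0))        # 0.5, since 1.0 is the median
q = rv.ppf(0.975)         # ppf is the inverse of the CDF
print(q)                  # roughly 1 + 2 * 1.96

# Maximum-likelihood fit of loc and scale to a synthetic sample.
rng = np.random.default_rng(12345)
sample = rng.normal(loc=1.0, scale=2.0, size=5000)
loc_hat, scale_hat = stats.norm.fit(sample)
print(loc_hat, scale_hat)
```

Freezing is a convenience: `stats.norm.cdf(1.0, loc=1.0, scale=2.0)` gives the same result without creating a frozen object.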
Multivariate distributions
- A multivariate normal random variable.
- A matrix normal random variable.
- A Dirichlet random variable.
- A Wishart random variable.
- An inverse Wishart random variable.
- A multinomial random variable.
- A matrix-valued SO(N) random variable.
- A matrix-valued O(N) random variable.
- A matrix-valued U(N) random variable.
- A random correlation matrix.
- A multivariate t-distributed random variable.
- A multivariate hypergeometric random variable.
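A minimal sketch of the multivariate normal (available in SciPy as `stats.multivariate_normal`) with an illustrative 2x2 covariance matrix. For a 2-D normal the density at the mean is 1 / (2π sqrt(det Σ)), which gives a simple check, and the empirical covariance of many draws should approach Σ.

```python
import numpy as np
from scipy import stats

mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])          # illustrative covariance, det = 1.75
mvn = stats.multivariate_normal(mean=mean, cov=cov)

# Density at the mean of a 2-D normal: 1 / (2*pi*sqrt(det(cov))).
density_at_mean = mvn.pdf(mean)

rng = np.random.default_rng(0)
draws = mvn.rvs(size=20000, random_state=rng)   # array of shape (20000, 2)
empirical_cov = np.cov(draws, rowvar=False)
print(density_at_mean)
print(empirical_cov)
```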
Discrete distributions
- A Bernoulli discrete random variable.
- A beta-binomial discrete random variable.
- A binomial discrete random variable.
- A Boltzmann (Truncated Discrete Exponential) random variable.
- A Laplacian discrete random variable.
- A geometric discrete random variable.
- A hypergeometric discrete random variable.
- A Logarithmic (Log-Series, Series) discrete random variable.
- A negative binomial discrete random variable.
- A Fisher's noncentral hypergeometric discrete random variable.
- A Wallenius' noncentral hypergeometric discrete random variable.
- A negative hypergeometric discrete random variable.
- A Planck discrete exponential random variable.
- A Poisson discrete random variable.
- A uniform discrete random variable.
- A Skellam discrete random variable.
- A Yule-Simon discrete random variable.
- A Zipf (Zeta) discrete random variable.
- A Zipfian discrete random variable.
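Discrete distributions follow the `rv_discrete` interface, where `pmf` replaces `pdf`. A minimal sketch with the binomial and Poisson distributions (`stats.binom` and `stats.poisson` in SciPy), checked against closed-form values: P(X = 5) for 10 fair coin flips is C(10, 5) / 2^10, and `sf(k)` is the survival function P(X > k).

```python
from math import comb

from scipy import stats

# Binomial: number of heads in 10 fair coin flips.
binom = stats.binom(n=10, p=0.5)
p5 = binom.pmf(5)          # C(10, 5) / 2**10
tail = binom.sf(7)         # P(X > 7) = P(X >= 8), i.e. 1 - cdf(7)

# Poisson with rate 3: mean and variance are both 3.
pois = stats.poisson(mu=3)
print(p5, tail, pois.mean(), pois.var())
```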
An overview of statistical functions is given below. Many of these functions have a similar version in scipy.stats.mstats that works on masked arrays.
Summary statistics
- Compute several descriptive statistics of the passed array.
- Compute the geometric mean along the specified axis.
- Calculate the harmonic mean along the specified axis.
- Compute the kurtosis (Fisher or Pearson) of a dataset.
- Return an array of the modal (most common) value in the passed array.
- Calculate the nth moment about the mean for a sample.
- Compute the sample skewness of a data set.
- Return the nth k-statistic (1<=n<=4 so far).
- Return an unbiased estimator of the variance of the k-statistic.
- Compute the trimmed mean.
- Compute the trimmed variance.
- Compute the trimmed minimum.
- Compute the trimmed maximum.
- Compute the trimmed sample standard deviation.
- Compute the trimmed standard error of the mean.
- Compute the coefficient of variation.
- Find repeats and repeat counts.
- Return mean of array after trimming distribution from both tails.
- Calculate the geometric standard deviation of an array.
- Compute the interquartile range of the data along the specified axis.
- Compute standard error of the mean.
- Bayesian confidence intervals for the mean, var, and std.
- 'Frozen' distributions for mean, variance, and standard deviation of data.
- Calculate the entropy of a distribution for given probability values.
- Given a sample of a distribution, estimate the differential entropy.
- Compute the median absolute deviation of the data along the given axis.
- Compute a two-sided bootstrap confidence interval of a statistic.
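A few of the summary statistics above, applied to tiny inputs whose results can be checked by hand. The function names (`stats.describe`, `stats.gmean`, `stats.hmean`, `stats.iqr`, `stats.sem`) are SciPy's public names, which the list above does not show. Note that `describe` and `sem` use the unbiased variance (ddof=1) by default.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0])

d = stats.describe(x)          # nobs, minmax, mean, variance (ddof=1), skewness, kurtosis
g = stats.gmean([1.0, 4.0])    # sqrt(1 * 4) = 2
h = stats.hmean([1.0, 4.0])    # 2 / (1/1 + 1/4) = 1.6
spread = stats.iqr(x)          # 75th minus 25th percentile, linear interpolation
sem = stats.sem(x)             # std(ddof=1) / sqrt(n)
print(d)
print(g, h, spread, sem)
```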
Frequency statistics
- Return a cumulative frequency histogram, using the histogram function.
- Compute the percentile rank of a score relative to a list of scores.
- Calculate the score at a given percentile of the input sequence.
- Return a relative frequency histogram, using the histogram function.
- Compute a binned statistic for one or more sets of data.
- Compute a bidimensional binned statistic for one or more sets of data.
- Compute a multidimensional binned statistic for a set of data.
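A minimal sketch of three frequency functions, using SciPy's names `percentileofscore`, `scoreatpercentile` and `binned_statistic` (not shown in the list above). `binned_statistic` groups `values` by which bin their `x` falls in and reduces each group; with `bins=2` over the range [1, 4] the edges are 1, 2.5 and 4, and the last bin includes its right edge.

```python
import numpy as np
from scipy import stats

scores = [1, 2, 3, 4]
rank = stats.percentileofscore(scores, 3)        # kind='rank' is the default
value = stats.scoreatpercentile(scores, 50)      # the median of the scores

x = np.array([1.0, 2.0, 3.0, 4.0])
values = np.array([10.0, 20.0, 30.0, 40.0])
res = stats.binned_statistic(x, values, statistic="mean", bins=2)
# Two equal-width bins over [1, 4]: the first holds x = 1, 2 and the second x = 3, 4.
print(rank, value, res.statistic, res.bin_edges)
```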
Correlation functions
- Perform one-way ANOVA.
- Perform the Alexander-Govern test.
- Pearson correlation coefficient and p-value for testing non-correlation.
- Calculate a Spearman correlation coefficient with associated p-value.
- Calculate a point biserial correlation coefficient and its p-value.
- Calculate Kendall's tau, a correlation measure for ordinal data.
- Compute a weighted version of Kendall's \(\tau\).
- Calculate Somers' D, an asymmetric measure of ordinal association.
- Calculate a linear least-squares regression for two sets of measurements.
- Compute the Siegel estimator for a set of points (x, y).
- Compute the Theil-Sen estimator for a set of points (x, y).
- Compute the Multiscale Graph Correlation (MGC) test statistic.
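A minimal sketch comparing three correlation measures and a least-squares fit on data with an exact linear relationship (SciPy names `pearsonr`, `spearmanr`, `kendalltau`, `linregress`). Because y = 2x + 1 exactly, every coefficient should be 1 and the fitted line should recover slope 2 and intercept 1. Tuple unpacking of the results is assumed to work, as it does for the result objects in current SciPy releases.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0                      # exact linear relationship

r, p_r = stats.pearsonr(x, y)          # linear correlation, 1 up to rounding
rho, p_rho = stats.spearmanr(x, y)     # rank correlation, also 1
tau, p_tau = stats.kendalltau(x, y)    # all pairs concordant, so tau = 1

fit = stats.linregress(x, y)           # least-squares line y = slope*x + intercept
print(r, rho, tau, fit.slope, fit.intercept)
```

Pearson's r measures linear association, while Spearman's rho and Kendall's tau depend only on ranks and therefore stay at 1 for any strictly increasing relationship.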
Statistical tests
- Calculate the t-test for the mean of ONE group of scores.
- Calculate the t-test for the means of two independent samples of scores.
- T-test for means of two independent samples from descriptive statistics.
- Calculate the t-test on TWO RELATED samples of scores, a and b.
- Calculate a one-way chi-square test.
- Perform the one-sample Cramér-von Mises test for goodness of fit.
- Perform the two-sample Cramér-von Mises test for goodness of fit.
- Cressie-Read power divergence statistic and goodness of fit test.
- Perform the (one-sample or two-sample) Kolmogorov-Smirnov test for goodness of fit.
- Perform the one-sample Kolmogorov-Smirnov test for goodness of fit.
- Perform the two-sample Kolmogorov-Smirnov test for goodness of fit.
- Compute the Epps-Singleton (ES) test statistic.
- Perform the Mann-Whitney U rank test on two independent samples.
- Tie correction factor for Mann-Whitney U and Kruskal-Wallis H tests.
- Assign ranks to data, dealing with ties appropriately.
- Compute the Wilcoxon rank-sum statistic for two samples.
- Calculate the Wilcoxon signed-rank test.
- Compute the Kruskal-Wallis H-test for independent samples.
- Compute the Friedman test for repeated measurements.
- Compute the Brunner-Munzel test on samples x and y.
- Combine p-values from independent tests bearing upon the same hypothesis.
- Perform the Jarque-Bera goodness of fit test on sample data.
- Perform Page's test, a measure of trend in observations between treatments.
- Perform a permutation test of a given statistic on provided data.
- Perform Tukey's HSD test for equality of means over multiple treatments.
- Perform the Ansari-Bradley test for equal scale parameters.
- Perform Bartlett's test for equal variances.
- Perform Levene's test for equal variances.
- Perform the Shapiro-Wilk test for normality.
- Anderson-Darling test for data coming from a particular distribution.
- The Anderson-Darling test for k-samples.
- Perform a test that the probability of success is p.
- Perform a test that the probability of success is p.
- Perform the Fligner-Killeen test for equality of variance.
- Perform Mood's median test.
- Perform Mood's test for equal scale parameters.
- Test whether the skew is different from the normal distribution.
- Test whether a dataset has normal kurtosis.
- Test whether a sample differs from a normal distribution.
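A minimal sketch of three of the tests above on tiny samples whose statistics can be checked by hand (SciPy names `ttest_1samp`, `ttest_ind`, `mannwhitneyu`). For a = [1, 2, 3] and b = [4, 5, 6] the pooled variance is 1, so the equal-variance two-sample t statistic is -3 / sqrt(2/3). Since every value of `a` is below every value of `b`, the Mann-Whitney U statistic for `a` is 0. The data are illustrative only.

```python
import numpy as np
from scipy import stats

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# One-sample t-test of H0: mean(a) == 2. The sample mean is exactly 2, so t = 0.
t1, p1 = stats.ttest_1samp(a, popmean=2.0)

# Two-sample t-test assuming equal variances (the default, equal_var=True).
t2, p2 = stats.ttest_ind(a, b)

# Rank-based alternative that does not assume normality.
u, p_u = stats.mannwhitneyu(a, b, alternative="two-sided")
print(t1, p1, t2, p2, u, p_u)
```

Passing `equal_var=False` to `ttest_ind` switches to Welch's t-test, which is usually the safer choice when the group variances may differ.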
Quasi-Monte Carlo
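Quasi-Monte Carlo functionality lives in the `scipy.stats.qmc` submodule (assumed available, as in SciPy 1.7 and later). A minimal sketch: draw 2^3 = 8 points of an unscrambled Sobol' sequence in two dimensions, measure how evenly they fill the unit square with the default (centered) discrepancy, and rescale them to an arbitrary box chosen for illustration.

```python
import numpy as np
from scipy.stats import qmc

# Unscrambled Sobol' sequence in 2 dimensions; random_base2(m) returns 2**m points,
# which preserves the balance properties of the sequence.
sampler = qmc.Sobol(d=2, scramble=False)
points = sampler.random_base2(m=3)      # 8 points in the unit square

# Discrepancy of the point set: lower values mean more uniformly spread points.
disc = qmc.discrepancy(points)

# Map the points from the unit square to the box [-1, 1] x [0, 10].
scaled = qmc.scale(points, l_bounds=[-1.0, 0.0], u_bounds=[1.0, 10.0])
print(points)
print(disc)
```

In practice `scramble=True` (the default) is usually preferred, because scrambling removes the fixed first point at the origin and allows error estimates from independent randomizations.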
Masked statistics functions
- Statistical functions for masked arrays (scipy.stats.mstats)
  - Summary statistics
    - scipy.stats.mstats.describe
    - scipy.stats.mstats.gmean
    - scipy.stats.mstats.hmean
    - scipy.stats.mstats.kurtosis
    - scipy.stats.mstats.mode
    - scipy.stats.mstats.mquantiles
    - scipy.stats.mstats.hdmedian
    - scipy.stats.mstats.hdquantiles
    - scipy.stats.mstats.hdquantiles_sd
    - scipy.stats.mstats.idealfourths
    - scipy.stats.mstats.plotting_positions
    - scipy.stats.mstats.meppf
    - scipy.stats.mstats.moment
    - scipy.stats.mstats.skew
    - scipy.stats.mstats.tmean
    - scipy.stats.mstats.tvar
    - scipy.stats.mstats.tmin
    - scipy.stats.mstats.tmax
    - scipy.stats.mstats.tsem
    - scipy.stats.mstats.variation
    - scipy.stats.mstats.find_repeats
    - scipy.stats.mstats.sem
    - scipy.stats.mstats.trimmed_mean
    - scipy.stats.mstats.trimmed_mean_ci
    - scipy.stats.mstats.trimmed_std
    - scipy.stats.mstats.trimmed_var
  - Frequency statistics
  - Correlation functions
    - scipy.stats.mstats.f_oneway
    - scipy.stats.mstats.pearsonr
    - scipy.stats.mstats.spearmanr
    - scipy.stats.mstats.pointbiserialr
    - scipy.stats.mstats.kendalltau
    - scipy.stats.mstats.kendalltau_seasonal
    - scipy.stats.mstats.linregress
    - scipy.stats.mstats.siegelslopes
    - scipy.stats.mstats.theilslopes
    - scipy.stats.mstats.sen_seasonal_slopes
  - Statistical tests
    - scipy.stats.mstats.ttest_1samp
    - scipy.stats.mstats.ttest_onesamp
    - scipy.stats.mstats.ttest_ind
    - scipy.stats.mstats.ttest_rel
    - scipy.stats.mstats.chisquare
    - scipy.stats.mstats.kstest
    - scipy.stats.mstats.ks_2samp
    - scipy.stats.mstats.ks_1samp
    - scipy.stats.mstats.ks_twosamp
    - scipy.stats.mstats.mannwhitneyu
    - scipy.stats.mstats.rankdata
    - scipy.stats.mstats.kruskal
    - scipy.stats.mstats.kruskalwallis
    - scipy.stats.mstats.friedmanchisquare
    - scipy.stats.mstats.brunnermunzel
    - scipy.stats.mstats.skewtest
    - scipy.stats.mstats.kurtosistest
    - scipy.stats.mstats.normaltest
  - Transformations
  - Other
Other statistical functionality
Transformations
- Return a dataset transformed by a Box-Cox power transformation.
- Compute optimal Box-Cox transform parameter for input data.
- The boxcox log-likelihood function.
- Return a dataset transformed by a Yeo-Johnson power transformation.
- Compute optimal Yeo-Johnson transform parameter.
- The yeojohnson log-likelihood function.
- Compute the O'Brien transform on input data (any number of arrays).
- Perform iterative sigma-clipping of array elements.
- Slice off a proportion of items from both ends of an array.
- Slice off a proportion from ONE end of the passed array distribution.
- Calculate the relative z-scores.
- Compute the z score.
- Compute the geometric standard score.
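A minimal sketch of two transformations above (SciPy names `zscore` and `boxcox`). `zscore` standardizes with the population standard deviation (ddof=0) by default. With `lmbda=0` the Box-Cox transform reduces to the natural logarithm; without `lmbda`, `boxcox` also returns the maximum-likelihood estimate of λ, which for log-normal data should come out near 0. The log-normal sample is synthetic.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0])
z = stats.zscore(x)              # (x - mean) / std, with ddof=0 by default

# With lmbda=0 the Box-Cox transform is the natural logarithm.
logged = stats.boxcox(np.array([1.0, np.e, np.e**2]), lmbda=0)

# Without lmbda, boxcox also returns the maximum-likelihood estimate of lambda.
rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=0.5, size=2000)
transformed, lam = stats.boxcox(skewed)
print(z, logged, lam)
```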
Statistical distances
- Compute the first Wasserstein distance between two 1D distributions.
- Compute the energy distance between two 1D distributions.
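A minimal sketch of the two distances (SciPy names `wasserstein_distance` and `energy_distance`) on inputs where the answer is known. Shifting every point by 5 gives a first Wasserstein (earth mover's) distance of exactly 5. For point masses at 0 and 2, the energy distance sqrt(2 E|X - Y| - E|X - X'| - E|Y - Y'|) equals sqrt(2 * 2 - 0 - 0) = 2.

```python
from scipy import stats

# Every point of u shifted by 5 gives v, so the earth mover's distance is the shift.
u = [0.0, 1.0, 3.0]
v = [5.0, 6.0, 8.0]
w = stats.wasserstein_distance(u, v)

# Energy distance between point masses at 0 and 2.
e = stats.energy_distance([0.0], [2.0])
print(w, e)
```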
Sampling
Random variate generation / CDF Inversion
- Generate random samples from a probability density function using the ratio-of-uniforms method.
- Methods
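To illustrate how the ratio-of-uniforms method works, here is a hand-rolled NumPy sketch for the standard normal. It is not SciPy's implementation, whose function name and signature have changed across versions; the helper `ratio_of_uniforms_normal` is hypothetical. Points (u, v) are drawn uniformly from a bounding box and accepted when u² ≤ f(v/u); the accepted ratios v/u then follow the density proportional to f. For f(x) = exp(-x²/2) the box bounds are u_max = 1 and v_max = sqrt(2/e).

```python
import numpy as np


def ratio_of_uniforms_normal(n, rng):
    """Hypothetical helper: draw n standard-normal variates by ratio-of-uniforms.

    Uses the unnormalized density f(x) = exp(-x**2 / 2). A point (u, v) drawn
    uniformly from [0, umax] x [vmin, vmax] is accepted when u**2 <= f(v / u);
    the ratio x = v / u then has density proportional to f.
    """
    umax = 1.0                       # sup of sqrt(f(x)), attained at x = 0
    vmax = np.sqrt(2.0 / np.e)       # sup of x * sqrt(f(x)), attained at x = sqrt(2)
    vmin = -vmax                     # by symmetry of f
    batches = []
    total = 0
    while total < n:
        # 1 - random() lies in (0, 1], so u is never 0 and v / u stays finite.
        u = umax * (1.0 - rng.random(n))
        v = rng.uniform(vmin, vmax, size=n)
        x = v / u
        accepted = x[u**2 <= np.exp(-x**2 / 2.0)]
        batches.append(accepted)
        total += accepted.size
    return np.concatenate(batches)[:n]


rng = np.random.default_rng(42)
draws = ratio_of_uniforms_normal(20000, rng)
print(draws.mean(), draws.std())     # should be close to 0 and 1
```

About 73% of proposals are accepted here (half the integral of f divided by the box area), so the loop usually needs only two batches.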
Circular statistical functions
- Compute the circular mean for samples in a range.
- Compute the circular variance for samples assumed to be in a range.
- Compute the circular standard deviation for samples assumed to be in the range [low to high].
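A minimal sketch of why circular statistics are needed (SciPy names `circmean` and `circstd`): for angles of 355 and 5 degrees the arithmetic mean is 180, pointing the opposite way, while the circular mean is 0 (reported as a value near 0 or near 360 because of floating-point rounding). The `high`/`low` arguments set the range of the data, here degrees.

```python
import numpy as np
from scipy import stats

angles = np.array([355.0, 5.0])                   # degrees, straddling 0/360

naive = angles.mean()                             # 180.0: wrong for angular data
circ = stats.circmean(angles, high=360, low=0)    # close to 0, equivalently 360
spread = stats.circstd(angles, high=360, low=0)   # small: the angles are 10 degrees apart
print(naive, circ, spread)
```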
Contingency table functions
- Chi-square test of independence of variables in a contingency table.
- Return table of counts for each possible unique combination in
- Compute the expected frequencies from a contingency table.
- Return a list of the marginal sums of the array a.
- Compute the relative risk (also known as the risk ratio).
- Calculate the degree of association between two nominal variables.
- Perform a Fisher exact test on a 2x2 contingency table.
- Perform a Barnard exact test on a 2x2 contingency table.
- Perform Boschloo's exact test on a 2x2 contingency table.
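A minimal sketch for an illustrative 2x2 table (SciPy names `chi2_contingency`, `fisher_exact` and `scipy.stats.contingency.expected_freq`). All row and column sums are 30 out of 60, so every expected count is 15. For 2x2 tables `chi2_contingency` applies Yates' continuity correction by default, giving 4 * (|10 - 15| - 0.5)² / 15 = 5.4. The sample odds ratio is (10 * 10) / (20 * 20) = 0.25.

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import expected_freq

table = np.array([[10, 20],
                  [20, 10]])

# Chi-square test of independence; Yates' correction is applied for 2x2 tables.
chi2, p, dof, expected = stats.chi2_contingency(table)

# Fisher's exact test returns the sample odds ratio and an exact p-value.
odds_ratio, p_fisher = stats.fisher_exact(table)

print(chi2, p, dof)
print(expected_freq(table))
print(odds_ratio, p_fisher)
```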
Plot-tests
- Calculate the shape parameter that maximizes the PPCC.
- Calculate and optionally plot probability plot correlation coefficient.
- Calculate quantiles for a probability plot, and optionally show the plot.
- Compute parameters for a Box-Cox normality plot, optionally show it.
- Compute parameters for a Yeo-Johnson normality plot, optionally show it.
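The plot functions above can also be used without plotting. A minimal sketch with `stats.probplot` (SciPy's name for the probability-plot function): when no `plot` object is passed, it returns the theoretical quantiles `osm`, the ordered data `osr`, and a least-squares fit. For normal data the fitted slope and intercept estimate the scale and location, and the correlation `r` should be close to 1. The sample is synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=3.0, size=200)

# Without a plot object, probplot returns the numbers behind a normal Q-Q plot.
(osm, osr), (slope, intercept, r) = stats.probplot(x, dist="norm")
print(slope, intercept, r)
```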
Univariate and multivariate kernel density estimation
- Representation of a kernel-density estimate using Gaussian kernels.
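A minimal sketch of Gaussian kernel density estimation with `stats.gaussian_kde` (SciPy's name for the class described above) on synthetic standard-normal data. Calling the estimator evaluates the density on a grid. Because the estimate is a mixture of Gaussians, it integrates to 1, so nearly all of its mass should lie between -10 and 10 for this data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(size=500)

kde = stats.gaussian_kde(data)          # bandwidth chosen by Scott's rule by default
grid = np.linspace(-4.0, 4.0, 9)
density = kde(grid)                     # equivalent to kde.evaluate(grid)

# Probability mass of the estimate between -10 and 10; should be essentially 1.
mass = kde.integrate_box_1d(-10.0, 10.0)
print(density)
print(mass)
```

The `bw_method` argument accepts `"scott"`, `"silverman"`, a scalar, or a callable if the default bandwidth over- or under-smooths the data.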
Warnings / Errors used in scipy.stats
- Warning generated by
- Warning generated by
- Warning generated by
- Warning generated by
- Warning generated by
- Warning generated by