# Statistical functions (scipy.stats)

This module contains a large number of probability distributions as well as a growing library of statistical functions.

Each univariate distribution is an instance of a subclass of rv_continuous (rv_discrete for discrete distributions):

| Class | Description |
| --- | --- |
| rv_continuous([momtype, a, b, xtol, …]) | A generic continuous random variable class meant for subclassing. |
| rv_discrete([a, b, name, badvalue, …]) | A generic discrete random variable class meant for subclassing. |
| rv_histogram(histogram, \*args, \*\*kwargs) | Generates a distribution given by a histogram. |
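
As a minimal sketch of what "instance of a subclass" means in practice, the snippet below checks two of the distributions listed further down (`norm` and `poisson`) against the generic classes. It then evaluates one normal distribution twice: once with parameters passed per call, once with the parameters frozen. The parameter values are arbitrary illustrations.

```python
from scipy import stats

# `stats.norm` is an instance of a subclass of rv_continuous,
# and `stats.poisson` is an instance of a subclass of rv_discrete.
is_continuous = isinstance(stats.norm, stats.rv_continuous)
is_discrete = isinstance(stats.poisson, stats.rv_discrete)

# Location and scale can be passed on every call ...
cdf_at_mean = stats.norm.cdf(1.0, loc=1.0, scale=2.0)

# ... or fixed once by "freezing" the distribution.
frozen = stats.norm(loc=1.0, scale=2.0)
median = frozen.ppf(0.5)
```

Freezing is mostly a convenience: the frozen object exposes the same methods (`pdf`, `cdf`, `ppf`, `rvs`, and so on) without repeating the parameters.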

## Continuous distributions

| Distribution | Description |
| --- | --- |
| alpha(\*args, \*\*kwds) | An alpha continuous random variable. |
| anglit(\*args, \*\*kwds) | An anglit continuous random variable. |
| arcsine(\*args, \*\*kwds) | An arcsine continuous random variable. |
| argus(\*args, \*\*kwds) | Argus distribution. |
| beta(\*args, \*\*kwds) | A beta continuous random variable. |
| betaprime(\*args, \*\*kwds) | A beta prime continuous random variable. |
| bradford(\*args, \*\*kwds) | A Bradford continuous random variable. |
| burr(\*args, \*\*kwds) | A Burr (Type III) continuous random variable. |
| burr12(\*args, \*\*kwds) | A Burr (Type XII) continuous random variable. |
| cauchy(\*args, \*\*kwds) | A Cauchy continuous random variable. |
| chi(\*args, \*\*kwds) | A chi continuous random variable. |
| chi2(\*args, \*\*kwds) | A chi-squared continuous random variable. |
| cosine(\*args, \*\*kwds) | A cosine continuous random variable. |
| crystalball(\*args, \*\*kwds) | Crystalball distribution. |
| dgamma(\*args, \*\*kwds) | A double gamma continuous random variable. |
| dweibull(\*args, \*\*kwds) | A double Weibull continuous random variable. |
| erlang(\*args, \*\*kwds) | An Erlang continuous random variable. |
| expon(\*args, \*\*kwds) | An exponential continuous random variable. |
| exponnorm(\*args, \*\*kwds) | An exponentially modified Normal continuous random variable. |
| exponweib(\*args, \*\*kwds) | An exponentiated Weibull continuous random variable. |
| exponpow(\*args, \*\*kwds) | An exponential power continuous random variable. |
| f(\*args, \*\*kwds) | An F continuous random variable. |
| fatiguelife(\*args, \*\*kwds) | A fatigue-life (Birnbaum-Saunders) continuous random variable. |
| fisk(\*args, \*\*kwds) | A Fisk continuous random variable. |
| foldcauchy(\*args, \*\*kwds) | A folded Cauchy continuous random variable. |
| foldnorm(\*args, \*\*kwds) | A folded normal continuous random variable. |
| frechet_r(\*args, \*\*kwds) | A frechet_r continuous random variable. |
| frechet_l(\*args, \*\*kwds) | A frechet_l continuous random variable. |
| genlogistic(\*args, \*\*kwds) | A generalized logistic continuous random variable. |
| gennorm(\*args, \*\*kwds) | A generalized normal continuous random variable. |
| genpareto(\*args, \*\*kwds) | A generalized Pareto continuous random variable. |
| genexpon(\*args, \*\*kwds) | A generalized exponential continuous random variable. |
| genextreme(\*args, \*\*kwds) | A generalized extreme value continuous random variable. |
| gausshyper(\*args, \*\*kwds) | A Gauss hypergeometric continuous random variable. |
| gamma(\*args, \*\*kwds) | A gamma continuous random variable. |
| gengamma(\*args, \*\*kwds) | A generalized gamma continuous random variable. |
| genhalflogistic(\*args, \*\*kwds) | A generalized half-logistic continuous random variable. |
| gilbrat(\*args, \*\*kwds) | A Gilbrat continuous random variable. |
| gompertz(\*args, \*\*kwds) | A Gompertz (or truncated Gumbel) continuous random variable. |
| gumbel_r(\*args, \*\*kwds) | A right-skewed Gumbel continuous random variable. |
| gumbel_l(\*args, \*\*kwds) | A left-skewed Gumbel continuous random variable. |
| halfcauchy(\*args, \*\*kwds) | A Half-Cauchy continuous random variable. |
| halflogistic(\*args, \*\*kwds) | A half-logistic continuous random variable. |
| halfnorm(\*args, \*\*kwds) | A half-normal continuous random variable. |
| halfgennorm(\*args, \*\*kwds) | The upper half of a generalized normal continuous random variable. |
| hypsecant(\*args, \*\*kwds) | A hyperbolic secant continuous random variable. |
| invgamma(\*args, \*\*kwds) | An inverted gamma continuous random variable. |
| invgauss(\*args, \*\*kwds) | An inverse Gaussian continuous random variable. |
| invweibull(\*args, \*\*kwds) | An inverted Weibull continuous random variable. |
| johnsonsb(\*args, \*\*kwds) | A Johnson SB continuous random variable. |
| johnsonsu(\*args, \*\*kwds) | A Johnson SU continuous random variable. |
| kappa4(\*args, \*\*kwds) | Kappa 4 parameter distribution. |
| kappa3(\*args, \*\*kwds) | Kappa 3 parameter distribution. |
| ksone(\*args, \*\*kwds) | General Kolmogorov-Smirnov one-sided test. |
| kstwobign(\*args, \*\*kwds) | Kolmogorov-Smirnov two-sided test for large N. |
| laplace(\*args, \*\*kwds) | A Laplace continuous random variable. |
| levy(\*args, \*\*kwds) | A Levy continuous random variable. |
| levy_l(\*args, \*\*kwds) | A left-skewed Levy continuous random variable. |
| levy_stable(\*args, \*\*kwds) | A Levy-stable continuous random variable. |
| logistic(\*args, \*\*kwds) | A logistic (or Sech-squared) continuous random variable. |
| loggamma(\*args, \*\*kwds) | A log gamma continuous random variable. |
| loglaplace(\*args, \*\*kwds) | A log-Laplace continuous random variable. |
| lognorm(\*args, \*\*kwds) | A lognormal continuous random variable. |
| lomax(\*args, \*\*kwds) | A Lomax (Pareto of the second kind) continuous random variable. |
| maxwell(\*args, \*\*kwds) | A Maxwell continuous random variable. |
| mielke(\*args, \*\*kwds) | A Mielke’s Beta-Kappa continuous random variable. |
| moyal(\*args, \*\*kwds) | A Moyal continuous random variable. |
| nakagami(\*args, \*\*kwds) | A Nakagami continuous random variable. |
| ncx2(\*args, \*\*kwds) | A non-central chi-squared continuous random variable. |
| ncf(\*args, \*\*kwds) | A non-central F distribution continuous random variable. |
| nct(\*args, \*\*kwds) | A non-central Student’s t continuous random variable. |
| norm(\*args, \*\*kwds) | A normal continuous random variable. |
| norminvgauss(\*args, \*\*kwds) | A Normal Inverse Gaussian continuous random variable. |
| pareto(\*args, \*\*kwds) | A Pareto continuous random variable. |
| pearson3(\*args, \*\*kwds) | A Pearson type III continuous random variable. |
| powerlaw(\*args, \*\*kwds) | A power-function continuous random variable. |
| powerlognorm(\*args, \*\*kwds) | A power log-normal continuous random variable. |
| powernorm(\*args, \*\*kwds) | A power normal continuous random variable. |
| rdist(\*args, \*\*kwds) | An R-distributed continuous random variable. |
| reciprocal(\*args, \*\*kwds) | A reciprocal continuous random variable. |
| rayleigh(\*args, \*\*kwds) | A Rayleigh continuous random variable. |
| rice(\*args, \*\*kwds) | A Rice continuous random variable. |
| recipinvgauss(\*args, \*\*kwds) | A reciprocal inverse Gaussian continuous random variable. |
| semicircular(\*args, \*\*kwds) | A semicircular continuous random variable. |
| skewnorm(\*args, \*\*kwds) | A skew-normal random variable. |
| t(\*args, \*\*kwds) | A Student’s t continuous random variable. |
| trapz(\*args, \*\*kwds) | A trapezoidal continuous random variable. |
| triang(\*args, \*\*kwds) | A triangular continuous random variable. |
| truncexpon(\*args, \*\*kwds) | A truncated exponential continuous random variable. |
| truncnorm(\*args, \*\*kwds) | A truncated normal continuous random variable. |
| tukeylambda(\*args, \*\*kwds) | A Tukey-Lambda continuous random variable. |
| uniform(\*args, \*\*kwds) | A uniform continuous random variable. |
| vonmises(\*args, \*\*kwds) | A Von Mises continuous random variable. |
| vonmises_line(\*args, \*\*kwds) | A Von Mises continuous random variable. |
| wald(\*args, \*\*kwds) | A Wald continuous random variable. |
| weibull_min(\*args, \*\*kwds) | Weibull minimum continuous random variable. |
| weibull_max(\*args, \*\*kwds) | Weibull maximum continuous random variable. |
| wrapcauchy(\*args, \*\*kwds) | A wrapped Cauchy continuous random variable. |
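
The `\*args, \*\*kwds` in these signatures stand for each distribution's shape parameters plus the common `loc` and `scale` keywords. The sketch below illustrates this with three of the listed distributions. The numeric values and the seed are arbitrary choices made for the example.

```python
import numpy as np
from scipy import stats

# The exponential distribution has no shape parameter; `scale` is 1/rate.
p_expon = stats.expon.cdf(2.0, scale=2.0)        # 1 - exp(-2/2)

# The gamma distribution takes one shape parameter, `a`.
gamma_mean = stats.gamma.mean(a=3.0, scale=2.0)  # a * scale

# Reproducible sampling: pass `random_state` to `rvs`.
samples = stats.norm.rvs(loc=0.0, scale=1.0, size=1000, random_state=12345)
```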

## Multivariate distributions

| Distribution | Description |
| --- | --- |
| multivariate_normal([mean, cov, …]) | A multivariate normal random variable. |
| matrix_normal([mean, rowcov, colcov, seed]) | A matrix normal random variable. |
| dirichlet(alpha[, seed]) | A Dirichlet random variable. |
| wishart([df, scale, seed]) | A Wishart random variable. |
| invwishart([df, scale, seed]) | An inverse Wishart random variable. |
| multinomial(n, p[, seed]) | A multinomial random variable. |
| special_ortho_group([dim, seed]) | A matrix-valued SO(N) random variable. |
| ortho_group | A matrix-valued O(N) random variable. |
| unitary_group | A matrix-valued U(N) random variable. |
| random_correlation | A random correlation matrix. |
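
A short sketch of the multivariate interface, using `multivariate_normal` with an identity covariance. The mean, covariance, and seed are illustrative assumptions, not values from this page.

```python
import numpy as np
from scipy import stats

# Standard bivariate normal: zero mean, identity covariance.
mvn = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.0], [0.0, 1.0]])

density_at_mean = mvn.pdf([0.0, 0.0])    # equals 1 / (2 * pi) for this choice
draws = mvn.rvs(size=5, random_state=0)  # array of shape (5, 2)
```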

## Discrete distributions

| Distribution | Description |
| --- | --- |
| bernoulli(\*args, \*\*kwds) | A Bernoulli discrete random variable. |
| binom(\*args, \*\*kwds) | A binomial discrete random variable. |
| boltzmann(\*args, \*\*kwds) | A Boltzmann (Truncated Discrete Exponential) random variable. |
| dlaplace(\*args, \*\*kwds) | A Laplacian discrete random variable. |
| geom(\*args, \*\*kwds) | A geometric discrete random variable. |
| hypergeom(\*args, \*\*kwds) | A hypergeometric discrete random variable. |
| logser(\*args, \*\*kwds) | A Logarithmic (Log-Series, Series) discrete random variable. |
| nbinom(\*args, \*\*kwds) | A negative binomial discrete random variable. |
| planck(\*args, \*\*kwds) | A Planck discrete exponential random variable. |
| poisson(\*args, \*\*kwds) | A Poisson discrete random variable. |
| randint(\*args, \*\*kwds) | A uniform discrete random variable. |
| skellam(\*args, \*\*kwds) | A Skellam discrete random variable. |
| zipf(\*args, \*\*kwds) | A Zipf discrete random variable. |
| yulesimon(\*args, \*\*kwds) | A Yule-Simon discrete random variable. |
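
Discrete distributions expose `pmf` where continuous ones expose `pdf`. The following sketch uses `binom` and `poisson`; the parameter values are arbitrary.

```python
import numpy as np
from scipy import stats

# Probability of exactly 2 successes in 4 fair trials: C(4, 2) * 0.5**4.
p_two = stats.binom.pmf(2, 4, 0.5)

# The pmf over the whole support sums to one.
k = np.arange(0, 5)
total = stats.binom.pmf(k, 4, 0.5).sum()

# Poisson with rate mu = 3: its mean, and P(X <= 1) = exp(-3) * (1 + 3).
poisson_mean = stats.poisson.mean(3.0)
p_at_most_one = stats.poisson.cdf(1, 3.0)
```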

An overview of statistical functions is given below. Several of these functions have a similar version in scipy.stats.mstats that works with masked arrays.
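
To illustrate the masked-array variants, the sketch below compares `scipy.stats.mstats.sem` on a masked array with `scipy.stats.sem` on the valid entries only. The sentinel value `-999` is a made-up convention for this example.

```python
import numpy as np
import numpy.ma as ma
from scipy import stats
from scipy.stats import mstats

data = np.array([1.0, 2.0, 3.0, 4.0, -999.0])  # -999 marks a missing reading
masked = ma.masked_values(data, -999.0)

masked_sem = mstats.sem(masked)   # the masked entry is ignored
plain_sem = stats.sem(data[:4])   # same computation on the valid values only
```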

## Summary statistics

| Function | Description |
| --- | --- |
| describe(a[, axis, ddof, bias, nan_policy]) | Compute several descriptive statistics of the passed array. |
| gmean(a[, axis, dtype]) | Compute the geometric mean along the specified axis. |
| hmean(a[, axis, dtype]) | Calculate the harmonic mean along the specified axis. |
| kurtosis(a[, axis, fisher, bias, nan_policy]) | Compute the kurtosis (Fisher or Pearson) of a dataset. |
| mode(a[, axis, nan_policy]) | Return an array of the modal (most common) value in the passed array. |
| moment(a[, moment, axis, nan_policy]) | Calculate the nth moment about the mean for a sample. |
| skew(a[, axis, bias, nan_policy]) | Compute the skewness of a data set. |
| kstat(data[, n]) | Return the nth k-statistic (1<=n<=4 so far). |
| kstatvar(data[, n]) | Returns an unbiased estimator of the variance of the k-statistic. |
| tmean(a[, limits, inclusive, axis]) | Compute the trimmed mean. |
| tvar(a[, limits, inclusive, axis, ddof]) | Compute the trimmed variance. |
| tmin(a[, lowerlimit, axis, inclusive, …]) | Compute the trimmed minimum. |
| tmax(a[, upperlimit, axis, inclusive, …]) | Compute the trimmed maximum. |
| tstd(a[, limits, inclusive, axis, ddof]) | Compute the trimmed sample standard deviation. |
| tsem(a[, limits, inclusive, axis, ddof]) | Compute the trimmed standard error of the mean. |
| variation(a[, axis, nan_policy]) | Compute the coefficient of variation, the ratio of the biased standard deviation to the mean. |
| find_repeats(arr) | Find repeats and repeat counts. |
| trim_mean(a, proportiontocut[, axis]) | Return mean of array after trimming distribution from both tails. |
| iqr(x[, axis, rng, scale, nan_policy, …]) | Compute the interquartile range of the data along the specified axis. |
| sem(a[, axis, ddof, nan_policy]) | Calculate the standard error of the mean (or standard error of measurement) of the values in the input array. |
| bayes_mvs(data[, alpha]) | Bayesian confidence intervals for the mean, var, and std. |
| mvsdist(data) | ‘Frozen’ distributions for mean, variance, and standard deviation of data. |
| entropy(pk[, qk, base]) | Calculate the entropy of a distribution for given probability values. |
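
A brief sketch of a few of the summary functions on small hand-picked inputs, chosen so the results are easy to check by hand.

```python
from scipy import stats

summary = stats.describe([1.0, 2.0, 3.0, 4.0, 5.0])
# `summary` bundles nobs, minmax, mean, variance (ddof=1 by default),
# skewness and kurtosis.

geometric = stats.gmean([1.0, 4.0, 16.0])  # cube root of 1 * 4 * 16
harmonic = stats.hmean([1.0, 2.0, 4.0])    # 3 / (1/1 + 1/2 + 1/4)
```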

## Frequency statistics

| Function | Description |
| --- | --- |
| cumfreq(a[, numbins, defaultreallimits, weights]) | Return a cumulative frequency histogram, using the histogram function. |
| itemfreq(\*args, \*\*kwds) | Deprecated: itemfreq will be removed in a future version. |
| percentileofscore(a, score[, kind]) | The percentile rank of a score relative to a list of scores. |
| scoreatpercentile(a, per[, limit, …]) | Calculate the score at a given percentile of the input sequence. |
| relfreq(a[, numbins, defaultreallimits, weights]) | Return a relative frequency histogram, using the histogram function. |
| binned_statistic(x, values[, statistic, …]) | Compute a binned statistic for one or more sets of data. |
| binned_statistic_2d(x, y, values[, …]) | Compute a bidimensional binned statistic for one or more sets of data. |
| binned_statistic_dd(sample, values[, …]) | Compute a multidimensional binned statistic for a set of data. |
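
The sketch below exercises `percentileofscore`, `scoreatpercentile`, and `binned_statistic` on toy inputs. The data and the choice of two bins are assumptions made only for illustration.

```python
from scipy import stats

# Percentile rank of the score 3 within [1, 2, 3, 4] (default kind='rank').
pct = stats.percentileofscore([1, 2, 3, 4], 3)

# The value at the 50th percentile of the sequence.
median_score = stats.scoreatpercentile([1, 2, 3, 4, 5], 50)

# Mean of `values` within two equal-width bins of `x`.
# Bin edges are [1, 2.5, 4], so x = 1, 2 fall in the first bin, x = 3, 4 in the second.
binned = stats.binned_statistic([1, 2, 3, 4], [10, 20, 30, 40],
                                statistic='mean', bins=2)
bin_means = binned.statistic
```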

## Correlation functions

| Function | Description |
| --- | --- |
| f_oneway(\*args) | Performs a 1-way ANOVA. |
| pearsonr(x, y) | Calculate a Pearson correlation coefficient and the p-value for testing non-correlation. |
| spearmanr(a[, b, axis, nan_policy]) | Calculate a Spearman rank-order correlation coefficient and the p-value to test for non-correlation. |
| pointbiserialr(x, y) | Calculate a point biserial correlation coefficient and its p-value. |
| kendalltau(x, y[, initial_lexsort, …]) | Calculate Kendall’s tau, a correlation measure for ordinal data. |
| weightedtau(x, y[, rank, weigher, additive]) | Compute a weighted version of Kendall’s $$\tau$$. |
| linregress(x[, y]) | Calculate a linear least-squares regression for two sets of measurements. |
| siegelslopes(y[, x, method]) | Computes the Siegel estimator for a set of points (x, y). |
| theilslopes(y[, x, alpha]) | Computes the Theil-Sen estimator for a set of points (x, y). |
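
As a rough illustration of how the linear and rank-based measures differ, the sketch below applies `pearsonr`, `spearmanr`, and `linregress` to synthetic data. The arrays are constructed for the example: one relationship is exactly linear, the other monotone but nonlinear.

```python
import numpy as np
from scipy import stats

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_linear = 2.0 * x + 1.0
y_cubic = x ** 3

# Pearson measures linear association; both functions return (coefficient, p-value).
r, r_pvalue = stats.pearsonr(x, y_linear)

# Spearman works on ranks, so any strictly increasing relation gives rho = 1.
rho, rho_pvalue = stats.spearmanr(x, y_cubic)

# Least-squares line through the linear data.
fit = stats.linregress(x, y_linear)
slope, intercept = fit.slope, fit.intercept
```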

## Statistical tests

| Function | Description |
| --- | --- |
| ttest_1samp(a, popmean[, axis, nan_policy]) | Calculate the T-test for the mean of ONE group of scores. |
| ttest_ind(a, b[, axis, equal_var, nan_policy]) | Calculate the T-test for the means of two independent samples of scores. |
| ttest_ind_from_stats(mean1, std1, nobs1, …) | T-test for means of two independent samples from descriptive statistics. |
| ttest_rel(a, b[, axis, nan_policy]) | Calculate the T-test on TWO RELATED samples of scores, a and b. |
| kstest(rvs, cdf[, args, N, alternative, mode]) | Perform the Kolmogorov-Smirnov test for goodness of fit. |
| chisquare(f_obs[, f_exp, ddof, axis]) | Calculate a one-way chi square test. |
| power_divergence(f_obs[, f_exp, ddof, axis, …]) | Cressie-Read power divergence statistic and goodness of fit test. |
| ks_2samp(data1, data2) | Compute the Kolmogorov-Smirnov statistic on 2 samples. |
| mannwhitneyu(x, y[, use_continuity, alternative]) | Compute the Mann-Whitney rank test on samples x and y. |
| tiecorrect(rankvals) | Tie correction factor for ties in the Mann-Whitney U and Kruskal-Wallis H tests. |
| rankdata(a[, method]) | Assign ranks to data, dealing with ties appropriately. |
| ranksums(x, y) | Compute the Wilcoxon rank-sum statistic for two samples. |
| wilcoxon(x[, y, zero_method, correction]) | Calculate the Wilcoxon signed-rank test. |
| kruskal(\*args, \*\*kwargs) | Compute the Kruskal-Wallis H-test for independent samples. |
| friedmanchisquare(\*args) | Compute the Friedman test for repeated measurements. |
| brunnermunzel(x, y[, alternative, …]) | Computes the Brunner-Munzel test on samples x and y. |
| combine_pvalues(pvalues[, method, weights]) | Methods for combining the p-values of independent tests bearing upon the same hypothesis. |
| jarque_bera(x) | Perform the Jarque-Bera goodness of fit test on sample data. |
| ansari(x, y) | Perform the Ansari-Bradley test for equal scale parameters. |
| bartlett(\*args) | Perform Bartlett’s test for equal variances. |
| levene(\*args, \*\*kwds) | Perform Levene test for equal variances. |
| shapiro(x) | Perform the Shapiro-Wilk test for normality. |
| anderson(x[, dist]) | Anderson-Darling test for data coming from a particular distribution. |
| anderson_ksamp(samples[, midrank]) | The Anderson-Darling test for k-samples. |
| binom_test(x[, n, p, alternative]) | Perform a test that the probability of success is p. |
| fligner(\*args, \*\*kwds) | Perform Fligner-Killeen test for equality of variance. |
| median_test(\*args, \*\*kwds) | Mood’s median test. |
| mood(x, y[, axis]) | Perform Mood’s test for equal scale parameters. |
| skewtest(a[, axis, nan_policy]) | Test whether the skew is different from the normal distribution. |
| kurtosistest(a[, axis, nan_policy]) | Test whether a dataset has normal kurtosis. |
| normaltest(a[, axis, nan_policy]) | Test whether a sample differs from a normal distribution. |
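
Most of these tests return a statistic together with a p-value. The sketch below shows that pattern for `ttest_ind` and `chisquare`. The two samples are invented for the example, and the observed counts are arbitrary.

```python
from scipy import stats

# Two small samples with equal spread and means one unit apart.
a = [5.1, 4.9, 5.0, 5.2, 4.8]
b = [6.1, 5.9, 6.0, 6.2, 5.8]
t_stat, t_pvalue = stats.ttest_ind(a, b)  # pooled-variance test by default

# One-way chi-square test against equal expected frequencies.
chi2_stat, chi2_pvalue = stats.chisquare([16, 18, 16, 14, 12, 12])
```

With these inputs the pooled standard error is 0.1, so the t statistic is about -10 and the p-value is very small. The chi-square statistic is 2.0, which gives no evidence against equal frequencies.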

## Transformations

| Function | Description |
| --- | --- |
| boxcox(x[, lmbda, alpha]) | Return a positive dataset transformed by a Box-Cox power transformation. |
| boxcox_normmax(x[, brack, method]) | Compute optimal Box-Cox transform parameter for input data. |
| boxcox_llf(lmb, data) | The boxcox log-likelihood function. |
| yeojohnson(x[, lmbda]) | Return a dataset transformed by a Yeo-Johnson power transformation. |
| yeojohnson_normmax(x[, brack]) | Compute optimal Yeo-Johnson transform parameter for input data, using maximum likelihood estimation. |
| yeojohnson_llf(lmb, data) | The yeojohnson log-likelihood function. |
| obrientransform(\*args) | Compute the O’Brien transform on input data (any number of arrays). |
| sigmaclip(a[, low, high]) | Iterative sigma-clipping of array elements. |
| trimboth(a, proportiontocut[, axis]) | Slices off a proportion of items from both ends of an array. |
| trim1(a, proportiontocut[, tail, axis]) | Slices off a proportion from ONE end of the passed array distribution. |
| zmap(scores, compare[, axis, ddof]) | Calculate the relative z-scores. |
| zscore(a[, axis, ddof]) | Calculate the z score of each value in the sample, relative to the sample mean and standard deviation. |
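
A small sketch of two transformations, `zscore` and `boxcox`, on a toy array. The input values are arbitrary, and Box-Cox requires them to be positive.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Standardise to zero mean and unit standard deviation (ddof=0 by default).
z = stats.zscore(x)

# With a fixed lambda, boxcox returns only the transformed data;
# lambda = 0 corresponds to the natural logarithm.
x_log = stats.boxcox(x, lmbda=0)

# Without lambda, boxcox also returns the lambda that maximises the log-likelihood.
x_bc, best_lambda = stats.boxcox(x)
```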

## Statistical distances

| Function | Description |
| --- | --- |
| wasserstein_distance(u_values, v_values[, …]) | Compute the first Wasserstein distance between two 1D distributions. |
| energy_distance(u_values, v_values[, …]) | Compute the energy distance between two 1D distributions. |
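
Both distances take raw samples (the observed values) rather than fitted distributions. A minimal sketch with hand-picked samples:

```python
from scipy import stats

# Shifting every point of a sample by 5 moves the distribution by 5.
w = stats.wasserstein_distance([0, 1, 3], [5, 6, 8])

# Identical samples are at distance zero.
w_same = stats.wasserstein_distance([0, 1, 3], [0, 1, 3])

# Energy distance between two point masses at 0 and 2.
e = stats.energy_distance([0], [2])
```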

## Random variate generation

| Function | Description |
| --- | --- |
| rvs_ratio_uniforms(pdf, umax, vmin, vmax[, …]) | Generate random samples from a probability density function using the ratio-of-uniforms method. |
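
To make the ratio-of-uniforms idea concrete, here is a hand-written NumPy sketch of the method. It does not call `rvs_ratio_uniforms` and is not SciPy's implementation; the helper name `ratio_of_uniforms`, the target density, and the seed are all assumptions made for this example. Points (u, v) are drawn uniformly from the rectangle [0, umax] x [vmin, vmax], and v/u is kept whenever u² ≤ pdf(v/u).

```python
import numpy as np

def ratio_of_uniforms(pdf, umax, vmin, vmax, size, rng):
    """Hypothetical helper: draw `size` samples by the ratio-of-uniforms method."""
    accepted = []
    n_accepted = 0
    while n_accepted < size:
        # Propose points uniformly in the rectangle [0, umax] x [vmin, vmax].
        u = rng.uniform(0.0, umax, size=size)
        v = rng.uniform(vmin, vmax, size=size)
        positive = u > 0.0  # guard against division by zero
        u, v = u[positive], v[positive]
        x = v / u
        # Accept x = v/u when (u, v) lies in {(u, v): u**2 <= pdf(v/u)}.
        keep = x[u ** 2 <= pdf(x)]
        accepted.append(keep)
        n_accepted += keep.size
    return np.concatenate(accepted)[:size]

def unnormalised_normal(x):
    # Standard normal density without the 1/sqrt(2*pi) factor.
    return np.exp(-0.5 * x ** 2)

umax = 1.0                    # sqrt of the maximum of the density
vbound = np.sqrt(2.0 / np.e)  # max of |x| * sqrt(pdf(x)), attained at |x| = sqrt(2)
rng = np.random.default_rng(0)
samples = ratio_of_uniforms(unnormalised_normal, umax, -vbound, vbound,
                            size=20000, rng=rng)
```

The density does not need to be normalised, which is one reason the method is convenient. The bounding rectangle, however, has to be worked out for each density.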

## Circular statistical functions

| Function | Description |
| --- | --- |
| circmean(samples[, high, low, axis]) | Compute the circular mean for samples in a range. |
| circvar(samples[, high, low, axis]) | Compute the circular variance for samples assumed to be in a range. |
| circstd(samples[, high, low, axis]) | Compute the circular standard deviation for samples assumed to be in the range [low to high]. |
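
Circular statistics matter when values wrap around, as angles do. The sketch below contrasts `circmean` with the ordinary arithmetic mean for three compass-style angles in degrees. The angles are illustrative and chosen to be symmetric about 5°.

```python
import numpy as np
from scipy import stats

angles_deg = np.array([355.0, 5.0, 15.0])  # i.e. -5, 5 and 15 degrees

# Mean direction on the circle [0, 360): about 5 degrees.
circular = stats.circmean(angles_deg, high=360.0, low=0.0)

# The arithmetic mean ignores the wrap-around and gives 125 degrees.
arithmetic = angles_deg.mean()
```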

## Contingency table functions

| Function | Description |
| --- | --- |
| chi2_contingency(observed[, correction, lambda_]) | Chi-square test of independence of variables in a contingency table. |
| contingency.expected_freq(observed) | Compute the expected frequencies from a contingency table. |
| contingency.margins(a) | Return a list of the marginal sums of the array a. |
| fisher_exact(table[, alternative]) | Performs a Fisher exact test on a 2x2 contingency table. |
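
The following sketch runs the contingency-table functions on a made-up 2x2 table. Under independence the expected counts are (row total × column total) / grand total.

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import expected_freq

observed = np.array([[10, 20],
                     [30, 40]])

# Returns the statistic, p-value, degrees of freedom and expected counts.
chi2, pvalue, dof, expected = stats.chi2_contingency(observed)

# The expected counts can also be computed directly.
expected_direct = expected_freq(observed)

# Fisher's exact test reports the sample odds ratio (10 * 40) / (20 * 30).
odds_ratio, fisher_pvalue = stats.fisher_exact(observed)
```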

## Plot-tests

| Function | Description |
| --- | --- |
| ppcc_max(x[, brack, dist]) | Calculate the shape parameter that maximizes the PPCC. |
| ppcc_plot(x, a, b[, dist, plot, N]) | Calculate and optionally plot probability plot correlation coefficient. |
| probplot(x[, sparams, dist, fit, plot, rvalue]) | Calculate quantiles for a probability plot, and optionally show the plot. |
| boxcox_normplot(x, la, lb[, plot, N]) | Compute parameters for a Box-Cox normality plot, optionally show it. |
| yeojohnson_normplot(x, la, lb[, plot, N]) | Compute parameters for a Yeo-Johnson normality plot, optionally show it. |
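
These functions can be used without any plotting backend. As a sketch, `probplot` below returns the quantile pairs and a least-squares fit for a synthetic normal sample. The sample parameters and seed are arbitrary.

```python
from scipy import stats

# Synthetic sample from a normal distribution with mean 10 and std 2.
sample = stats.norm.rvs(loc=10.0, scale=2.0, size=200, random_state=0)

# With plot=None (the default) nothing is drawn; only the numbers are returned.
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(
    sample, dist="norm")
```

For data that really are normal, `r` is close to 1, and `slope` and `intercept` roughly estimate the standard deviation and mean.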

## Univariate and multivariate kernel density estimation (scipy.stats.kde)

| Class | Description |
| --- | --- |
| gaussian_kde(dataset[, bw_method, weights]) | Representation of a kernel-density estimate using Gaussian kernels. |
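
A minimal sketch of fitting and evaluating a one-dimensional `gaussian_kde`. The synthetic data, seed, and evaluation grid are assumptions for the example.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.RandomState(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)

kde = gaussian_kde(data)           # bandwidth from Scott's rule by default
grid = np.linspace(-4.0, 4.0, 9)
density = kde(grid)                # equivalent to kde.evaluate(grid)

# A density estimate should integrate to one over the whole real line.
total_mass = kde.integrate_box_1d(-np.inf, np.inf)
```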

For many more statistics-related functions, install the software R and the interface package rpy.