SciPy

Statistical functions (scipy.stats)

This module contains a large number of probability distributions as well as a growing library of statistical functions.

Each univariate distribution is an instance of a subclass of rv_continuous (rv_discrete for discrete distributions):

rv_continuous([momtype, a, b, xtol, …])

A generic continuous random variable class meant for subclassing.

rv_discrete([a, b, name, badvalue, …])

A generic discrete random variable class meant for subclassing.

rv_histogram(histogram, *args, **kwargs)

Generates a distribution given by a histogram.
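The subclassing pattern can be sketched as follows. The example defines a hypothetical distribution with density 2x on [0, 1] by overriding only `_pdf`. It assumes rv_continuous then derives the cdf, mean and other methods numerically. It also builds an rv_histogram from a hand-made (counts, bin_edges) pair in the format returned by numpy.histogram. The class name `TriangularRamp` and all data values are illustrative assumptions.

```python
import numpy as np
from scipy import stats


class TriangularRamp(stats.rv_continuous):
    """Hypothetical example distribution with pdf f(x) = 2x on [0, 1]."""

    def _pdf(self, x):
        # Only the density is supplied; the other methods fall back to
        # generic numerical routines provided by rv_continuous.
        return 2.0 * x


# a and b set the support of the distribution.
ramp = TriangularRamp(a=0.0, b=1.0, name="triangular_ramp")
print(ramp.cdf(0.5))   # exact value is 0.5**2 = 0.25
print(ramp.mean())     # exact value is 2/3

# rv_histogram takes a (counts, bin_edges) tuple, e.g. from np.histogram.
counts = np.array([1, 3])
edges = np.array([0.0, 1.0, 2.0])
hist_dist = stats.rv_histogram((counts, edges))
print(hist_dist.cdf(1.0))  # 1 of 4 observations lies in the first bin: 0.25
```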

Continuous distributions

alpha(*args, **kwds)

An alpha continuous random variable.

anglit(*args, **kwds)

An anglit continuous random variable.

arcsine(*args, **kwds)

An arcsine continuous random variable.

argus(*args, **kwds)

Argus distribution

beta(*args, **kwds)

A beta continuous random variable.

betaprime(*args, **kwds)

A beta prime continuous random variable.

bradford(*args, **kwds)

A Bradford continuous random variable.

burr(*args, **kwds)

A Burr (Type III) continuous random variable.

burr12(*args, **kwds)

A Burr (Type XII) continuous random variable.

cauchy(*args, **kwds)

A Cauchy continuous random variable.

chi(*args, **kwds)

A chi continuous random variable.

chi2(*args, **kwds)

A chi-squared continuous random variable.

cosine(*args, **kwds)

A cosine continuous random variable.

crystalball(*args, **kwds)

Crystalball distribution

dgamma(*args, **kwds)

A double gamma continuous random variable.

dweibull(*args, **kwds)

A double Weibull continuous random variable.

erlang(*args, **kwds)

An Erlang continuous random variable.

expon(*args, **kwds)

An exponential continuous random variable.

exponnorm(*args, **kwds)

An exponentially modified Normal continuous random variable.

exponweib(*args, **kwds)

An exponentiated Weibull continuous random variable.

exponpow(*args, **kwds)

An exponential power continuous random variable.

f(*args, **kwds)

An F continuous random variable.

fatiguelife(*args, **kwds)

A fatigue-life (Birnbaum-Saunders) continuous random variable.

fisk(*args, **kwds)

A Fisk continuous random variable.

foldcauchy(*args, **kwds)

A folded Cauchy continuous random variable.

foldnorm(*args, **kwds)

A folded normal continuous random variable.

genlogistic(*args, **kwds)

A generalized logistic continuous random variable.

gennorm(*args, **kwds)

A generalized normal continuous random variable.

genpareto(*args, **kwds)

A generalized Pareto continuous random variable.

genexpon(*args, **kwds)

A generalized exponential continuous random variable.

genextreme(*args, **kwds)

A generalized extreme value continuous random variable.

gausshyper(*args, **kwds)

A Gauss hypergeometric continuous random variable.

gamma(*args, **kwds)

A gamma continuous random variable.

gengamma(*args, **kwds)

A generalized gamma continuous random variable.

genhalflogistic(*args, **kwds)

A generalized half-logistic continuous random variable.

geninvgauss(*args, **kwds)

A Generalized Inverse Gaussian continuous random variable.

gilbrat(*args, **kwds)

A Gilbrat continuous random variable.

gompertz(*args, **kwds)

A Gompertz (or truncated Gumbel) continuous random variable.

gumbel_r(*args, **kwds)

A right-skewed Gumbel continuous random variable.

gumbel_l(*args, **kwds)

A left-skewed Gumbel continuous random variable.

halfcauchy(*args, **kwds)

A Half-Cauchy continuous random variable.

halflogistic(*args, **kwds)

A half-logistic continuous random variable.

halfnorm(*args, **kwds)

A half-normal continuous random variable.

halfgennorm(*args, **kwds)

The upper half of a generalized normal continuous random variable.

hypsecant(*args, **kwds)

A hyperbolic secant continuous random variable.

invgamma(*args, **kwds)

An inverted gamma continuous random variable.

invgauss(*args, **kwds)

An inverse Gaussian continuous random variable.

invweibull(*args, **kwds)

An inverted Weibull continuous random variable.

johnsonsb(*args, **kwds)

A Johnson SB continuous random variable.

johnsonsu(*args, **kwds)

A Johnson SU continuous random variable.

kappa4(*args, **kwds)

Kappa 4 parameter distribution.

kappa3(*args, **kwds)

Kappa 3 parameter distribution.

ksone(*args, **kwds)

Kolmogorov-Smirnov one-sided test statistic distribution.

kstwo(*args, **kwds)

Kolmogorov-Smirnov two-sided test statistic distribution.

kstwobign(*args, **kwds)

Limiting distribution of scaled Kolmogorov-Smirnov two-sided test statistic.

laplace(*args, **kwds)

A Laplace continuous random variable.

laplace_asymmetric(*args, **kwds)

An asymmetric Laplace continuous random variable.

levy(*args, **kwds)

A Levy continuous random variable.

levy_l(*args, **kwds)

A left-skewed Levy continuous random variable.

levy_stable(*args, **kwds)

A Levy-stable continuous random variable.

logistic(*args, **kwds)

A logistic (or Sech-squared) continuous random variable.

loggamma(*args, **kwds)

A log gamma continuous random variable.

loglaplace(*args, **kwds)

A log-Laplace continuous random variable.

lognorm(*args, **kwds)

A lognormal continuous random variable.

loguniform(*args, **kwds)

A loguniform or reciprocal continuous random variable.

lomax(*args, **kwds)

A Lomax (Pareto of the second kind) continuous random variable.

maxwell(*args, **kwds)

A Maxwell continuous random variable.

mielke(*args, **kwds)

A Mielke Beta-Kappa / Dagum continuous random variable.

moyal(*args, **kwds)

A Moyal continuous random variable.

nakagami(*args, **kwds)

A Nakagami continuous random variable.

ncx2(*args, **kwds)

A non-central chi-squared continuous random variable.

ncf(*args, **kwds)

A non-central F continuous random variable.

nct(*args, **kwds)

A non-central Student’s t continuous random variable.

norm(*args, **kwds)

A normal continuous random variable.

norminvgauss(*args, **kwds)

A Normal Inverse Gaussian continuous random variable.

pareto(*args, **kwds)

A Pareto continuous random variable.

pearson3(*args, **kwds)

A Pearson type III continuous random variable.

powerlaw(*args, **kwds)

A power-function continuous random variable.

powerlognorm(*args, **kwds)

A power log-normal continuous random variable.

powernorm(*args, **kwds)

A power normal continuous random variable.

rdist(*args, **kwds)

An R-distributed (symmetric beta) continuous random variable.

rayleigh(*args, **kwds)

A Rayleigh continuous random variable.

rice(*args, **kwds)

A Rice continuous random variable.

recipinvgauss(*args, **kwds)

A reciprocal inverse Gaussian continuous random variable.

semicircular(*args, **kwds)

A semicircular continuous random variable.

skewnorm(*args, **kwds)

A skew-normal continuous random variable.

t(*args, **kwds)

A Student’s t continuous random variable.

trapezoid(*args, **kwds)

A trapezoidal continuous random variable.

triang(*args, **kwds)

A triangular continuous random variable.

truncexpon(*args, **kwds)

A truncated exponential continuous random variable.

truncnorm(*args, **kwds)

A truncated normal continuous random variable.

tukeylambda(*args, **kwds)

A Tukey-Lambda continuous random variable.

uniform(*args, **kwds)

A uniform continuous random variable.

vonmises(*args, **kwds)

A Von Mises continuous random variable.

vonmises_line(*args, **kwds)

A Von Mises continuous random variable defined on the interval [-π, π] of the real line.

wald(*args, **kwds)

A Wald continuous random variable.

weibull_min(*args, **kwds)

Weibull minimum continuous random variable.

weibull_max(*args, **kwds)

Weibull maximum continuous random variable.

wrapcauchy(*args, **kwds)

A wrapped Cauchy continuous random variable.
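All of the continuous distributions listed above share the rv_continuous interface. The sketch below uses norm and expon as examples of the two calling styles: passing loc and scale to each method, or "freezing" the parameters once. The parameter values are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# Freeze a normal distribution with mean 10 and standard deviation 2.
rv = stats.norm(loc=10, scale=2)
print(rv.mean(), rv.std())   # mean 10, standard deviation 2
print(rv.cdf(10))            # 0.5, the median of a symmetric distribution
upper = rv.ppf(0.975)        # inverse of the cdf (percent point function)

# The same quantity without freezing: pass loc/scale to each call.
assert np.isclose(upper, stats.norm.ppf(0.975, loc=10, scale=2))

# For expon, scale is 1/rate, so scale=3 gives mean 3.
print(stats.expon(scale=3).mean())

# Draw reproducible random variates from the frozen distribution.
sample = rv.rvs(size=5, random_state=np.random.default_rng(0))
```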

Multivariate distributions

multivariate_normal([mean, cov, …])

A multivariate normal random variable.

matrix_normal([mean, rowcov, colcov, seed])

A matrix normal random variable.

dirichlet(alpha[, seed])

A Dirichlet random variable.

wishart([df, scale, seed])

A Wishart random variable.

invwishart([df, scale, seed])

An inverse Wishart random variable.

multinomial(n, p[, seed])

A multinomial random variable.

special_ortho_group([dim, seed])

A matrix-valued SO(N) random variable.

ortho_group

A matrix-valued O(N) random variable.

unitary_group

A matrix-valued U(N) random variable.

random_correlation

A random correlation matrix.

multivariate_t([loc, shape, df, …])

A multivariate t-distributed random variable.

multivariate_hypergeom(m, n[, seed])

A multivariate hypergeometric random variable.
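Here is a short sketch of three of the multivariate distributions, with arbitrarily chosen parameters. It evaluates a bivariate normal density, computes a Dirichlet mean, and draws multinomial count vectors.

```python
import numpy as np
from scipy import stats

# Standard bivariate normal: zero mean, identity covariance.
mvn = stats.multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))
print(mvn.pdf([0.0, 0.0]))  # 1 / (2*pi), about 0.159155

# The mean of a Dirichlet distribution is alpha / alpha.sum().
alpha = np.array([1.0, 1.0, 2.0])
print(stats.dirichlet.mean(alpha))  # values 0.25, 0.25, 0.5

# Each multinomial draw is a vector of counts summing to n.
draws = stats.multinomial.rvs(n=10, p=[0.2, 0.3, 0.5], size=3,
                              random_state=np.random.default_rng(1))
```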

Discrete distributions

bernoulli(*args, **kwds)

A Bernoulli discrete random variable.

betabinom(*args, **kwds)

A beta-binomial discrete random variable.

binom(*args, **kwds)

A binomial discrete random variable.

boltzmann(*args, **kwds)

A Boltzmann (Truncated Discrete Exponential) random variable.

dlaplace(*args, **kwds)

A Laplacian discrete random variable.

geom(*args, **kwds)

A geometric discrete random variable.

hypergeom(*args, **kwds)

A hypergeometric discrete random variable.

logser(*args, **kwds)

A Logarithmic (Log-Series, Series) discrete random variable.

nbinom(*args, **kwds)

A negative binomial discrete random variable.

nhypergeom(*args, **kwds)

A negative hypergeometric discrete random variable.

planck(*args, **kwds)

A Planck discrete exponential random variable.

poisson(*args, **kwds)

A Poisson discrete random variable.

randint(*args, **kwds)

A uniform discrete random variable.

skellam(*args, **kwds)

A Skellam discrete random variable.

zipf(*args, **kwds)

A Zipf discrete random variable.

yulesimon(*args, **kwds)

A Yule-Simon discrete random variable.
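Discrete distributions follow the same pattern, but they expose pmf where continuous ones expose pdf. The sketch below checks a few values against closed forms. The parameters are illustrative choices.

```python
from math import comb, exp
from scipy import stats

# Probability of exactly 3 heads in 10 fair coin flips: C(10, 3) / 2**10.
p3 = stats.binom.pmf(3, n=10, p=0.5)
assert abs(p3 - comb(10, 3) / 2**10) < 1e-12   # 120/1024 = 0.1171875

# P(X <= 0) for a Poisson variable with mean 2 is exp(-2), about 0.1353.
print(stats.poisson.cdf(0, mu=2))

# Frozen discrete distributions work like frozen continuous ones.
coin = stats.bernoulli(p=0.3)
print(coin.pmf([0, 1]))  # approximately 0.7 and 0.3
```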

An overview of statistical functions is given below. Several of these functions have a similar version in scipy.stats.mstats that works for masked arrays.
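Here is a minimal sketch of the masked-array variant. It assumes scipy.stats.mstats.describe counts and averages only the unmasked entries. The data, including which value is masked, are illustrative.

```python
import numpy as np
from scipy.stats import mstats

# The last value is masked out (e.g. a flagged outlier) and should be ignored.
data = np.ma.masked_array([1.0, 2.0, 3.0, 100.0],
                          mask=[False, False, False, True])
res = mstats.describe(data)
print(res.nobs, res.mean)  # 3 observations, mean 2.0
```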

Summary statistics

describe(a[, axis, ddof, bias, nan_policy])

Compute several descriptive statistics of the passed array.

gmean(a[, axis, dtype])

Compute the geometric mean along the specified axis.

hmean(a[, axis, dtype])

Calculate the harmonic mean along the specified axis.

kurtosis(a[, axis, fisher, bias, nan_policy])

Compute the kurtosis (Fisher or Pearson) of a dataset.

mode(a[, axis, nan_policy])

Return an array of the modal (most common) value in the passed array.

moment(a[, moment, axis, nan_policy])

Calculate the nth moment about the mean for a sample.

skew(a[, axis, bias, nan_policy])

Compute the sample skewness of a data set.

kstat(data[, n])

Return the nth k-statistic (currently 1 <= n <= 4).

kstatvar(data[, n])

Return an unbiased estimator of the variance of the k-statistic.

tmean(a[, limits, inclusive, axis])

Compute the trimmed mean.

tvar(a[, limits, inclusive, axis, ddof])

Compute the trimmed variance.

tmin(a[, lowerlimit, axis, inclusive, …])

Compute the trimmed minimum.

tmax(a[, upperlimit, axis, inclusive, …])

Compute the trimmed maximum.

tstd(a[, limits, inclusive, axis, ddof])

Compute the trimmed sample standard deviation.

tsem(a[, limits, inclusive, axis, ddof])

Compute the trimmed standard error of the mean.

variation(a[, axis, nan_policy])

Compute the coefficient of variation.

find_repeats(arr)

Find repeats and repeat counts.

trim_mean(a, proportiontocut[, axis])

Return the mean of an array after trimming a proportion of values from both tails.

gstd(a[, axis, ddof])

Calculate the geometric standard deviation of an array.

iqr(x[, axis, rng, scale, nan_policy, …])

Compute the interquartile range of the data along the specified axis.

sem(a[, axis, ddof, nan_policy])

Compute the standard error of the mean.

bayes_mvs(data[, alpha])

Bayesian confidence intervals for the mean, variance, and standard deviation.

mvsdist(data)

‘Frozen’ distributions for mean, variance, and standard deviation of data.

entropy(pk[, qk, base, axis])

Calculate the entropy of a distribution for given probability values.

median_absolute_deviation(*args, **kwds)

Deprecated; use median_abs_deviation instead.

median_abs_deviation(x[, axis, center, …])

Compute the median absolute deviation of the data along the given axis.
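Here is a sketch of several summary statistics on small hand-picked arrays, where the results can be checked by hand. The comments describe the values; they do not reproduce the exact printed format.

```python
import numpy as np
from scipy import stats

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# describe returns nobs, minmax, mean, variance (ddof=1), skewness, kurtosis.
res = stats.describe(data)
print(res.nobs, res.minmax, res.mean)  # 8 observations, range 2 to 9, mean 5.0

print(stats.gmean([1, 2, 4]))  # cube root of 1*2*4, i.e. 2
print(stats.hmean([1, 2, 4]))  # 3 / (1 + 1/2 + 1/4), about 1.714

# Cutting 20% from each tail of 5 values drops one value per side.
print(stats.trim_mean([1, 2, 3, 4, 100], 0.2))  # mean of [2, 3, 4] = 3.0

# 75th minus 25th percentile with linear interpolation: 6.25 - 2.75.
print(stats.iqr(np.arange(1, 9)))  # 3.5
```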

Frequency statistics

cumfreq(a[, numbins, defaultreallimits, weights])

Return a cumulative frequency histogram, using the histogram function.

itemfreq(*args, **kwds)

Deprecated; itemfreq will be removed in a future version.

percentileofscore(a, score[, kind])

Compute the percentile rank of a score relative to a list of scores.

scoreatpercentile(a, per[, limit, …])

Calculate the score at a given percentile of the input sequence.

relfreq(a[, numbins, defaultreallimits, weights])

Return a relative frequency histogram, using the histogram function.

binned_statistic(x, values[, statistic, …])

Compute a binned statistic for one or more sets of data.

binned_statistic_2d(x, y, values[, …])

Compute a bidimensional binned statistic for one or more sets of data.

binned_statistic_dd(sample, values[, …])

Compute a multidimensional binned statistic for a set of data.
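Here is a short sketch of three of the frequency functions on toy inputs, with results that can be checked by hand. The bin count and data are illustrative.

```python
import numpy as np
from scipy import stats

# With the default kind='rank', a score of 3 among [1, 2, 3, 4] is at the 75th percentile.
print(stats.percentileofscore([1, 2, 3, 4], 3))

# Linear interpolation between 49 and 50 gives the median of 0..99, i.e. 49.5.
print(stats.scoreatpercentile(np.arange(100), 50))

# Mean of `values` within two equal-width bins spanning the range of `x`.
x = [0.5, 1.5, 2.5, 3.5]
values = [1.0, 2.0, 3.0, 4.0]
means, edges, binnumber = stats.binned_statistic(x, values, statistic="mean", bins=2)
# means: 1.5 and 3.5; edges: 0.5, 2.0, 3.5; binnumber is 1-based per point.
```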

Correlation functions

f_oneway(*args[, axis])

Perform one-way ANOVA.

pearsonr(x, y)

Pearson correlation coefficient and p-value for testing non-correlation.

spearmanr(a[, b, axis, nan_policy])

Calculate a Spearman correlation coefficient with associated p-value.

pointbiserialr(x, y)

Calculate a point biserial correlation coefficient and its p-value.

kendalltau(x, y[, initial_lexsort, …])

Calculate Kendall’s tau, a correlation measure for ordinal data.

weightedtau(x, y[, rank, weigher, additive])

Compute a weighted version of Kendall’s tau.

linregress(x[, y])

Calculate a linear least-squares regression for two sets of measurements.

siegelslopes(y[, x, method])

Compute the Siegel estimator for a set of points (x, y).

theilslopes(y[, x, alpha])

Compute the Theil-Sen estimator for a set of points (x, y).

multiscale_graphcorr(x, y[, …])

Compute the Multiscale Graph Correlation (MGC) test statistic.
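Here is a sketch of pearsonr, spearmanr and linregress on data with a known answer. It uses an exact line y = 2x + 1, plus a small tied example for Spearman's rho. The data are illustrative.

```python
import numpy as np
from scipy import stats

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0          # exact linear relation

r, p = stats.pearsonr(x, y)
print(r)                    # 1 up to floating-point rounding

# Spearman's rho is Pearson's r computed on ranks; ties get average ranks.
rho, p_rho = stats.spearmanr([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
print(rho)                  # 8 / sqrt(95), about 0.8208

fit = stats.linregress(x, y)
print(fit.slope, fit.intercept)  # slope 2, intercept 1
```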

Statistical tests

ttest_1samp(a, popmean[, axis, nan_policy, …])

Calculate the t-test for the mean of one group of scores.

ttest_ind(a, b[, axis, equal_var, …])

Calculate the t-test for the means of two independent samples of scores.

ttest_ind_from_stats(mean1, std1, nobs1, …)

Calculate the t-test for the means of two independent samples from descriptive statistics.

ttest_rel(a, b[, axis, nan_policy, alternative])

Calculate the t-test on two related samples of scores, a and b.

chisquare(f_obs[, f_exp, ddof, axis])

Calculate a one-way chi-square test.

cramervonmises(rvs, cdf[, args])

Perform the Cramér-von Mises test for goodness of fit.

power_divergence(f_obs[, f_exp, ddof, axis, …])

Cressie-Read power divergence statistic and goodness of fit test.

kstest(rvs, cdf[, args, N, alternative, mode])

Perform the one-sample or two-sample Kolmogorov-Smirnov test for goodness of fit.

ks_1samp(x, cdf[, args, alternative, mode])

Perform the one-sample Kolmogorov-Smirnov test for goodness of fit.

ks_2samp(data1, data2[, alternative, mode])

Compute the Kolmogorov-Smirnov statistic on two samples.

epps_singleton_2samp(x, y[, t])

Compute the Epps-Singleton (ES) test statistic.

mannwhitneyu(x, y[, use_continuity, alternative])

Compute the Mann-Whitney rank test on samples x and y.

tiecorrect(rankvals)

Tie correction factor for Mann-Whitney U and Kruskal-Wallis H tests.

rankdata(a[, method, axis])

Assign ranks to data, dealing with ties appropriately.

ranksums(x, y)

Compute the Wilcoxon rank-sum statistic for two samples.

wilcoxon(x[, y, zero_method, correction, …])

Calculate the Wilcoxon signed-rank test.

kruskal(*args[, nan_policy])

Compute the Kruskal-Wallis H-test for independent samples.

friedmanchisquare(*args)

Compute the Friedman test for repeated measurements.

brunnermunzel(x, y[, alternative, …])

Compute the Brunner-Munzel test on samples x and y.

combine_pvalues(pvalues[, method, weights])

Combine p-values from independent tests bearing upon the same hypothesis.

jarque_bera(x)

Perform the Jarque-Bera goodness of fit test on sample data.

ansari(x, y)

Perform the Ansari-Bradley test for equal scale parameters.

bartlett(*args)

Perform Bartlett’s test for equal variances.

levene(*args[, center, proportiontocut])

Perform Levene’s test for equal variances.

shapiro(x)

Perform the Shapiro-Wilk test for normality.

anderson(x[, dist])

Anderson-Darling test for data coming from a particular distribution.

anderson_ksamp(samples[, midrank])

The Anderson-Darling test for k-samples.

binom_test(x[, n, p, alternative])

Perform a test that the probability of success is p.

fligner(*args[, center, proportiontocut])

Perform the Fligner-Killeen test for equality of variance.

median_test(*args[, ties, correction, …])

Perform Mood’s median test.

mood(x, y[, axis])

Perform Mood’s test for equal scale parameters.

skewtest(a[, axis, nan_policy])

Test whether the skewness differs from that of a normal distribution.

kurtosistest(a[, axis, nan_policy])

Test whether a dataset has normal kurtosis.

normaltest(a[, axis, nan_policy])

Test whether a sample differs from a normal distribution.
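Here is a sketch of three of the tests on inputs chosen so that the statistic can be checked by hand. The p-values are left to the library. The data are illustrative.

```python
import numpy as np
from scipy import stats

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = a + 1.0

# Pooled-variance t-test: mean difference -1, pooled variance 2.5,
# standard error sqrt(2.5 * (1/5 + 1/5)) = 1, so t = -1 with 8 degrees of freedom.
res = stats.ttest_ind(a, b)
print(res.statistic, res.pvalue)

# Chi-square goodness of fit against equal expected counts (88/6 each).
chi = stats.chisquare([16, 18, 16, 14, 12, 12])
print(chi.statistic)  # 2.0 up to rounding

# Two-sample KS test: identical samples give a statistic of 0.
print(stats.ks_2samp(a, a).statistic)
```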

Transformations

boxcox(x[, lmbda, alpha])

Return a dataset transformed by a Box-Cox power transformation.

boxcox_normmax(x[, brack, method])

Compute optimal Box-Cox transform parameter for input data.

boxcox_llf(lmb, data)

The Box-Cox log-likelihood function.

yeojohnson(x[, lmbda])

Return a dataset transformed by a Yeo-Johnson power transformation.

yeojohnson_normmax(x[, brack])

Compute optimal Yeo-Johnson transform parameter.

yeojohnson_llf(lmb, data)

The Yeo-Johnson log-likelihood function.

obrientransform(*args)

Compute the O’Brien transform on input data (any number of arrays).

sigmaclip(a[, low, high])

Perform iterative sigma-clipping of array elements.

trimboth(a, proportiontocut[, axis])

Slice off a proportion of items from both ends of an array.

trim1(a, proportiontocut[, tail, axis])

Slice off a proportion of items from one end of the passed array.

zmap(scores, compare[, axis, ddof])

Calculate the relative z-scores.

zscore(a[, axis, ddof, nan_policy])

Compute the z score.
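Here is a sketch of zscore and boxcox on small positive arrays. It shows that a fixed lmbda=0 reduces Box-Cox to the natural logarithm. The data are illustrative.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0])

z = stats.zscore(x)        # (x - mean) / std, with ddof=0 by default
print(z)                   # approximately -1.2247, 0, 1.2247

# With a fixed lmbda, boxcox returns only the transformed data;
# lmbda=0 reduces to the natural logarithm.
y = stats.boxcox(x, lmbda=0)
assert np.allclose(y, np.log(x))

# Without lmbda, boxcox also returns the maximum-likelihood estimate of lambda.
y_fit, lmbda_hat = stats.boxcox(np.array([1.0, 2.0, 4.0, 8.0, 16.0]))
```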

Statistical distances

wasserstein_distance(u_values, v_values[, …])

Compute the first Wasserstein distance between two 1D distributions.

energy_distance(u_values, v_values[, …])

Compute the energy distance between two 1D distributions.
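Both distances can be checked by hand on tiny samples. Shifting every point of a sample moves all of its mass by the same amount. The energy distance between two point masses a and b is sqrt(2 * |a - b|).

```python
from scipy import stats

# Every point shifted by 5: the first Wasserstein distance is 5.
print(stats.wasserstein_distance([0, 1, 3], [5, 6, 8]))

# Point masses at 0 and 2: sqrt(2 * |0 - 2|) = 2.
print(stats.energy_distance([0], [2]))
```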

Random variate generation

rvs_ratio_uniforms(pdf, umax, vmin, vmax[, …])

Generate random samples from a probability density function using the ratio-of-uniforms method.
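To illustrate the method itself, the sketch below reimplements ratio-of-uniforms by hand in NumPy rather than calling the SciPy function. The helper `ratio_of_uniforms` is hypothetical and is not SciPy's implementation. The bounds follow the usual construction: umax = sup sqrt(f(x)), vmin = inf x*sqrt(f(x)), and vmax = sup x*sqrt(f(x)). The target is an unnormalized standard normal density, on the assumption that f need only be known up to a constant.

```python
import numpy as np


def ratio_of_uniforms(f, umax, vmin, vmax, size, rng):
    """Hypothetical helper: ratio-of-uniforms sampling, not scipy's code.

    Draw (u, v) uniformly from (0, umax] x [vmin, vmax) and accept
    x = v / u when u**2 <= f(x); accepted x have density proportional to f.
    """
    accepted = []
    n_accepted = 0
    while n_accepted < size:
        n = size - n_accepted
        u = umax * (1.0 - rng.random(n))   # in (0, umax], avoids division by zero
        v = rng.uniform(vmin, vmax, n)
        x = v / u
        keep = x[u**2 <= f(x)]
        accepted.append(keep)
        n_accepted += keep.size
    return np.concatenate(accepted)[:size]


# Unnormalized standard normal density.
f = lambda x: np.exp(-0.5 * x**2)
umax = 1.0                       # sqrt(f(0))
vmax = np.sqrt(2.0 / np.e)       # max of |x| * sqrt(f(x)), attained at |x| = sqrt(2)
samples = ratio_of_uniforms(f, umax, -vmax, vmax, size=20_000,
                            rng=np.random.default_rng(0))
```

For this target, roughly three in four candidate pairs are accepted. The loop therefore needs only a few batches.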

Circular statistical functions

circmean(samples[, high, low, axis, nan_policy])

Compute the circular mean for samples in a range.

circvar(samples[, high, low, axis, nan_policy])

Compute the circular variance for samples assumed to be in a range.

circstd(samples[, high, low, axis, nan_policy])

Compute the circular standard deviation for samples assumed to be in the range [low, high].
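Here is a sketch of why the circular functions matter. Two angles on either side of 0 degrees have an arithmetic mean of 180, while their circular mean is near 0. The manual cross-check averages unit vectors, which is the standard definition of the circular mean. The angle values are illustrative.

```python
import numpy as np
from scipy import stats

angles = np.array([355.0, 5.0])   # degrees, straddling 0/360

print(np.mean(angles))                        # 180.0, which is misleading here
m = stats.circmean(angles, high=360, low=0)   # close to 0 (equivalently 360)
print(m)

# Cross-check: average the unit vectors, then take the angle.
rad = np.deg2rad(angles)
manual = np.rad2deg(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())) % 360
```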

Contingency table functions

chi2_contingency(observed[, correction, lambda_])

Chi-square test of independence of variables in a contingency table.

contingency.expected_freq(observed)

Compute the expected frequencies from a contingency table.

contingency.margins(a)

Return a list of the marginal sums of the array a.

fisher_exact(table[, alternative])

Perform a Fisher exact test on a 2x2 contingency table.
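Here is a sketch of chi2_contingency and fisher_exact on small 2x2 tables. In the first table the rows are exactly proportional, so the observed counts equal the expected counts. In the second, the sample odds ratio is (8*5)/(2*1) = 20. The tables are illustrative.

```python
import numpy as np
from scipy import stats

# Rows are proportional, so there is no evidence of dependence.
observed = np.array([[10, 10], [20, 20]])
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(chi2, dof)   # statistic 0, 1 degree of freedom

# Fisher's exact test on a 2x2 table.
oddsratio, p_exact = stats.fisher_exact([[8, 2], [1, 5]])
print(oddsratio, p_exact)   # odds ratio 20, two-sided p about 0.035
```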

Plot-tests

ppcc_max(x[, brack, dist])

Calculate the shape parameter that maximizes the PPCC.

ppcc_plot(x, a, b[, dist, plot, N])

Calculate and optionally plot probability plot correlation coefficient.

probplot(x[, sparams, dist, fit, plot, rvalue])

Calculate quantiles for a probability plot, and optionally show the plot.

boxcox_normplot(x, la, lb[, plot, N])

Compute parameters for a Box-Cox normality plot, optionally show it.

yeojohnson_normplot(x, la, lb[, plot, N])

Compute parameters for a Yeo-Johnson normality plot, optionally show it.
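Here is a sketch of probplot used without plotting. It assumes that leaving `plot` unset only computes the normal Q-Q coordinates and the least-squares fit. The data values are arbitrary.

```python
import numpy as np
from scipy import stats

x = np.array([2.1, -0.4, 0.3, 1.7, -1.2, 0.8, 0.0, -0.6])

(osm, osr), (slope, intercept, r) = stats.probplot(x, dist="norm")
# osm: theoretical normal quantiles (increasing); osr: the sorted data.
# r close to 1 means the points lie close to a straight line on the Q-Q plot.
print(slope, intercept, r)
```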

Masked statistics functions

Statistical functions for masked arrays are documented separately in scipy.stats.mstats.

Univariate and multivariate kernel density estimation

gaussian_kde(dataset[, bw_method, weights])

Representation of a kernel-density estimate using Gaussian kernels.
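Here is a sketch of gaussian_kde fitted to a simulated normal sample with the default bandwidth rule. The sample size and seed are arbitrary. The estimate is evaluated at a point and integrated over the real line.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=0.0, scale=1.0, size=500)

kde = stats.gaussian_kde(data)                  # default bandwidth rule
density_at_zero = kde([0.0])[0]                 # evaluate the estimate at 0
total = kde.integrate_box_1d(-np.inf, np.inf)   # integrates to 1
print(density_at_zero, total)
```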

Warnings used in scipy.stats

F_onewayConstantInputWarning([msg])

Warning generated by f_oneway when an input is constant.

F_onewayBadInputSizesWarning

Warning generated by f_oneway when an input has length 0, or if all the inputs have length 1.

PearsonRConstantInputWarning([msg])

Warning generated by pearsonr when an input is constant.

PearsonRNearConstantInputWarning([msg])

Warning generated by pearsonr when an input is nearly constant.

SpearmanRConstantInputWarning([msg])

Warning generated by spearmanr when an input is constant.
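Here is a sketch of how to observe these warnings programmatically. It records the warnings raised when pearsonr receives a constant input. The exact warning class may differ between SciPy versions, so the sketch does not filter by class. It also assumes the coefficient is returned as nan in that case.

```python
import math
import warnings

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
constant = np.full(5, 3.0)

# Record warnings instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    r, p = stats.pearsonr(x, constant)

print(math.isnan(r), [w.category.__name__ for w in caught])
```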

For many more statistics-related functions, install the R software environment and the rpy interface package.