SciPy

Statistical functions (scipy.stats)

This module contains a large number of probability distributions as well as a growing library of statistical functions.

Each univariate distribution is an instance of a subclass of rv_continuous (rv_discrete for discrete distributions):

rv_continuous([momtype, a, b, xtol, …])

A generic continuous random variable class meant for subclassing.

rv_discrete([a, b, name, badvalue, …])

A generic discrete random variable class meant for subclassing.

rv_histogram(histogram, *args, **kwargs)

Generates a distribution given by a histogram.
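
As a rough illustration of how these distribution objects are typically used, the sketch below evaluates the standard normal, "freezes" a normal distribution with fixed parameters, and builds an rv_histogram from sampled data. The variable names, the seed, and the parameter values are arbitrary choices for the example, not part of the reference above.

```python
import numpy as np
from scipy import stats

# Standard normal: density and cumulative probability at 0.
pdf_at_zero = stats.norm.pdf(0.0)   # equals 1 / sqrt(2 * pi)
cdf_at_zero = stats.norm.cdf(0.0)   # 0.5 by symmetry

# "Freeze" a distribution by fixing its location and scale.
frozen = stats.norm(loc=10.0, scale=2.0)
mean, var = frozen.stats(moments="mv")

# Build a distribution object from a histogram of (illustrative) samples.
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)
hist_dist = stats.rv_histogram(np.histogram(samples, bins=20))
cdf_at_max = hist_dist.cdf(samples.max())
```

The frozen object and the histogram-based object expose the same methods (pdf, cdf, rvs, and so on) as the predefined distributions.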

Continuous distributions

alpha(\*args, \*\*kwds)

An alpha continuous random variable.

anglit(\*args, \*\*kwds)

An anglit continuous random variable.

arcsine(\*args, \*\*kwds)

An arcsine continuous random variable.

argus(\*args, \*\*kwds)

Argus distribution.

beta(\*args, \*\*kwds)

A beta continuous random variable.

betaprime(\*args, \*\*kwds)

A beta prime continuous random variable.

bradford(\*args, \*\*kwds)

A Bradford continuous random variable.

burr(\*args, \*\*kwds)

A Burr (Type III) continuous random variable.

burr12(\*args, \*\*kwds)

A Burr (Type XII) continuous random variable.

cauchy(\*args, \*\*kwds)

A Cauchy continuous random variable.

chi(\*args, \*\*kwds)

A chi continuous random variable.

chi2(\*args, \*\*kwds)

A chi-squared continuous random variable.

cosine(\*args, \*\*kwds)

A cosine continuous random variable.

crystalball(\*args, \*\*kwds)

Crystalball distribution.

dgamma(\*args, \*\*kwds)

A double gamma continuous random variable.

dweibull(\*args, \*\*kwds)

A double Weibull continuous random variable.

erlang(\*args, \*\*kwds)

An Erlang continuous random variable.

expon(\*args, \*\*kwds)

An exponential continuous random variable.

exponnorm(\*args, \*\*kwds)

An exponentially modified Normal continuous random variable.

exponweib(\*args, \*\*kwds)

An exponentiated Weibull continuous random variable.

exponpow(\*args, \*\*kwds)

An exponential power continuous random variable.

f(\*args, \*\*kwds)

An F continuous random variable.

fatiguelife(\*args, \*\*kwds)

A fatigue-life (Birnbaum-Saunders) continuous random variable.

fisk(\*args, \*\*kwds)

A Fisk continuous random variable.

foldcauchy(\*args, \*\*kwds)

A folded Cauchy continuous random variable.

foldnorm(\*args, \*\*kwds)

A folded normal continuous random variable.

frechet_r(\*args, \*\*kwds)

A frechet_r continuous random variable.

frechet_l(\*args, \*\*kwds)

A frechet_l continuous random variable.

genlogistic(\*args, \*\*kwds)

A generalized logistic continuous random variable.

gennorm(\*args, \*\*kwds)

A generalized normal continuous random variable.

genpareto(\*args, \*\*kwds)

A generalized Pareto continuous random variable.

genexpon(\*args, \*\*kwds)

A generalized exponential continuous random variable.

genextreme(\*args, \*\*kwds)

A generalized extreme value continuous random variable.

gausshyper(\*args, \*\*kwds)

A Gauss hypergeometric continuous random variable.

gamma(\*args, \*\*kwds)

A gamma continuous random variable.

gengamma(\*args, \*\*kwds)

A generalized gamma continuous random variable.

genhalflogistic(\*args, \*\*kwds)

A generalized half-logistic continuous random variable.

gilbrat(\*args, \*\*kwds)

A Gilbrat (Gibrat) continuous random variable.

gompertz(\*args, \*\*kwds)

A Gompertz (or truncated Gumbel) continuous random variable.

gumbel_r(\*args, \*\*kwds)

A right-skewed Gumbel continuous random variable.

gumbel_l(\*args, \*\*kwds)

A left-skewed Gumbel continuous random variable.

halfcauchy(\*args, \*\*kwds)

A Half-Cauchy continuous random variable.

halflogistic(\*args, \*\*kwds)

A half-logistic continuous random variable.

halfnorm(\*args, \*\*kwds)

A half-normal continuous random variable.

halfgennorm(\*args, \*\*kwds)

The upper half of a generalized normal continuous random variable.

hypsecant(\*args, \*\*kwds)

A hyperbolic secant continuous random variable.

invgamma(\*args, \*\*kwds)

An inverted gamma continuous random variable.

invgauss(\*args, \*\*kwds)

An inverse Gaussian continuous random variable.

invweibull(\*args, \*\*kwds)

An inverted Weibull continuous random variable.

johnsonsb(\*args, \*\*kwds)

A Johnson SB continuous random variable.

johnsonsu(\*args, \*\*kwds)

A Johnson SU continuous random variable.

kappa4(\*args, \*\*kwds)

Kappa 4 parameter distribution.

kappa3(\*args, \*\*kwds)

Kappa 3 parameter distribution.

ksone(\*args, \*\*kwds)

General Kolmogorov-Smirnov one-sided test.

kstwobign(\*args, \*\*kwds)

Kolmogorov-Smirnov two-sided test for large N.

laplace(\*args, \*\*kwds)

A Laplace continuous random variable.

levy(\*args, \*\*kwds)

A Levy continuous random variable.

levy_l(\*args, \*\*kwds)

A left-skewed Levy continuous random variable.

levy_stable(\*args, \*\*kwds)

A Levy-stable continuous random variable.

logistic(\*args, \*\*kwds)

A logistic (or Sech-squared) continuous random variable.

loggamma(\*args, \*\*kwds)

A log gamma continuous random variable.

loglaplace(\*args, \*\*kwds)

A log-Laplace continuous random variable.

lognorm(\*args, \*\*kwds)

A lognormal continuous random variable.

lomax(\*args, \*\*kwds)

A Lomax (Pareto of the second kind) continuous random variable.

maxwell(\*args, \*\*kwds)

A Maxwell continuous random variable.

mielke(\*args, \*\*kwds)

A Mielke’s Beta-Kappa continuous random variable.

moyal(\*args, \*\*kwds)

A Moyal continuous random variable.

nakagami(\*args, \*\*kwds)

A Nakagami continuous random variable.

ncx2(\*args, \*\*kwds)

A non-central chi-squared continuous random variable.

ncf(\*args, \*\*kwds)

A non-central F distribution continuous random variable.

nct(\*args, \*\*kwds)

A non-central Student’s t continuous random variable.

norm(\*args, \*\*kwds)

A normal continuous random variable.

norminvgauss(\*args, \*\*kwds)

A Normal Inverse Gaussian continuous random variable.

pareto(\*args, \*\*kwds)

A Pareto continuous random variable.

pearson3(\*args, \*\*kwds)

A Pearson type III continuous random variable.

powerlaw(\*args, \*\*kwds)

A power-function continuous random variable.

powerlognorm(\*args, \*\*kwds)

A power log-normal continuous random variable.

powernorm(\*args, \*\*kwds)

A power normal continuous random variable.

rdist(\*args, \*\*kwds)

An R-distributed continuous random variable.

reciprocal(\*args, \*\*kwds)

A reciprocal continuous random variable.

rayleigh(\*args, \*\*kwds)

A Rayleigh continuous random variable.

rice(\*args, \*\*kwds)

A Rice continuous random variable.

recipinvgauss(\*args, \*\*kwds)

A reciprocal inverse Gaussian continuous random variable.

semicircular(\*args, \*\*kwds)

A semicircular continuous random variable.

skewnorm(\*args, \*\*kwds)

A skew-normal random variable.

t(\*args, \*\*kwds)

A Student’s t continuous random variable.

trapz(\*args, \*\*kwds)

A trapezoidal continuous random variable.

triang(\*args, \*\*kwds)

A triangular continuous random variable.

truncexpon(\*args, \*\*kwds)

A truncated exponential continuous random variable.

truncnorm(\*args, \*\*kwds)

A truncated normal continuous random variable.

tukeylambda(\*args, \*\*kwds)

A Tukey-Lambda continuous random variable.

uniform(\*args, \*\*kwds)

A uniform continuous random variable.

vonmises(\*args, \*\*kwds)

A Von Mises continuous random variable.

vonmises_line(\*args, \*\*kwds)

A Von Mises continuous random variable.

wald(\*args, \*\*kwds)

A Wald continuous random variable.

weibull_min(\*args, \*\*kwds)

Weibull minimum continuous random variable.

weibull_max(\*args, \*\*kwds)

Weibull maximum continuous random variable.

wrapcauchy(\*args, \*\*kwds)

A wrapped Cauchy continuous random variable.

Multivariate distributions

multivariate_normal([mean, cov, …])

A multivariate normal random variable.

matrix_normal([mean, rowcov, colcov, seed])

A matrix normal random variable.

dirichlet(alpha[, seed])

A Dirichlet random variable.

wishart([df, scale, seed])

A Wishart random variable.

invwishart([df, scale, seed])

An inverse Wishart random variable.

multinomial(n, p[, seed])

A multinomial random variable.

special_ortho_group([dim, seed])

A matrix-valued SO(N) random variable.

ortho_group

A matrix-valued O(N) random variable.

unitary_group

A matrix-valued U(N) random variable.

random_correlation

A random correlation matrix.
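
The following sketch shows one way the multivariate objects might be used: a frozen multivariate_normal is evaluated and sampled, and a random rotation is drawn from special_ortho_group. The mean, covariance, dimension, and seed are made-up values for illustration only.

```python
import numpy as np
from scipy import stats

# Illustrative 2-D mean vector and covariance matrix.
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5], [0.5, 2.0]])

mvn = stats.multivariate_normal(mean=mean, cov=cov)
density_at_mean = mvn.pdf(mean)           # 1 / (2 * pi * sqrt(det(cov)))
draws = mvn.rvs(size=5, random_state=0)   # five 2-D samples

# A random 3x3 rotation matrix (orthogonal, determinant +1).
rotation = stats.special_ortho_group.rvs(3, random_state=0)
```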

Discrete distributions

bernoulli(\*args, \*\*kwds)

A Bernoulli discrete random variable.

binom(\*args, \*\*kwds)

A binomial discrete random variable.

boltzmann(\*args, \*\*kwds)

A Boltzmann (Truncated Discrete Exponential) random variable.

dlaplace(\*args, \*\*kwds)

A Laplacian discrete random variable.

geom(\*args, \*\*kwds)

A geometric discrete random variable.

hypergeom(\*args, \*\*kwds)

A hypergeometric discrete random variable.

logser(\*args, \*\*kwds)

A Logarithmic (Log-Series, Series) discrete random variable.

nbinom(\*args, \*\*kwds)

A negative binomial discrete random variable.

planck(\*args, \*\*kwds)

A Planck discrete exponential random variable.

poisson(\*args, \*\*kwds)

A Poisson discrete random variable.

randint(\*args, \*\*kwds)

A uniform discrete random variable.

skellam(\*args, \*\*kwds)

A Skellam discrete random variable.

zipf(\*args, \*\*kwds)

A Zipf discrete random variable.

yulesimon(\*args, \*\*kwds)

A Yule-Simon discrete random variable.
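
Discrete distributions expose a probability mass function (pmf) rather than a density. A minimal sketch, with parameter values chosen only for illustration:

```python
from scipy import stats

n, p = 10, 0.5

# Probability of exactly 5 successes in 10 fair trials: C(10, 5) / 2**10.
prob_five = stats.binom.pmf(5, n, p)

# Probability of at most 5 successes.
prob_at_most_five = stats.binom.cdf(5, n, p)

# Mean of a Poisson distribution with rate 3.
poisson_mean = stats.poisson.mean(3.0)
```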

An overview of statistical functions is given below. Several of these functions have a similar version in scipy.stats.mstats that works for masked arrays.
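
To illustrate the masked-array variants, the sketch below runs scipy.stats.mstats.describe on an array in which one entry is masked. The data and the choice of which value to mask are invented for the example.

```python
import numpy as np
from scipy.stats import mstats

# Illustrative data: the last reading is treated as invalid and masked out.
readings = np.ma.masked_array(
    [1.0, 2.0, 3.0, 1000.0],
    mask=[False, False, False, True],
)

result = mstats.describe(readings)
valid_count = int(result.nobs)    # masked entries are not counted
masked_mean = float(result.mean)  # mean over the unmasked entries only
```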

Summary statistics

describe(a[, axis, ddof, bias, nan_policy])

Compute several descriptive statistics of the passed array.

gmean(a[, axis, dtype])

Compute the geometric mean along the specified axis.

hmean(a[, axis, dtype])

Calculate the harmonic mean along the specified axis.

kurtosis(a[, axis, fisher, bias, nan_policy])

Compute the kurtosis (Fisher or Pearson) of a dataset.

mode(a[, axis, nan_policy])

Return an array of the modal (most common) value in the passed array.

moment(a[, moment, axis, nan_policy])

Calculate the nth moment about the mean for a sample.

skew(a[, axis, bias, nan_policy])

Compute the skewness of a data set.

kstat(data[, n])

Return the nth k-statistic (1<=n<=4 so far).

kstatvar(data[, n])

Returns an unbiased estimator of the variance of the k-statistic.

tmean(a[, limits, inclusive, axis])

Compute the trimmed mean.

tvar(a[, limits, inclusive, axis, ddof])

Compute the trimmed variance.

tmin(a[, lowerlimit, axis, inclusive, …])

Compute the trimmed minimum.

tmax(a[, upperlimit, axis, inclusive, …])

Compute the trimmed maximum.

tstd(a[, limits, inclusive, axis, ddof])

Compute the trimmed sample standard deviation.

tsem(a[, limits, inclusive, axis, ddof])

Compute the trimmed standard error of the mean.

variation(a[, axis, nan_policy])

Compute the coefficient of variation, the ratio of the biased standard deviation to the mean.

find_repeats(arr)

Find repeats and repeat counts.

trim_mean(a, proportiontocut[, axis])

Return mean of array after trimming distribution from both tails.

iqr(x[, axis, rng, scale, nan_policy, …])

Compute the interquartile range of the data along the specified axis.

sem(a[, axis, ddof, nan_policy])

Calculate the standard error of the mean (or standard error of measurement) of the values in the input array.

bayes_mvs(data[, alpha])

Bayesian confidence intervals for the mean, var, and std.

mvsdist(data)

‘Frozen’ distributions for mean, variance, and standard deviation of data.

entropy(pk[, qk, base])

Calculate the entropy of a distribution for given probability values.
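
A short sketch of a few of the summary functions listed above, applied to a small made-up data set:

```python
import numpy as np
from scipy import stats

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Bundle of descriptive statistics; the variance uses ddof=1 by default.
summary = stats.describe(data)

# Cut 12.5% (one of eight values) from each tail before averaging,
# leaving the mean of [4, 4, 4, 5, 5, 7].
trimmed = stats.trim_mean(data, 0.125)

# Geometric mean of 1, 4, 16 is the cube root of 64.
geometric = stats.gmean([1.0, 4.0, 16.0])
```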

Frequency statistics

cumfreq(a[, numbins, defaultreallimits, weights])

Return a cumulative frequency histogram, using the histogram function.

itemfreq(\*args, \*\*kwds)

itemfreq is deprecated and will be removed in a future version.

percentileofscore(a, score[, kind])

The percentile rank of a score relative to a list of scores.

scoreatpercentile(a, per[, limit, …])

Calculate the score at a given percentile of the input sequence.

relfreq(a[, numbins, defaultreallimits, weights])

Return a relative frequency histogram, using the histogram function.

binned_statistic(x, values[, statistic, …])

Compute a binned statistic for one or more sets of data.

binned_statistic_2d(x, y, values[, …])

Compute a bidimensional binned statistic for one or more sets of data.

binned_statistic_dd(sample, values[, …])

Compute a multidimensional binned statistic for a set of data.
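
The sketch below exercises three of the frequency functions on small illustrative inputs; the numbers are arbitrary and only meant to show the call patterns.

```python
import numpy as np
from scipy import stats

# Percentile rank of the score 3 within a list of scores.
rank = stats.percentileofscore([1, 2, 3, 4], 3)

# Score at the 50th percentile of 0..99.
median_score = stats.scoreatpercentile(np.arange(100), 50)

# Sum the values falling into each of two equal-width bins over x.
x = [1, 1, 2, 5, 7]
values = [1.0, 1.0, 2.0, 1.5, 3.0]
binned = stats.binned_statistic(x, values, statistic="sum", bins=2)
bin_sums = binned.statistic
bin_edges = binned.bin_edges
```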

Correlation functions

f_oneway(\*args)

Performs a 1-way ANOVA.

pearsonr(x, y)

Calculate a Pearson correlation coefficient and the p-value for testing non-correlation.

spearmanr(a[, b, axis, nan_policy])

Calculate a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.

pointbiserialr(x, y)

Calculate a point biserial correlation coefficient and its p-value.

kendalltau(x, y[, initial_lexsort, …])

Calculate Kendall’s tau, a correlation measure for ordinal data.

weightedtau(x, y[, rank, weigher, additive])

Compute a weighted version of Kendall’s \(\tau\).

linregress(x[, y])

Calculate a linear least-squares regression for two sets of measurements.

siegelslopes(y[, x, method])

Computes the Siegel estimator for a set of points (x, y).

theilslopes(y[, x, alpha])

Computes the Theil-Sen estimator for a set of points (x, y).
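
As a hedged illustration of the correlation and regression functions, the sketch below uses exactly linear synthetic data, so the expected slope, intercept, and correlation are known in advance.

```python
import numpy as np
from scipy import stats

# Synthetic, perfectly linear data: y = 2x + 1.
x = np.arange(10.0)
y = 2.0 * x + 1.0

# Least-squares line through the points.
fit = stats.linregress(x, y)

# Pearson correlation coefficient and its p-value.
r, p_value = stats.pearsonr(x, y)

# Kendall's tau for the same data (all pairs are concordant).
tau, tau_p = stats.kendalltau(x, y)
```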

Statistical tests

ttest_1samp(a, popmean[, axis, nan_policy])

Calculate the T-test for the mean of ONE group of scores.

ttest_ind(a, b[, axis, equal_var, nan_policy])

Calculate the T-test for the means of two independent samples of scores.

ttest_ind_from_stats(mean1, std1, nobs1, …)

T-test for means of two independent samples from descriptive statistics.

ttest_rel(a, b[, axis, nan_policy])

Calculate the T-test on TWO RELATED samples of scores, a and b.

kstest(rvs, cdf[, args, N, alternative, mode])

Perform the Kolmogorov-Smirnov test for goodness of fit.

chisquare(f_obs[, f_exp, ddof, axis])

Calculate a one-way chi-square test.

power_divergence(f_obs[, f_exp, ddof, axis, …])

Cressie-Read power divergence statistic and goodness of fit test.

ks_2samp(data1, data2)

Compute the Kolmogorov-Smirnov statistic on 2 samples.

mannwhitneyu(x, y[, use_continuity, alternative])

Compute the Mann-Whitney rank test on samples x and y.

tiecorrect(rankvals)

Tie correction factor for ties in the Mann-Whitney U and Kruskal-Wallis H tests.

rankdata(a[, method])

Assign ranks to data, dealing with ties appropriately.

ranksums(x, y)

Compute the Wilcoxon rank-sum statistic for two samples.

wilcoxon(x[, y, zero_method, correction])

Calculate the Wilcoxon signed-rank test.

kruskal(\*args, \*\*kwargs)

Compute the Kruskal-Wallis H-test for independent samples.

friedmanchisquare(\*args)

Compute the Friedman test for repeated measurements.

brunnermunzel(x, y[, alternative, …])

Computes the Brunner-Munzel test on samples x and y.

combine_pvalues(pvalues[, method, weights])

Methods for combining the p-values of independent tests bearing upon the same hypothesis.

jarque_bera(x)

Perform the Jarque-Bera goodness of fit test on sample data.

ansari(x, y)

Perform the Ansari-Bradley test for equal scale parameters.

bartlett(\*args)

Perform Bartlett’s test for equal variances.

levene(\*args, \*\*kwds)

Perform Levene test for equal variances.

shapiro(x)

Perform the Shapiro-Wilk test for normality.

anderson(x[, dist])

Anderson-Darling test for data coming from a particular distribution.

anderson_ksamp(samples[, midrank])

The Anderson-Darling test for k-samples.

binom_test(x[, n, p, alternative])

Perform a test that the probability of success is p.

fligner(\*args, \*\*kwds)

Perform Fligner-Killeen test for equality of variance.

median_test(\*args, \*\*kwds)

Mood’s median test.

mood(x, y[, axis])

Perform Mood’s test for equal scale parameters.

skewtest(a[, axis, nan_policy])

Test whether the skewness of a sample differs from that of a normal distribution.

kurtosistest(a[, axis, nan_policy])

Test whether a dataset has normal kurtosis.

normaltest(a[, axis, nan_policy])

Test whether a sample differs from a normal distribution.
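
Most of the tests above return a test statistic together with a p-value. A minimal sketch on synthetic data follows; the seed, sample sizes, and the one-unit shift between groups are assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=1.0, size=200)
group_b = rng.normal(loc=1.0, scale=1.0, size=200)

# Two-sample t-test: do the two groups share a mean?
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# One-sample t-test against the sample's own mean gives a statistic of 0.
t_self, p_self = stats.ttest_1samp(group_a, popmean=group_a.mean())

# Kolmogorov-Smirnov goodness-of-fit test against the standard normal.
ks_stat, ks_p = stats.kstest(group_a, "norm")
```

A small p-value from ttest_ind indicates that the difference in sample means is unlikely under the hypothesis of equal population means.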

Transformations

boxcox(x[, lmbda, alpha])

Return a positive dataset transformed by a Box-Cox power transformation.

boxcox_normmax(x[, brack, method])

Compute optimal Box-Cox transform parameter for input data.

boxcox_llf(lmb, data)

The boxcox log-likelihood function.

yeojohnson(x[, lmbda])

Return a dataset transformed by a Yeo-Johnson power transformation.

yeojohnson_normmax(x[, brack])

Compute optimal Yeo-Johnson transform parameter for input data, using maximum likelihood estimation.

yeojohnson_llf(lmb, data)

The yeojohnson log-likelihood function.

obrientransform(\*args)

Compute the O’Brien transform on input data (any number of arrays).

sigmaclip(a[, low, high])

Iterative sigma-clipping of array elements.

trimboth(a, proportiontocut[, axis])

Slices off a proportion of items from both ends of an array.

trim1(a, proportiontocut[, tail, axis])

Slices off a proportion from ONE end of the passed array distribution.

zmap(scores, compare[, axis, ddof])

Calculate the relative z-scores.

zscore(a[, axis, ddof])

Calculate the z score of each value in the sample, relative to the sample mean and standard deviation.
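
Two of the transformations can be sketched as follows. The input values are illustrative, and lmbda=0 is chosen because the Box-Cox transform reduces to the natural logarithm in that case.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Standardize: subtract the mean, divide by the (ddof=0) standard deviation.
z = stats.zscore(x)

# Box-Cox with a fixed lambda of 0 is the natural log of the data.
log_x = stats.boxcox(x, lmbda=0)
```

When lmbda is omitted, boxcox instead estimates the parameter from the data and returns it alongside the transformed values.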

Statistical distances

wasserstein_distance(u_values, v_values[, …])

Compute the first Wasserstein distance between two 1D distributions.

energy_distance(u_values, v_values[, …])

Compute the energy distance between two 1D distributions.
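
A brief sketch of both distance functions on tiny hand-picked samples, where the expected values are easy to check by hand:

```python
from scipy import stats

# Every point in the first sample must move 5 units to match the second.
w_dist = stats.wasserstein_distance([0, 1, 3], [5, 6, 8])

# Energy distance between two point masses located at 0 and at 2.
e_dist = stats.energy_distance([0], [2])
```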

Random variate generation

rvs_ratio_uniforms(pdf, umax, vmin, vmax[, …])

Generate random samples from a probability density function using the ratio-of-uniforms method.

Circular statistical functions

circmean(samples[, high, low, axis])

Compute the circular mean for samples in a range.

circvar(samples[, high, low, axis])

Compute the circular variance for samples assumed to be in a range.

circstd(samples[, high, low, axis])

Compute the circular standard deviation for samples assumed to be in the range [low to high].
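
Circular statistics matter when values wrap around, as angles do. In the sketch below, three illustrative compass bearings in degrees cluster around 5 degrees, yet their arithmetic mean is far from that direction.

```python
import numpy as np
from scipy import stats

# Bearings in degrees: 355 is only 10 degrees away from 5 on the circle.
bearings = np.array([355.0, 5.0, 15.0])

naive_mean = bearings.mean()                             # ignores wrap-around
circular_mean = stats.circmean(bearings, high=360.0, low=0.0)
circular_std = stats.circstd(bearings, high=360.0, low=0.0)
```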

Contingency table functions

chi2_contingency(observed[, correction, lambda_])

Chi-square test of independence of variables in a contingency table.

contingency.expected_freq(observed)

Compute the expected frequencies from a contingency table.

contingency.margins(a)

Return a list of the marginal sums of the array a.

fisher_exact(table[, alternative])

Performs a Fisher exact test on a 2x2 contingency table.
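
The sketch below applies the contingency functions to a made-up 2x2 table. With equal row and column totals, every expected frequency is 15, which makes the output easy to verify.

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import expected_freq

# Illustrative 2x2 table of observed counts.
observed = np.array([[10, 20], [20, 10]])

# Chi-square test of independence (Yates' correction applies to 2x2 tables).
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

# Expected frequencies under independence, computed directly.
expected_direct = expected_freq(observed)

# Fisher exact test; the first value is the sample odds ratio (10*10)/(20*20).
odds_ratio, fisher_p = stats.fisher_exact(observed)
```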

Plot-tests

ppcc_max(x[, brack, dist])

Calculate the shape parameter that maximizes the PPCC.

ppcc_plot(x, a, b[, dist, plot, N])

Calculate and optionally plot probability plot correlation coefficient.

probplot(x[, sparams, dist, fit, plot, rvalue])

Calculate quantiles for a probability plot, and optionally show the plot.

boxcox_normplot(x, la, lb[, plot, N])

Compute parameters for a Box-Cox normality plot, optionally show it.

yeojohnson_normplot(x, la, lb[, plot, N])

Compute parameters for a Yeo-Johnson normality plot, optionally show it.
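
These functions can also be used without plotting. As a sketch, probplot called without a plot argument returns the theoretical and ordered sample quantiles plus a least-squares fit; the synthetic normal sample and its seed are assumptions for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=3.0, scale=2.0, size=200)

# Theoretical quantiles (osm), ordered sample values (osr), and the fitted line.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
```

For data that really are normal, r should be close to 1, and slope and intercept roughly estimate the scale and location.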

Masked statistics functions

Univariate and multivariate kernel density estimation (scipy.stats.kde)

gaussian_kde(dataset[, bw_method, weights])

Representation of a kernel-density estimate using Gaussian kernels.
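
A minimal sketch of kernel density estimation with gaussian_kde, using a synthetic sample and the default bandwidth. The grid and seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(size=500)

# Fit the estimator; the bandwidth defaults to Scott's rule.
kde = stats.gaussian_kde(sample)

# Evaluate the estimated density on a coarse grid.
grid = np.linspace(-4.0, 4.0, 9)
density = kde(grid)

# The estimated density integrates to 1 over the whole real line.
total_mass = kde.integrate_box_1d(-np.inf, np.inf)
```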

For many more statistics-related functions, install the software R and the interface package rpy.