SciPy

Statistical functions (scipy.stats)

This module contains a large number of probability distributions as well as a growing library of statistical functions.

Each univariate distribution is an instance of a subclass of rv_continuous (rv_discrete for discrete distributions):

rv_continuous([momtype, a, b, xtol, ...]) A generic continuous random variable class meant for subclassing.
rv_discrete([a, b, name, badvalue, ...]) A generic discrete random variable class meant for subclassing.

Continuous distributions

alpha An alpha continuous random variable.
anglit An anglit continuous random variable.
arcsine An arcsine continuous random variable.
beta A beta continuous random variable.
betaprime A beta prime continuous random variable.
bradford A Bradford continuous random variable.
burr A Burr continuous random variable.
cauchy A Cauchy continuous random variable.
chi A chi continuous random variable.
chi2 A chi-squared continuous random variable.
cosine A cosine continuous random variable.
dgamma A double gamma continuous random variable.
dweibull A double Weibull continuous random variable.
erlang An Erlang continuous random variable.
expon An exponential continuous random variable.
exponnorm An exponentially modified Normal continuous random variable.
exponweib An exponentiated Weibull continuous random variable.
exponpow An exponential power continuous random variable.
f An F continuous random variable.
fatiguelife A fatigue-life (Birnbaum-Saunders) continuous random variable.
fisk A Fisk continuous random variable.
foldcauchy A folded Cauchy continuous random variable.
foldnorm A folded normal continuous random variable.
frechet_r A Frechet right (or Weibull minimum) continuous random variable.
frechet_l A Frechet left (or Weibull maximum) continuous random variable.
genlogistic A generalized logistic continuous random variable.
gennorm A generalized normal continuous random variable.
genpareto A generalized Pareto continuous random variable.
genexpon A generalized exponential continuous random variable.
genextreme A generalized extreme value continuous random variable.
gausshyper A Gauss hypergeometric continuous random variable.
gamma A gamma continuous random variable.
gengamma A generalized gamma continuous random variable.
genhalflogistic A generalized half-logistic continuous random variable.
gilbrat A Gilbrat continuous random variable.
gompertz A Gompertz (or truncated Gumbel) continuous random variable.
gumbel_r A right-skewed Gumbel continuous random variable.
gumbel_l A left-skewed Gumbel continuous random variable.
halfcauchy A Half-Cauchy continuous random variable.
halflogistic A half-logistic continuous random variable.
halfnorm A half-normal continuous random variable.
halfgennorm The upper half of a generalized normal continuous random variable.
hypsecant A hyperbolic secant continuous random variable.
invgamma An inverted gamma continuous random variable.
invgauss An inverse Gaussian continuous random variable.
invweibull An inverted Weibull continuous random variable.
johnsonsb A Johnson SB continuous random variable.
johnsonsu A Johnson SU continuous random variable.
ksone General Kolmogorov-Smirnov one-sided test.
kstwobign Kolmogorov-Smirnov two-sided test for large N.
laplace A Laplace continuous random variable.
logistic A logistic (or Sech-squared) continuous random variable.
loggamma A log gamma continuous random variable.
loglaplace A log-Laplace continuous random variable.
lognorm A lognormal continuous random variable.
lomax A Lomax (Pareto of the second kind) continuous random variable.
maxwell A Maxwell continuous random variable.
mielke A Mielke’s Beta-Kappa continuous random variable.
nakagami A Nakagami continuous random variable.
ncx2 A non-central chi-squared continuous random variable.
ncf A non-central F distribution continuous random variable.
nct A non-central Student’s T continuous random variable.
norm A normal continuous random variable.
pareto A Pareto continuous random variable.
pearson3 A pearson type III continuous random variable.
powerlaw A power-function continuous random variable.
powerlognorm A power log-normal continuous random variable.
powernorm A power normal continuous random variable.
rdist An R-distributed continuous random variable.
reciprocal A reciprocal continuous random variable.
rayleigh A Rayleigh continuous random variable.
rice A Rice continuous random variable.
recipinvgauss A reciprocal inverse Gaussian continuous random variable.
semicircular A semicircular continuous random variable.
t A Student’s T continuous random variable.
triang A triangular continuous random variable.
truncexpon A truncated exponential continuous random variable.
truncnorm A truncated normal continuous random variable.
tukeylambda A Tukey-Lamdba continuous random variable.
uniform A uniform continuous random variable.
vonmises A Von Mises continuous random variable.
wald A Wald continuous random variable.
weibull_min A Frechet right (or Weibull minimum) continuous random variable.
weibull_max A Frechet left (or Weibull maximum) continuous random variable.
wrapcauchy A wrapped Cauchy continuous random variable.

Multivariate distributions

multivariate_normal A multivariate normal random variable.
dirichlet A Dirichlet random variable.
wishart A Wishart random variable.
invwishart An inverse Wishart random variable.

Discrete distributions

bernoulli A Bernoulli discrete random variable.
binom A binomial discrete random variable.
boltzmann A Boltzmann (Truncated Discrete Exponential) random variable.
dlaplace A Laplacian discrete random variable.
geom A geometric discrete random variable.
hypergeom A hypergeometric discrete random variable.
logser A Logarithmic (Log-Series, Series) discrete random variable.
nbinom A negative binomial discrete random variable.
planck A Planck discrete exponential random variable.
poisson A Poisson discrete random variable.
randint A uniform discrete random variable.
skellam A Skellam discrete random variable.
zipf A Zipf discrete random variable.

Statistical functions

Several of these functions have a similar version in scipy.stats.mstats which work for masked arrays.

describe(a[, axis, ddof]) Computes several descriptive statistics of the passed array.
gmean(a[, axis, dtype]) Compute the geometric mean along the specified axis.
hmean(a[, axis, dtype]) Calculates the harmonic mean along the specified axis.
kurtosis(a[, axis, fisher, bias]) Computes the kurtosis (Fisher or Pearson) of a dataset.
kurtosistest(a[, axis]) Tests whether a dataset has normal kurtosis This function tests the null hypothesis that the kurtosis of the population from which the sample was drawn is that of the normal distribution: kurtosis = 3(n-1)/(n+1).
mode(a[, axis]) Returns an array of the modal (most common) value in the passed array.
moment(a[, moment, axis]) Calculates the nth moment about the mean for a sample.
normaltest(a[, axis]) Tests whether a sample differs from a normal distribution.
skew(a[, axis, bias]) Computes the skewness of a data set.
skewtest(a[, axis]) Tests whether the skew is different from the normal distribution.
kstat(data[, n]) Return the nth k-statistic (1<=n<=4 so far).
kstatvar(data[, n]) Returns an unbiased estimator of the variance of the k-statistic.
tmean(a[, limits, inclusive]) Compute the trimmed mean.
tvar(a[, limits, inclusive]) Compute the trimmed variance This function computes the sample variance of an array of values, while ignoring values which are outside of given limits.
tmin(a[, lowerlimit, axis, inclusive]) Compute the trimmed minimum This function finds the miminum value of an array a along the specified axis, but only considering values greater than a specified lower limit.
tmax(a[, upperlimit, axis, inclusive]) Compute the trimmed maximum This function computes the maximum value of an array along a given axis, while ignoring values larger than a specified upper limit.
tstd(a[, limits, inclusive]) Compute the trimmed sample standard deviation This function finds the sample standard deviation of given values, ignoring values outside the given limits.
tsem(a[, limits, inclusive]) Compute the trimmed standard error of the mean.
nanmean(*args, **kwds) nanmean is deprecated!
nanstd(*args, **kwds) nanstd is deprecated!
nanmedian(*args, **kwds) nanmedian is deprecated!
variation(a[, axis]) Computes the coefficient of variation, the ratio of the biased standard deviation to the mean.
cumfreq(a[, numbins, defaultreallimits, weights]) Returns a cumulative frequency histogram, using the histogram function.
histogram2(*args, **kwds) histogram2 is deprecated!
histogram(a[, numbins, defaultlimits, ...]) Separates the range into several bins and returns the number of instances in each bin.
itemfreq(a) Returns a 2-D array of item frequencies.
percentileofscore(a, score[, kind]) The percentile rank of a score relative to a list of scores.
scoreatpercentile(a, per[, limit, ...]) Calculate the score at a given percentile of the input sequence.
relfreq(a[, numbins, defaultreallimits, weights]) Returns a relative frequency histogram, using the histogram function.
binned_statistic(x, values[, statistic, ...]) Compute a binned statistic for a set of data.
binned_statistic_2d(x, y, values[, ...]) Compute a bidimensional binned statistic for a set of data.
binned_statistic_dd(sample, values[, ...]) Compute a multidimensional binned statistic for a set of data.
obrientransform(*args) Computes the O’Brien transform on input data (any number of arrays).
signaltonoise(*args, **kwds) signaltonoise is deprecated!
bayes_mvs(data[, alpha]) Bayesian confidence intervals for the mean, var, and std.
mvsdist(data) ‘Frozen’ distributions for mean, variance, and standard deviation of data.
sem(a[, axis, ddof]) Calculates the standard error of the mean (or standard error of measurement) of the values in the input array.
zmap(scores, compare[, axis, ddof]) Calculates the relative z-scores.
zscore(a[, axis, ddof]) Calculates the z score of each value in the sample, relative to the sample mean and standard deviation.
sigmaclip(a[, low, high]) Iterative sigma-clipping of array elements.
threshold(a[, threshmin, threshmax, newval]) Clip array to a given value.
trimboth(a, proportiontocut[, axis]) Slices off a proportion of items from both ends of an array.
trim1(a, proportiontocut[, tail]) Slices off a proportion of items from ONE end of the passed array distribution.
f_oneway(*args) Performs a 1-way ANOVA.
pearsonr(x, y) Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.
spearmanr(a[, b, axis]) Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.
pointbiserialr(x, y) Calculates a point biserial correlation coefficient and the associated p-value.
kendalltau(x, y[, initial_lexsort]) Calculates Kendall’s tau, a correlation measure for ordinal data.
linregress(x[, y]) Calculate a regression line This computes a least-squares regression for two sets of measurements.
theilslopes(y[, x, alpha]) Computes the Theil-Sen estimator for a set of points (x, y).
ttest_1samp(a, popmean[, axis]) Calculates the T-test for the mean of ONE group of scores.
ttest_ind(a, b[, axis, equal_var]) Calculates the T-test for the means of TWO INDEPENDENT samples of scores.
ttest_ind_from_stats(mean1, std1, nobs1, ...) T-test for means of two independent samples from descriptive statistics.
ttest_rel(a, b[, axis]) Calculates the T-test on TWO RELATED samples of scores, a and b.
kstest(rvs, cdf[, args, N, alternative, mode]) Perform the Kolmogorov-Smirnov test for goodness of fit.
chisquare(f_obs[, f_exp, ddof, axis]) Calculates a one-way chi square test.
power_divergence(f_obs[, f_exp, ddof, axis, ...]) Cressie-Read power divergence statistic and goodness of fit test.
ks_2samp(data1, data2) Computes the Kolmogorov-Smirnov statistic on 2 samples.
mannwhitneyu(x, y[, use_continuity]) Computes the Mann-Whitney rank test on samples x and y.
tiecorrect(rankvals) Tie correction factor for ties in the Mann-Whitney U and Kruskal-Wallis H tests.
rankdata(a[, method]) Assign ranks to data, dealing with ties appropriately.
ranksums(x, y) Compute the Wilcoxon rank-sum statistic for two samples.
wilcoxon(x[, y, zero_method, correction]) Calculate the Wilcoxon signed-rank test.
kruskal(*args) Compute the Kruskal-Wallis H-test for independent samples The Kruskal-Wallis H-test tests the null hypothesis that the population median of all of the groups are equal.
friedmanchisquare(*args) Computes the Friedman test for repeated measurements The Friedman test tests the null hypothesis that repeated measurements of the same individuals have the same distribution.
combine_pvalues(pvalues[, method, weights]) Methods for combining the p-values of independent tests bearing upon the same hypothesis.
ansari(x, y) Perform the Ansari-Bradley test for equal scale parameters The Ansari-Bradley test is a non-parametric test for the equality of the scale parameter of the distributions from which two samples were drawn.
bartlett(*args) Perform Bartlett’s test for equal variances Bartlett’s test tests the null hypothesis that all input samples are from populations with equal variances.
levene(*args, **kwds) Perform Levene test for equal variances.
shapiro(x[, a, reta]) Perform the Shapiro-Wilk test for normality.
anderson(x[, dist]) Anderson-Darling test for data coming from a particular distribution The Anderson-Darling test is a modification of the Kolmogorov- Smirnov test kstest for the null hypothesis that a sample is drawn from a population that follows a particular distribution.
anderson_ksamp(samples[, midrank]) The Anderson-Darling test for k-samples.
binom_test(x[, n, p]) Perform a test that the probability of success is p.
fligner(*args, **kwds) Perform Fligner’s test for equal variances.
median_test(*args, **kwds) Mood’s median test.
mood(x, y[, axis]) Perform Mood’s test for equal scale parameters.
boxcox(x[, lmbda, alpha]) Return a positive dataset transformed by a Box-Cox power transformation.
boxcox_normmax(x[, brack, method]) Compute optimal Box-Cox transform parameter for input data.
boxcox_llf(lmb, data) The boxcox log-likelihood function.
entropy(pk[, qk, base]) Calculate the entropy of a distribution for given probability values.

Circular statistical functions

circmean(samples[, high, low, axis]) Compute the circular mean for samples in a range.
circvar(samples[, high, low, axis]) Compute the circular variance for samples assumed to be in a range :Parameters: samples : array_like Input array.
circstd(samples[, high, low, axis]) Compute the circular standard deviation for samples assumed to be in the range [low to high].

Contingency table functions

chi2_contingency(observed[, correction, lambda_]) Chi-square test of independence of variables in a contingency table.
contingency.expected_freq(observed) Compute the expected frequencies from a contingency table.
contingency.margins(a) Return a list of the marginal sums of the array a.
fisher_exact(table[, alternative]) Performs a Fisher exact test on a 2x2 contingency table.

Plot-tests

ppcc_max(x[, brack, dist]) Returns the shape parameter that maximizes the probability plot correlation coefficient for the given data to a one-parameter family of distributions.
ppcc_plot(x, a, b[, dist, plot, N]) Calculate and optionally plot probability plot correlation coefficient.
probplot(x[, sparams, dist, fit, plot]) Calculate quantiles for a probability plot, and optionally show the plot.
boxcox_normplot(x, la, lb[, plot, N]) Compute parameters for a Box-Cox normality plot, optionally show it.

Masked statistics functions

Univariate and multivariate kernel density estimation (scipy.stats.kde)

gaussian_kde(dataset[, bw_method]) Representation of a kernel-density estimate using Gaussian kernels.

For many more stat related functions install the software R and the interface package rpy.