Statistical functions (scipy.stats)

This module contains a large number of probability distributions as well as a growing library of statistical functions.

Each included continuous distribution is an instance of the class rv_continuous:

rv_continuous([momtype, a, b, xa, xb, xtol, ...]) A generic continuous random variable class meant for subclassing.
rv_continuous.pdf(x, *args, **kwds) Probability density function at x of the given RV.
rv_continuous.cdf(x, *args, **kwds) Cumulative distribution function at x of the given RV.
rv_continuous.sf(x, *args, **kwds) Survival function (1-cdf) at x of the given RV.
rv_continuous.ppf(q, *args, **kwds) Percent point function (inverse of cdf) at q of the given RV.
rv_continuous.isf(q, *args, **kwds) Inverse survival function at q of the given RV.
rv_continuous.stats(*args, **kwds) Some statistics of the given RV.
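The following minimal sketch exercises the methods above on the built-in norm distribution and checks how they relate: sf is 1 - cdf, ppf inverts cdf, and isf inverts sf. It assumes a recent SciPy and NumPy install, not necessarily the release this page describes. It also subclasses rv_continuous with a hypothetical LinearGen class (density 2x on [0, 1]) that supplies only _pdf and relies on the generic, numerically computed cdf and ppf.

```python
import numpy as np
from scipy import stats

# Built-in normal distribution, used here without loc/scale (standard normal).
x = np.array([-1.0, 0.0, 1.5])
pdf = stats.norm.pdf(x)            # density values at x
cdf = stats.norm.cdf(x)            # P(X <= x)
sf = stats.norm.sf(x)              # survival function, 1 - cdf
x_back = stats.norm.ppf(cdf)       # ppf inverts cdf, so x_back ~= x
upper = stats.norm.isf(0.05)       # isf inverts sf: P(X > upper) = 0.05
mean, var = stats.norm.stats(moments="mv")


# Hypothetical subclass with density f(x) = 2x on [0, 1]. Only _pdf is
# supplied; rv_continuous derives cdf and ppf from it numerically.
class LinearGen(stats.rv_continuous):
    def _pdf(self, x):
        return 2.0 * x


linear = LinearGen(a=0.0, b=1.0, name="linear")
half_cdf = linear.cdf(0.5)         # exact value: 0.5**2 = 0.25
quarter_ppf = linear.ppf(0.25)     # exact value: sqrt(0.25) = 0.5
```

Supplying _cdf or _ppf in closed form as well would avoid the numerical integration and root finding the generic methods fall back on.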

Each discrete distribution is an instance of the class rv_discrete:

rv_discrete([a, b, name, badvalue, ...]) A generic discrete random variable class meant for subclassing.
rv_discrete.pmf(k, *args, **kwds) Probability mass function at k of the given RV.
rv_discrete.cdf(k, *args, **kwds) Cumulative distribution function at k of the given RV.
rv_discrete.sf(k, *args, **kwds) Survival function (1-cdf) at k of the given RV.
rv_discrete.ppf(q, *args, **kwds) Percent point function (inverse of cdf) at q of the given RV.
rv_discrete.isf(q, *args, **kwds) Inverse survival function (inverse of sf) at q of the given RV.
rv_discrete.stats(*args, **kwds) Some statistics of the given discrete RV.
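For a discrete variable the same methods take integer points k. This minimal sketch, assuming a recent SciPy and NumPy, uses the built-in binom distribution with n = 10 and p = 0.3 to show that the cdf is the running sum of the pmf and that ppf returns an integer-valued quantile.

```python
import numpy as np
from scipy import stats

# Binomial with n = 10 trials and success probability p = 0.3.
n, p = 10, 0.3
k = np.arange(n + 1)                 # the full support 0..n
pmf = stats.binom.pmf(k, n, p)
cdf = stats.binom.cdf(k, n, p)
running = np.cumsum(pmf)             # equals cdf up to rounding

# ppf returns the smallest k with cdf(k) >= q; here the median.
median = stats.binom.ppf(0.5, n, p)
# sf(k) = 1 - cdf(k) = P(X > k).
tail = stats.binom.sf(4, n, p)
# Mean n*p and variance n*p*(1 - p).
mean, var = stats.binom.stats(n, p, moments="mv")
```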

Continuous distributions

norm A normal continuous random variable.
alpha An alpha continuous random variable.
anglit An anglit continuous random variable.
arcsine An arcsine continuous random variable.
beta A beta continuous random variable.
betaprime A betaprime continuous random variable.
bradford A Bradford continuous random variable.
burr A Burr continuous random variable.
fisk A Fisk continuous random variable.
cauchy A Cauchy continuous random variable.
chi A chi continuous random variable.
chi2 A chi-squared continuous random variable.
cosine A cosine continuous random variable.
dgamma A double gamma continuous random variable.
dweibull A double Weibull continuous random variable.
erlang An Erlang continuous random variable.
expon An exponential continuous random variable.
exponweib An exponentiated Weibull continuous random variable.
exponpow An exponential power continuous random variable.
fatiguelife A fatigue-life (Birnbaum-Sanders) continuous random variable.
foldcauchy A folded Cauchy continuous random variable.
f An F continuous random variable.
foldnorm A folded normal continuous random variable.
genlogistic A generalized logistic continuous random variable.
genpareto A generalized Pareto continuous random variable.
genexpon A generalized exponential continuous random variable.
genextreme A generalized extreme value continuous random variable.
gausshyper A Gauss hypergeometric continuous random variable.
gamma A gamma continuous random variable.
gengamma A generalized gamma continuous random variable.
genhalflogistic A generalized half-logistic continuous random variable.
gompertz A Gompertz (truncated Gumbel) continuous random variable.
gumbel_r A (right-skewed) Gumbel continuous random variable.
gumbel_l A left-skewed Gumbel continuous random variable.
halfcauchy A Half-Cauchy continuous random variable.
halflogistic A half-logistic continuous random variable.
halfnorm A half-normal continuous random variable.
hypsecant A hyperbolic secant continuous random variable.
invgamma An inverted gamma continuous random variable.
invnorm An inverse normal continuous random variable.
invweibull An inverted Weibull continuous random variable.
johnsonsb A Johnson SB continuous random variable.
johnsonsu A Johnson SU continuous random variable.
laplace A Laplace continuous random variable.
logistic A logistic continuous random variable.
loggamma A log gamma continuous random variable.
loglaplace A log-Laplace continuous random variable.
lognorm A lognormal continuous random variable.
gilbrat A Gilbrat continuous random variable.
lomax A Lomax (Pareto of the second kind) continuous random variable.
maxwell A Maxwell continuous random variable.
mielke A Mielke’s Beta-Kappa continuous random variable.
nakagami A Nakagami continuous random variable.
ncx2 A non-central chi-squared continuous random variable.
ncf A non-central F distribution continuous random variable.
t A Student’s t continuous random variable.
nct A non-central Student’s t continuous random variable.
pareto A Pareto continuous random variable.
powerlaw A power-function continuous random variable.
powerlognorm A power log-normal continuous random variable.
powernorm A power normal continuous random variable.
rdist An R-distributed continuous random variable.
reciprocal A reciprocal continuous random variable.
rayleigh A Rayleigh continuous random variable.
rice A Rice continuous random variable.
recipinvgauss A reciprocal inverse Gaussian continuous random variable.
semicircular A semicircular continuous random variable.
triang A triangular continuous random variable.
truncexpon A truncated exponential continuous random variable.
truncnorm A truncated normal continuous random variable.
tukeylambda A Tukey-Lambda continuous random variable.
uniform A uniform continuous random variable.
wald A Wald continuous random variable.
weibull_min A Weibull minimum continuous random variable.
weibull_max A Weibull maximum continuous random variable.
wrapcauchy A wrapped Cauchy continuous random variable.
ksone A Kolmogorov-Smirnov one-sided test statistic continuous random variable.
kstwobign A Kolmogorov-Smirnov two-sided test statistic (for large N) continuous random variable.
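All of the continuous distributions listed above take their shape parameters (if any) followed by loc and scale keywords, and can be "frozen" by calling them with fixed parameters. The sketch below assumes a recent SciPy, and NumPy 1.17 or later for default_rng; the parameter values are arbitrary illustrations.

```python
import numpy as np
from scipy import stats

# Frozen gamma with shape a = 2 and scale 3 (loc defaults to 0).
g = stats.gamma(2.0, loc=0.0, scale=3.0)
g_mean, g_var = g.mean(), g.var()    # a*scale = 6, a*scale**2 = 18

# loc and scale shift and stretch the standard form: Y = loc + scale * X,
# so evaluating at 3 with loc=1, scale=2 equals evaluating (3 - 1) / 2.
y = stats.norm.cdf(3.0, loc=1.0, scale=2.0)
z = stats.norm.cdf((3.0 - 1.0) / 2.0)

# Reproducible random variates from a frozen exponential with mean 2.
e = stats.expon(scale=2.0)
samples = e.rvs(size=1000, random_state=np.random.default_rng(0))
```

Freezing is convenient when the same parameters are reused across many calls, since they do not have to be repeated on every method.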

Discrete distributions

binom A binomial discrete random variable.
bernoulli A Bernoulli discrete random variable.
nbinom A negative binomial discrete random variable.
geom A geometric discrete random variable.
hypergeom A hypergeometric discrete random variable.
logser A logarithmic discrete random variable.
poisson A Poisson discrete random variable.
planck A Planck (discrete exponential) discrete random variable.
boltzmann A Boltzmann (truncated discrete exponential) discrete random variable.
randint A discrete uniform (random integer) discrete random variable.
zipf A Zipf discrete random variable.
dlaplace A Laplacian discrete random variable.
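A few of the discrete distributions above, frozen and checked against their textbook means, as a minimal sketch assuming a recent SciPy. Note the parameter conventions SciPy uses: randint(low, high) excludes high, geom counts trials up to and including the first success, and hypergeom takes (M, n, N) as population size, number of successes in the population, and number of draws.

```python
import numpy as np
from scipy import stats

# A fair six-sided die: randint(low, high) is uniform on low, ..., high - 1.
die = stats.randint(1, 7)
faces = np.arange(1, 7)
die_pmf = die.pmf(faces)             # each face has probability 1/6

# Poisson with rate mu = 3: mean and variance both equal mu.
pois = stats.poisson(3.0)

# Geometric with p = 0.25 on support 1, 2, ...; mean 1/p = 4.
geo = stats.geom(0.25)

# Hypergeometric: M = 20 items, n = 7 successes, N = 5 draws; mean N*n/M.
hyp = stats.hypergeom(20, 7, 5)
```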

Statistical functions

Several of these functions have a similar version in scipy.stats.mstats that works with masked arrays.

gmean(a[, axis, dtype]) Compute the geometric mean along the specified axis.
hmean(a[, axis, dtype]) Calculates the harmonic mean along the specified axis.
mean(a[, axis]) Returns the arithmetic mean of a along the given axis.
cmedian(a[, numbins]) Returns the computed median value of an array.
median(a[, axis]) Returns the median of the passed array along the given axis.
mode(a[, axis]) Returns an array of the modal (most common) value in the passed array.
tmean(a[, limits, inclusive]) Compute the trimmed mean.
tvar(a[, limits, inclusive]) Compute the trimmed variance.
tmin(a[, lowerlimit, axis, inclusive]) Compute the trimmed minimum.
tmax(a, upperlimit[, axis, inclusive]) Compute the trimmed maximum.
tstd(a[, limits, inclusive]) Compute the trimmed sample standard deviation.
tsem(a[, limits, inclusive]) Compute the trimmed standard error of the mean.
moment(a[, moment, axis]) Calculates the nth moment about the mean for a sample.
variation(a[, axis]) Computes the coefficient of variation, the ratio of the biased standard deviation to the mean.
skew(a[, axis, bias]) Computes the skewness of a data set.
kurtosis(a[, axis, fisher, bias]) Computes the kurtosis (Fisher or Pearson) of a dataset.
describe(a[, axis]) Computes several descriptive statistics of the passed array.
skewtest(a[, axis]) Tests whether the skewness differs from that of a normal distribution.
kurtosistest(a[, axis]) Tests whether a dataset has normal kurtosis.
normaltest(a[, axis]) Tests whether a sample differs from a normal distribution.
itemfreq(a) Returns a 2D array of item frequencies.
scoreatpercentile(a, per[, limit]) Calculate the score at the given ‘per’ percentile of the sequence a.
percentileofscore(a, score[, kind]) The percentile rank of a score relative to a list of scores.
histogram2(a, bins) Compute the histogram of a using the divisions in bins.
histogram(a[, numbins, defaultlimits, ...]) Separates the range into several bins and returns the number of instances of a in each bin.
cumfreq(a[, numbins, defaultreallimits, weights]) Returns a cumulative frequency histogram, using the histogram function.
relfreq(a[, numbins, defaultreallimits, weights]) Returns a relative frequency histogram, using the histogram function.
obrientransform(*args) Computes a transform on input data (any number of columns).
samplevar(*args, **kwds) samplevar is deprecated!
samplestd(*args, **kwds) samplestd is deprecated!
signaltonoise(a[, axis, ddof]) Calculates the signal-to-noise ratio, defined as the ratio between the mean and the standard deviation.
bayes_mvs(data[, alpha]) Return Bayesian confidence intervals for the mean, var, and std.
var(a[, axis, bias]) Returns the estimated population variance of the values in the passed array.
std(a[, axis, bias]) Returns the estimated population standard deviation of the values in the passed array.
stderr(*args, **kwds) stderr is deprecated!
sem(a[, axis, ddof]) Calculates the standard error of the mean (or standard error of measurement) of the values in the passed array.
z(*args, **kwds) z is deprecated!
zs(*args, **kwds) zs is deprecated!
zmap(scores, compare[, axis]) Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to array passed to compare (e.g., [time,x,y]).
threshold(a[, threshmin, threshmax, newval]) Clip array to a given value.
trimboth(a, proportiontocut) Slices off the passed proportion of items from BOTH ends of the passed array.
trim1(a, proportiontocut[, tail]) Slices off the passed proportion of items from ONE end of the passed array.
cov(m[, y, rowvar, bias]) Estimate the covariance matrix.
corrcoef(x[, y, rowvar, bias]) The correlation coefficients formed from 2-d array x, where the rows are the observations, and the columns are variables.
f_oneway(*args) Performs a 1-way ANOVA.
pearsonr(x, y) Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.
spearmanr(a[, b, axis]) Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.
pointbiserialr(x, y) Calculates a point biserial correlation coefficient and the associated p-value.
kendalltau(x, y) Calculates Kendall’s tau, a correlation measure for ordinal data.
linregress(*args) Calculate a linear least-squares regression line.
ttest_1samp(a, popmean[, axis]) Calculates the T-test for the mean of ONE group of scores a.
ttest_ind(a, b[, axis]) Calculates the T-test for the means of TWO INDEPENDENT samples of scores.
ttest_rel(a, b[, axis]) Calculates the T-test on TWO RELATED samples of scores, a and b.
kstest(rvs, cdf[, args, N, ...], **kwds) Perform the Kolmogorov-Smirnov test for goodness of fit.
chisquare(f_obs[, f_exp, ddof]) Calculates a one-way chi square test.
ks_2samp(data1, data2) Computes the Kolmogorov-Smirnov statistic on 2 samples.
mannwhitneyu(x, y[, use_continuity]) Computes the Mann-Whitney rank test on samples x and y.
tiecorrect(rankvals) Tie-corrector for ties in Mann-Whitney U and Kruskal-Wallis H tests.
ranksums(x, y) Compute the Wilcoxon rank-sum statistic for two samples.
wilcoxon(x[, y]) Calculate the Wilcoxon signed-rank test.
kruskal(*args) Compute the Kruskal-Wallis H-test for independent samples.
friedmanchisquare(*args) Computes the Friedman test for repeated measurements.
ansari(x, y) Perform the Ansari-Bradley test for equal scale parameters.
bartlett(*args) Perform Bartlett’s test for equal variances.
levene(*args, **kwds) Perform Levene’s test for equal variances.
shapiro(x[, a, reta]) Perform the Shapiro-Wilk test for normality.
anderson(x[, dist]) Anderson-Darling test for data coming from a particular distribution.
binom_test(x[, n, p]) Perform a test that the probability of success is p.
fligner(*args, **kwds) Perform Fligner’s test for equal variances.
mood(x, y) Perform Mood’s test for equal scale parameters.
oneway(*args, **kwds) Test for equal means in two or more samples from the normal distribution.
glm(data, para) Calculates a linear model fit ...
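A minimal sketch of a few of the descriptive statistics and tests listed above, assuming a recent SciPy and NumPy 1.17 or later. Several names in the list (for example samplevar, histogram2, glm) are not used here; the sketch sticks to gmean, hmean, describe, percentileofscore, ttest_ind, and pearsonr, and the data are synthetic illustrations.

```python
import numpy as np
from scipy import stats

# Descriptive statistics on a small sample.
data = np.array([1.0, 2.0, 4.0, 8.0])
gm = stats.gmean(data)       # (1 * 2 * 4 * 8) ** (1/4) = 64 ** 0.25
hm = stats.hmean(data)       # 4 / (1 + 1/2 + 1/4 + 1/8) = 32 / 15
desc = stats.describe(data)  # nobs, minmax, mean, variance, skewness, kurtosis
pct = stats.percentileofscore([1, 2, 3, 4], 3)  # percentile rank of 3

# Two-sample t-test on synthetic data whose true means differ by 1.
rng = np.random.default_rng(12345)
a = rng.normal(loc=0.0, scale=1.0, size=50)
b = rng.normal(loc=1.0, scale=1.0, size=50)
t_stat, p_val = stats.ttest_ind(a, b)

# Pearson correlation of an exact linear relationship is 1.
r, p_r = stats.pearsonr(a, 2.0 * a + 1.0)
```

In current SciPy, describe reports the sample variance with ddof=1, which is why it is compared against np.var(data, ddof=1) rather than the biased estimate.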


probplot(x[, sparams, dist, fit, plot]) Return (osm, osr){,(scale,loc,r)} where (osm, osr) are order statistic medians and ordered response data respectively so that plot(osm, osr) is a probability plot.
ppcc_max(x[, brack, dist]) Returns the shape parameter that maximizes the probability plot correlation coefficient for the given data to a one-parameter family of distributions.
ppcc_plot(x, a, b[, dist, plot, N]) Returns (shape, ppcc), and optionally plots shape vs. ppcc.
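A probability plot compares ordered data against theoretical quantiles. This minimal sketch, assuming a recent SciPy and NumPy 1.17 or later, calls probplot on synthetic normal data without plotting; with the default fit=True it also returns a least-squares line through (osm, osr) whose slope and intercept estimate scale and loc, and whose r is close to 1 when the data match the chosen distribution.

```python
import numpy as np
from scipy import stats

# Synthetic data from a normal distribution with loc 5 and scale 2.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)

# osm: theoretical order statistic medians; osr: the sorted data.
# The fitted line osr ~ intercept + slope * osm recovers loc and scale.
(osm, osr), (slope, intercept, r) = stats.probplot(x, dist="norm")
```

Passing a matplotlib module or Axes as plot would draw the same points; it is omitted here so the sketch runs without a plotting backend.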

Masked statistics functions

Univariate and multivariate kernel density estimation (scipy.stats.kde)

gaussian_kde(dataset) Representation of a kernel-density estimate using Gaussian kernels.
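A minimal sketch of a one-dimensional kernel density estimate with gaussian_kde, assuming a recent SciPy and NumPy 1.17 or later. The bimodal sample is synthetic, and the bandwidth is left at SciPy's default (Scott's rule).

```python
import numpy as np
from scipy import stats

# Synthetic bimodal 1-D sample: a mixture of two normals.
rng = np.random.default_rng(1)
sample = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(1.5, 1.0, 700)])

kde = stats.gaussian_kde(sample)      # default bandwidth: Scott's rule
grid = np.linspace(-5.0, 5.0, 201)
density = kde(grid)                   # same as kde.evaluate(grid)

# Each Gaussian kernel integrates to 1, so the whole estimate does too.
total = kde.integrate_box_1d(-np.inf, np.inf)
```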

For many more statistics-related functions, install the R software environment and the rpy interface package.