Statistical functions (scipy.stats)

This module contains a large number of probability distributions as well as a growing library of statistical functions.

Each included continuous distribution is an instance of the class rv_continuous:

rv_continuous A generic continuous random variable.
rv_continuous.pdf (self, x, *args, **kwds) Probability density function at x of the given RV.
rv_continuous.cdf (self, x, *args, **kwds) Cumulative distribution function at x of the given RV.
rv_continuous.sf (self, x, *args, **kwds) Survival function (1-cdf) at x of the given RV.
rv_continuous.ppf (self, q, *args, **kwds) Percent point function (inverse of cdf) at q of the given RV.
rv_continuous.isf (self, q, *args, **kwds) Inverse survival function at q of the given RV.
rv_continuous.stats (self, *args, **kwds) Some statistics (mean and variance by default) of the given RV.
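
The methods listed above can be tried on any continuous distribution. The following is a minimal sketch using the standard normal distribution norm; the evaluation point x = 1.0 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import norm

x = 1.0  # arbitrary evaluation point, chosen for illustration

density = norm.pdf(x)      # probability density at x
lower_tail = norm.cdf(x)   # P(X <= x)
upper_tail = norm.sf(x)    # P(X > x), i.e. 1 - cdf(x)

# ppf inverts cdf, and isf inverts sf, so both recover x.
x_from_cdf = norm.ppf(lower_tail)
x_from_sf = norm.isf(upper_tail)

# With no further arguments, stats returns the mean and the variance.
mean, var = norm.stats()

print(density, lower_tail, upper_tail)
print(x_from_cdf, x_from_sf, float(mean), float(var))
```

Because sf is computed directly rather than as 1 - cdf, it is usually the better choice far out in the upper tail, where 1 - cdf loses precision.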

Each discrete distribution is an instance of the class rv_discrete:

rv_discrete A generic discrete random variable.
rv_discrete.pmf (self, k, *args, **kwds) Probability mass function at k of the given RV.
rv_discrete.cdf (self, k, *args, **kwds) Cumulative distribution function at k of the given RV.
rv_discrete.sf (self, k, *args, **kwds) Survival function (1-cdf) at k of the given RV.
rv_discrete.ppf (self, q, *args, **kwds) Percent point function (inverse of cdf) at q of the given RV.
rv_discrete.isf (self, q, *args, **kwds) Inverse survival function (inverse of sf) at q of the given RV.
rv_discrete.stats (self, *args, **kwds) Some statistics of the given discrete RV.
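
The discrete methods take the distribution's shape parameters through *args. The following is a minimal sketch using binom with n = 10 trials and success probability p = 0.5; these values are assumptions chosen so the results are easy to check by hand.

```python
from scipy.stats import binom

n, p = 10, 0.5   # shape parameters, passed positionally after k or q
k = 5

mass = binom.pmf(k, n, p)         # P(X == k) = 252 / 1024
lower_tail = binom.cdf(k, n, p)   # P(X <= k) = 638 / 1024
upper_tail = binom.sf(k, n, p)    # P(X > k) = 1 - cdf(k)

# ppf returns the smallest k whose cdf is at least q.
median_k = binom.ppf(0.5, n, p)

# With no further keywords, stats returns the mean and the variance.
mean, var = binom.stats(n, p)

print(mass, lower_tail, upper_tail, median_k, float(mean), float(var))
```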

Continuous distributions

norm () A normal continuous random variable.
alpha () An alpha continuous random variable.
anglit () An anglit continuous random variable.
arcsine () An arcsine continuous random variable.
beta () A beta continuous random variable.
betaprime () A betaprime continuous random variable.
bradford () A Bradford continuous random variable.
burr () A Burr continuous random variable.
fisk () A Fisk continuous random variable.
cauchy () A Cauchy continuous random variable.
chi () A chi continuous random variable.
chi2 () A chi-squared continuous random variable.
cosine () A cosine continuous random variable.
dgamma () A double gamma continuous random variable.
dweibull () A double Weibull continuous random variable.
erlang () An Erlang continuous random variable.
expon () An exponential continuous random variable.
exponweib () An exponentiated Weibull continuous random variable.
exponpow () An exponential power continuous random variable.
fatiguelife () A fatigue-life (Birnbaum-Saunders) continuous random variable.
foldcauchy () A folded Cauchy continuous random variable.
f () An F continuous random variable.
foldnorm () A folded normal continuous random variable.
frechet_r
frechet_l
genlogistic () A generalized logistic continuous random variable.
genpareto () A generalized Pareto continuous random variable.
genexpon () A generalized exponential continuous random variable.
genextreme () A generalized extreme value continuous random variable.
gausshyper () A Gauss hypergeometric continuous random variable.
gamma () A gamma continuous random variable.
gengamma () A generalized gamma continuous random variable.
genhalflogistic () A generalized half-logistic continuous random variable.
gompertz () A Gompertz (truncated Gumbel) distribution continuous random variable.
gumbel_r () A (right-skewed) Gumbel continuous random variable.
gumbel_l () A left-skewed Gumbel continuous random variable.
halfcauchy () A Half-Cauchy continuous random variable.
halflogistic () A half-logistic continuous random variable.
halfnorm () A half-normal continuous random variable.
hypsecant () A hyperbolic secant continuous random variable.
invgamma () An inverted gamma continuous random variable.
invnorm () An inverse normal continuous random variable.
invweibull () An inverted Weibull continuous random variable.
johnsonsb () A Johnson SB continuous random variable.
johnsonsu () A Johnson SU continuous random variable.
laplace () A Laplace continuous random variable.
logistic () A logistic continuous random variable.
loggamma () A log gamma continuous random variable.
loglaplace () A log-Laplace continuous random variable.
lognorm () A lognormal continuous random variable.
gilbrat () A Gibrat continuous random variable.
lomax () A Lomax (Pareto of the second kind) continuous random variable.
maxwell () A Maxwell continuous random variable.
mielke () A Mielke’s Beta-Kappa continuous random variable.
nakagami () A Nakagami continuous random variable.
ncx2 () A non-central chi-squared continuous random variable.
ncf () A non-central F distribution continuous random variable.
t () A Student’s T continuous random variable.
nct () A non-central Student’s T continuous random variable.
pareto () A Pareto continuous random variable.
powerlaw () A power-function continuous random variable.
powerlognorm () A power log-normal continuous random variable.
powernorm () A power normal continuous random variable.
rdist () An R-distributed continuous random variable.
reciprocal () A reciprocal continuous random variable.
rayleigh () A Rayleigh continuous random variable.
rice () A Rice continuous random variable.
recipinvgauss () A reciprocal inverse Gaussian continuous random variable.
semicircular () A semicircular continuous random variable.
triang () A Triangular continuous random variable.
truncexpon () A truncated exponential continuous random variable.
truncnorm () A truncated normal continuous random variable.
tukeylambda () A Tukey-Lambda continuous random variable.
uniform () A uniform continuous random variable.
vonmises
wald () A Wald continuous random variable.
weibull_min () A Weibull minimum continuous random variable.
weibull_max () A Weibull maximum continuous random variable.
wrapcauchy () A wrapped Cauchy continuous random variable.
ksone () A Kolmogorov-Smirnov one-sided test statistic continuous random variable.
kstwobign () A Kolmogorov-Smirnov two-sided test statistic (for large N) continuous random variable.
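
Many of the distributions above are related through their shape, loc, and scale parameters. The following sketch checks three well-known identities numerically; it assumes the usual scipy.stats calling convention, in which shape parameters follow x positionally and loc and scale are keyword arguments. The grid x and the values of df, mu, and sigma are arbitrary.

```python
import numpy as np
from scipy.stats import chi2, expon, gamma, norm

x = np.linspace(0.1, 5.0, 50)  # arbitrary evaluation grid

# An exponential variable is a gamma variable with shape a = 1.
expon_matches_gamma = np.allclose(expon.pdf(x), gamma.pdf(x, 1.0))

# A chi-squared variable with df degrees of freedom is a gamma variable
# with shape df / 2 and scale 2.
df = 4
chi2_matches_gamma = np.allclose(chi2.cdf(x, df),
                                 gamma.cdf(x, df / 2.0, scale=2.0))

# loc shifts and scale stretches: a normal variable with mean mu and
# standard deviation sigma, evaluated one sigma above its mean, has the
# same cdf value as the standard normal at 1.
mu, sigma = 3.0, 2.0
shifted_cdf = norm.cdf(mu + sigma, loc=mu, scale=sigma)

print(expon_matches_gamma, chi2_matches_gamma, shifted_cdf)
```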

Discrete distributions

binom () A binomial discrete random variable.
bernoulli () A Bernoulli discrete random variable.
nbinom () A negative binomial discrete random variable.
geom () A geometric discrete random variable.
hypergeom () A hypergeometric discrete random variable.
logser () A logarithmic discrete random variable.
poisson () A Poisson discrete random variable.
planck () A discrete exponential discrete random variable.
boltzmann () A truncated discrete exponential discrete random variable.
randint () A discrete uniform (random integer) discrete random variable.
zipf () A Zipf discrete random variable.
dlaplace () A discrete Laplacian discrete random variable.
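
As a quick orientation to the discrete distributions above, the following sketch checks a few simple properties numerically. The parameter values are arbitrary, and the half-open range convention for randint (low inclusive, high exclusive) is stated here as an assumption about the library's behaviour rather than something this page specifies.

```python
import numpy as np
from scipy.stats import bernoulli, binom, poisson, randint

# A Bernoulli variable is a binomial variable with a single trial.
p = 0.3
outcomes = np.array([0, 1])
bernoulli_pmf = bernoulli.pmf(outcomes, p)
binom_pmf = binom.pmf(outcomes, 1, p)

# randint(low, high) is uniform on low, ..., high - 1, so a fair die
# is randint(1, 7).
faces = np.arange(1, 7)
die_pmf = randint.pmf(faces, 1, 7)

# The Poisson pmf sums to 1 (numerically) over a wide enough range.
mu = 2.5
poisson_total = poisson.pmf(np.arange(0, 100), mu).sum()

print(bernoulli_pmf, binom_pmf, die_pmf, poisson_total)
```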

Statistical functions

Several of these functions have a similar version in scipy.stats.mstats that works for masked arrays.
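
The following sketch illustrates the masked-array variants, using sem as the example. The sentinel value -999.0 is a hypothetical missing-data code introduced only for this illustration. With current default arguments the two results are expected to agree, though defaults may differ between library versions.

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats

# -999.0 is a hypothetical sentinel marking a missing observation.
values = np.array([1.0, 2.0, 3.0, 4.0, -999.0])
masked = np.ma.masked_values(values, -999.0)

# The mstats version skips the masked entry.
masked_sem = float(mstats.sem(masked))

# The plain version needs the missing entry removed by hand.
plain_sem = float(stats.sem(values[:4]))

print(masked_sem, plain_sem)
```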

gmean (a[, axis]) Calculates the geometric mean of the values in the passed array.
hmean (a[, axis, zero_sub]) Calculates the harmonic mean of the values in the passed array.
mean (a[, axis]) Returns the arithmetic mean of a along the given dimension.
cmedian (a[, numbins]) Returns the computed median value of an array.
median (a[, axis]) Returns the median of the passed array along the given axis.
mode (a[, axis]) Returns an array of the modal (most common) value in the passed array.
tmean (a[, limits, inclusive]) Returns the arithmetic mean of all values in an array, ignoring values strictly outside given limits.
tvar (a[, limits, inclusive]) Returns the sample variance of values in an array (i.e., using N-1), ignoring values strictly outside the sequence passed to ‘limits’. Note: either limit in the sequence, or the value of limits itself, can be set to None. The inclusive list/tuple determines whether the lower and upper limiting bounds (respectively) are open/exclusive (0) or closed/inclusive (1).
tmin (a[, lowerlimit, axis, ...]) Returns the minimum value of a, along axis, including only values greater than (or equal to, if inclusive is True) lowerlimit. If the limit is set to None, all values in the array are used.
tmax (a, upperlimit[, axis, inclusive]) Returns the maximum value of a, along axis, including only values less than (or equal to, if inclusive is True) upperlimit. If the limit is set to None, a limit larger than the max value in the array is used.
tstd (a[, limits, inclusive]) Returns the standard deviation of all values in an array, ignoring values strictly outside the sequence passed to ‘limits’. Note: either limit in the sequence, or the value of limits itself, can be set to None. The inclusive list/tuple determines whether the lower and upper limiting bounds (respectively) are open/exclusive (0) or closed/inclusive (1).
tsem (a[, limits, inclusive]) Returns the standard error of the mean for the values in an array (i.e., using N for the denominator), ignoring values strictly outside the sequence passed to ‘limits’. Note: either limit in the sequence, or the value of limits itself, can be set to None. The inclusive list/tuple determines whether the lower and upper limiting bounds (respectively) are open/exclusive (0) or closed/inclusive (1).
moment (a[, moment, axis]) Calculates the nth moment about the mean for a sample.
variation (a[, axis]) Computes the coefficient of variation, the ratio of the biased standard deviation to the mean.
skew (a[, axis, bias]) Computes the skewness of a data set.
kurtosis (a[, axis, fisher, bias]) Computes the kurtosis (Fisher or Pearson) of a dataset.
describe (a[, axis]) Computes several descriptive statistics of the passed array.
skewtest (a[, axis]) Tests whether the skew is significantly different from that of a normal distribution.
kurtosistest (a[, axis]) Tests whether a dataset has normal kurtosis (i.e., kurtosis=3(n-1)/(n+1)).
normaltest (a[, axis]) Tests whether skew and/or kurtosis of dataset differs from normal curve.
itemfreq (a) Returns a 2D array of item frequencies.
scoreatpercentile (a, per[, limit]) Calculate the score at the given ‘per’ percentile of the sequence a. For example, the score at per=50 is the median.
percentileofscore (a, score[, kind]) The percentile rank of a score relative to a list of scores.
histogram2 (a, bins) Computes the histogram of a using the divisions in bins.
histogram (a[, numbins, defaultlimits, ...]) Returns (i) an array of histogram bin counts, (ii) the smallest value of the histogram binning (lowerreallimit), (iii) the bin width (binsize), and (iv) extrapoints. The lower limit and bin width are not necessarily integers. The default number of bins is 10. Defaultlimits can be None (the routine picks bins spanning all the numbers in a) or a 2-sequence (lowerlimit, upperlimit).
cumfreq (a[, numbins, defaultreallimits]) Returns a cumulative frequency histogram, using the histogram function. Defaultreallimits can be None (use all data), or a 2-sequence containing lower and upper limits on values to include.
relfreq (a[, numbins, defaultreallimits]) Returns a relative frequency histogram, using the histogram function. Defaultreallimits can be None (use all data), or a 2-sequence containing lower and upper limits on values to include.
obrientransform (*args) Computes a transform on input data (any number of columns). Used to test for homogeneity of variance prior to running one-way stats. Each array in *args is one level of a factor. If f_oneway() is run on the transformed data and found significant, the variances are unequal. From Maxwell and Delaney, p.112.
samplevar (a[, axis]) Returns the sample variance of the values in the passed array (i.e., using N). Axis can equal None (ravel array first), or an integer (the axis over which to operate).
samplestd (a[, axis]) Returns the sample standard deviation of the values in the passed array (i.e., using N). Axis can equal None (ravel array first), or an integer (the axis over which to operate).
signaltonoise (instack[, axis]) Calculates signal-to-noise. Axis can equal None (ravel array first), or an integer (the axis over which to operate).
bayes_mvs (data[, alpha]) Return Bayesian confidence intervals for the mean, var, and std.
var (a[, axis, bias]) Returns the estimated population variance of the values in the passed array (i.e., using N-1). Axis can equal None (ravel array first), or an integer (the axis over which to operate).
std (a[, axis, bias]) Returns the estimated population standard deviation of the values in the passed array (i.e., using N-1). Axis can equal None (ravel array first), or an integer (the axis over which to operate).
stderr (a[, axis]) Returns the estimated population standard error of the values in the passed array (i.e., using N-1). Axis can equal None (ravel array first), or an integer (the axis over which to operate).
sem (a[, axis]) Returns the standard error of the mean (i.e., using N) of the values in the passed array. Axis can equal None (ravel array first), or an integer (the axis over which to operate).
z (a, score) Returns the z-score of a given input score, given the array from which that score came. Not appropriate for population calculations, nor for arrays > 1D.
zs (a) Returns a 1D array of z-scores, one for each score in the passed array, computed relative to the passed array.
zmap (scores, compare[, axis]) Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0 of the compare array.
threshold (a[, threshmin, threshmax, ...]) Clip array to a given value.
trimboth (a, proportiontocut) Slices off the passed proportion of items from BOTH ends of the passed array (i.e., with proportiontocut=0.1, slices ‘leftmost’ 10% AND ‘rightmost’ 10% of scores). You must pre-sort the array if you want “proper” trimming. Slices off LESS if proportion results in a non-integer slice index (i.e., conservatively slices off proportiontocut).
trim1 (a, proportiontocut[, tail]) Slices off the passed proportion of items from ONE end of the passed array (i.e., if proportiontocut=0.1, slices off ‘leftmost’ or ‘rightmost’ 10% of scores). Slices off LESS if proportion results in a non-integer slice index (i.e., conservatively slices off proportiontocut).
cov (m[, y, rowvar, bias]) Estimate the covariance matrix.
corrcoef (x[, y, rowvar, bias]) The correlation coefficients formed from 2-d array x, where the rows are the observations, and the columns are variables.
f_oneway (*args) Performs a 1-way ANOVA, returning an F-value and probability given any number of groups. From Heiman, pp.394-7.
paired
pearsonr (x, y) Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.
spearmanr (x, y) Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.
pointbiserialr (x, y) Calculates a point biserial correlation coefficient and the associated p-value.
kendalltau (x, y) Calculates Kendall’s tau, a correlation measure for ordinal data, and an associated p-value.
linregress (*args) Calculates a regression line on two arrays, x and y, corresponding to x,y pairs. If a single 2D array is passed, linregress finds dim with 2 levels and splits data into x,y pairs along that dim.
ttest_1samp (a, popmean[, axis]) Calculates the T-test for the mean of ONE group of scores a.
ttest_ind (a, b[, axis]) Calculates the T-test for the means of TWO INDEPENDENT samples of scores.
ttest_rel (a, b[, axis]) Calculates the T-test on TWO RELATED samples of scores, a and b.
kstest (rvs, cdf[, args, N, alternative, mode]) Return the D-value and the p-value for a Kolmogorov-Smirnov test.
chisquare (f_obs[, f_exp]) Calculates a one-way chi square for array of observed frequencies and returns the result. If no expected frequencies are given, the total N is assumed to be equally distributed across all groups.
ks_2samp (data1, data2) Computes the Kolmogorov-Smirnov statistic on 2 samples.
mannwhitneyu
tiecorrect (rankvals) Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests. See Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill. Code adapted from |Stat rankind.c code.
ranksums (x, y) Calculates the rank sums statistic on the provided scores and returns the result.
wilcoxon (x[, y]) Calculates the Wilcoxon signed-rank test for the null hypothesis that two samples come from the same distribution. A non-parametric version of the paired T-test (needs N > 20).
kruskal (*args) The Kruskal-Wallis H-test is a non-parametric ANOVA for 2 or more groups, requiring at least 5 subjects in each group. This function calculates the Kruskal-Wallis H and associated p-value for 2 or more independent samples.
friedmanchisquare (*args) Friedman Chi-Square is a non-parametric, one-way within-subjects ANOVA. This function calculates the Friedman Chi-square test for repeated measures and returns the result, along with the associated probability value.
ansari (x, y) Determine if the scale parameter for two distributions with equal medians is the same using the Ansari-Bradley statistic.
bartlett (*args) Perform Bartlett test with the null hypothesis that all input samples have equal variances.
levene (*args, **kwds) Perform Levene test with the null hypothesis that all input samples have equal variances.
shapiro (x[, a, reta]) Shapiro and Wilk test for normality.
anderson (x[, dist]) Anderson and Darling test for normal, exponential, or Gumbel (Extreme Value Type I) distribution.
binom_test (x[, n, p]) An exact (two-sided) test of the null hypothesis that the probability of success in a Bernoulli experiment is p.
fligner (*args, **kwds) Perform Fligner’s test with the null hypothesis that all input samples have equal variances.
mood (x, y) Determine if the scale parameter for two distributions with equal medians is the same using a Mood test.
oneway (*args, **kwds) Test for equal means in two or more samples from the normal distribution.
glm (data, para) Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken from:
anova
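
A few of the functions listed above can be exercised on small hand-checkable inputs. The following is a minimal sketch; the data are made up so that the expected results are obvious, and results are unpacked by position because the exact return types may vary between library versions.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0   # exactly linear in x, so the correlation is 1

geometric = stats.gmean([1.0, 2.0, 4.0])   # cube root of 1 * 2 * 4
harmonic = stats.hmean([1.0, 2.0, 4.0])    # 3 / (1 + 1/2 + 1/4)

# Pearson correlation coefficient and its p-value.
r, r_pvalue = stats.pearsonr(x, y)

# Regression line; the first two items are the slope and intercept.
fit = stats.linregress(x, y)
slope, intercept = fit[0], fit[1]

# One-way chi-square with no expected frequencies: equal counts in
# every group match the implied uniform expectation exactly.
chi2_stat, chi2_pvalue = stats.chisquare([10, 10, 10, 10])

# T-test on two independent samples that share the same mean.
t_stat, t_pvalue = stats.ttest_ind(x, x[::-1])

print(geometric, harmonic, r, slope, intercept)
print(chi2_stat, chi2_pvalue, t_stat, t_pvalue)
```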

Plot-tests

probplot (x[, sparams, dist, ...]) Return (osm, osr){,(scale,loc,r)} where (osm, osr) are order statistic medians and ordered response data respectively so that plot(osm, osr) is a probability plot. If fit==1, then do a regression fit and compute the slope (scale), intercept (loc), and correlation coefficient (r), of the best straight line through the points. If fit==0, only (osm, osr) is returned.
ppcc_max (x[, brack, dist]) Returns the shape parameter that maximizes the probability plot correlation coefficient for the given data to a one-parameter family of distributions.
ppcc_plot (x, a, b[, dist, plot, N]) Returns (shape, ppcc), and optionally plots shape vs. ppcc (probability plot correlation coefficient) as a function of shape parameter for a one-parameter family of distributions from shape value a to b.
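
The return structure of probplot can be illustrated without plotting. The following sketch builds a deterministic, roughly normal sample from evenly spaced normal quantiles (an assumption made only to keep the example reproducible) and relies on the default behaviour of fitting a regression line.

```python
import numpy as np
from scipy import stats

# Deterministic stand-in for normally distributed data: normal
# quantiles at evenly spaced probabilities.
probabilities = np.linspace(0.05, 0.95, 19)
sample = stats.norm.ppf(probabilities)

# osm: order statistic medians; osr: the sorted sample.
# slope, intercept, r: the fitted line and its correlation coefficient.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")

print(len(osm), len(osr), slope, intercept, r)
```

A value of r close to 1 indicates that the points (osm, osr) lie near a straight line, which is what a probability plot of normal data against the normal distribution should show.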

Univariate and multivariate kernel density estimation (scipy.stats.kde)

gaussian_kde Representation of a kernel-density estimate using Gaussian kernels.
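
The following is a minimal sketch of a one-dimensional Gaussian kernel density estimate. The sample is drawn from a seeded standard normal generator purely for illustration, and the evaluation grid and integration bounds are arbitrary choices.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Seeded generator so the example is reproducible.
rng = np.random.RandomState(0)
data = rng.normal(loc=0.0, scale=1.0, size=200)

kde = gaussian_kde(data)

# Calling the estimate evaluates the density on a grid of points.
grid = np.linspace(-4.0, 4.0, 81)
density = kde(grid)

# The estimated density should integrate to about 1 over a wide interval.
total_mass = kde.integrate_box_1d(-10.0, 10.0)

print(density.max(), grid[np.argmax(density)], total_mass)
```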

For many more statistics-related functions, install the software R and the interface package rpy.