Statistical functions (scipy.stats)¶
This module contains a large number of probability distributions as well as a growing library of statistical functions.
Each univariate distribution is an instance of a subclass of rv_continuous (rv_discrete for discrete distributions):
rv_continuous([momtype, a, b, xtol, ...]) | A generic continuous random variable class meant for subclassing. |
rv_discrete([a, b, name, badvalue, ...]) | A generic discrete random variable class meant for subclassing. |
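A minimal sketch of the subclassing pattern, assuming a made-up density f(x) = 2x on [0, 1]. The class name `RampDistribution` and the instance name `ramp` are hypothetical and not part of scipy.stats; only `rv_continuous` and its `_pdf`/`_cdf` hooks come from the library.

```python
from scipy import stats


class RampDistribution(stats.rv_continuous):
    """Hypothetical example distribution with density f(x) = 2x on [0, 1]."""

    def _pdf(self, x):
        # Density on the support [a, b] given to the constructor.
        return 2.0 * x

    def _cdf(self, x):
        # Closed-form CDF; if omitted, it is obtained by numerical integration.
        return x ** 2


# a and b define the support of the distribution.
ramp = RampDistribution(a=0.0, b=1.0, name="ramp")

cdf_at_half = ramp.cdf(0.5)  # evaluated through _cdf
mean = ramp.mean()           # derived generically from the density
```

Only `_pdf` (or `_cdf`) needs to be supplied; the generic machinery fills in the remaining public methods, usually at some cost in speed.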
Continuous distributions¶
alpha | An alpha continuous random variable. |
anglit | An anglit continuous random variable. |
arcsine | An arcsine continuous random variable. |
beta | A beta continuous random variable. |
betaprime | A beta prime continuous random variable. |
bradford | A Bradford continuous random variable. |
burr | A Burr (Type III) continuous random variable. |
cauchy | A Cauchy continuous random variable. |
chi | A chi continuous random variable. |
chi2 | A chi-squared continuous random variable. |
cosine | A cosine continuous random variable. |
dgamma | A double gamma continuous random variable. |
dweibull | A double Weibull continuous random variable. |
erlang | An Erlang continuous random variable. |
expon | An exponential continuous random variable. |
exponnorm | An exponentially modified Normal continuous random variable. |
exponweib | An exponentiated Weibull continuous random variable. |
exponpow | An exponential power continuous random variable. |
f | An F continuous random variable. |
fatiguelife | A fatigue-life (Birnbaum-Saunders) continuous random variable. |
fisk | A Fisk continuous random variable. |
foldcauchy | A folded Cauchy continuous random variable. |
foldnorm | A folded normal continuous random variable. |
frechet_r | A Frechet right (or Weibull minimum) continuous random variable. |
frechet_l | A Frechet left (or Weibull maximum) continuous random variable. |
genlogistic | A generalized logistic continuous random variable. |
gennorm | A generalized normal continuous random variable. |
genpareto | A generalized Pareto continuous random variable. |
genexpon | A generalized exponential continuous random variable. |
genextreme | A generalized extreme value continuous random variable. |
gausshyper | A Gauss hypergeometric continuous random variable. |
gamma | A gamma continuous random variable. |
gengamma | A generalized gamma continuous random variable. |
genhalflogistic | A generalized half-logistic continuous random variable. |
gilbrat | A Gilbrat continuous random variable. |
gompertz | A Gompertz (or truncated Gumbel) continuous random variable. |
gumbel_r | A right-skewed Gumbel continuous random variable. |
gumbel_l | A left-skewed Gumbel continuous random variable. |
halfcauchy | A Half-Cauchy continuous random variable. |
halflogistic | A half-logistic continuous random variable. |
halfnorm | A half-normal continuous random variable. |
halfgennorm | The upper half of a generalized normal continuous random variable. |
hypsecant | A hyperbolic secant continuous random variable. |
invgamma | An inverted gamma continuous random variable. |
invgauss | An inverse Gaussian continuous random variable. |
invweibull | An inverted Weibull continuous random variable. |
johnsonsb | A Johnson SB continuous random variable. |
johnsonsu | A Johnson SU continuous random variable. |
ksone | General Kolmogorov-Smirnov one-sided test. |
kstwobign | Kolmogorov-Smirnov two-sided test for large N. |
laplace | A Laplace continuous random variable. |
levy | A Levy continuous random variable. |
levy_l | A left-skewed Levy continuous random variable. |
levy_stable | A Levy-stable continuous random variable. |
logistic | A logistic (or Sech-squared) continuous random variable. |
loggamma | A log gamma continuous random variable. |
loglaplace | A log-Laplace continuous random variable. |
lognorm | A lognormal continuous random variable. |
lomax | A Lomax (Pareto of the second kind) continuous random variable. |
maxwell | A Maxwell continuous random variable. |
mielke | A Mielke’s Beta-Kappa continuous random variable. |
nakagami | A Nakagami continuous random variable. |
ncx2 | A non-central chi-squared continuous random variable. |
ncf | A non-central F distribution continuous random variable. |
nct | A non-central Student’s T continuous random variable. |
norm | A normal continuous random variable. |
pareto | A Pareto continuous random variable. |
pearson3 | A Pearson type III continuous random variable. |
powerlaw | A power-function continuous random variable. |
powerlognorm | A power log-normal continuous random variable. |
powernorm | A power normal continuous random variable. |
rdist | An R-distributed continuous random variable. |
reciprocal | A reciprocal continuous random variable. |
rayleigh | A Rayleigh continuous random variable. |
rice | A Rice continuous random variable. |
recipinvgauss | A reciprocal inverse Gaussian continuous random variable. |
semicircular | A semicircular continuous random variable. |
t | A Student’s T continuous random variable. |
triang | A triangular continuous random variable. |
truncexpon | A truncated exponential continuous random variable. |
truncnorm | A truncated normal continuous random variable. |
tukeylambda | A Tukey-Lambda continuous random variable. |
uniform | A uniform continuous random variable. |
vonmises | A Von Mises continuous random variable. |
vonmises_line | A Von Mises continuous random variable. |
wald | A Wald continuous random variable. |
weibull_min | A Frechet right (or Weibull minimum) continuous random variable. |
weibull_max | A Frechet left (or Weibull maximum) continuous random variable. |
wrapcauchy | A wrapped Cauchy continuous random variable. |
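The continuous distributions listed above share a common set of methods. The following is a short sketch using `norm` as a representative; the location, scale, sample size, and seed are arbitrary values chosen for illustration.

```python
from scipy import stats

# "Freeze" a normal distribution with fixed location and scale.
dist = stats.norm(loc=10.0, scale=2.0)

density_at_mean = dist.pdf(10.0)  # probability density function
prob_below_mean = dist.cdf(10.0)  # cumulative distribution function
upper_quantile = dist.ppf(0.975)  # inverse of the CDF

# Draw a reproducible sample and fit the parameters back from it.
sample = dist.rvs(size=1000, random_state=0)
loc_hat, scale_hat = stats.norm.fit(sample)
```

The same calls should work for the other continuous distributions, with any shape parameters passed before `loc` and `scale`.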
Multivariate distributions¶
multivariate_normal | A multivariate normal random variable. |
matrix_normal | A matrix normal random variable. |
dirichlet | A Dirichlet random variable. |
wishart | A Wishart random variable. |
invwishart | An inverse Wishart random variable. |
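As an illustration of the multivariate interface, here is a small sketch with `multivariate_normal`. The mean vector, covariance matrix, sample size, and seed are assumptions made for the example.

```python
import numpy as np
from scipy import stats

mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])

# Frozen two-dimensional normal distribution.
mvn = stats.multivariate_normal(mean=mean, cov=cov)

density_at_mean = mvn.pdf(mean)            # density evaluated at the mean
draws = mvn.rvs(size=500, random_state=0)  # one row per draw
```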
Discrete distributions¶
bernoulli | A Bernoulli discrete random variable. |
binom | A binomial discrete random variable. |
boltzmann | A Boltzmann (Truncated Discrete Exponential) random variable. |
dlaplace | A Laplacian discrete random variable. |
geom | A geometric discrete random variable. |
hypergeom | A hypergeometric discrete random variable. |
logser | A Logarithmic (Log-Series, Series) discrete random variable. |
nbinom | A negative binomial discrete random variable. |
planck | A Planck discrete exponential random variable. |
poisson | A Poisson discrete random variable. |
randint | A uniform discrete random variable. |
skellam | A Skellam discrete random variable. |
zipf | A Zipf discrete random variable. |
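Discrete distributions expose a probability mass function (`pmf`) in place of `pdf`. A brief sketch with `binom` and `poisson`, using arbitrary parameter values:

```python
from scipy import stats

# Probability of exactly 3 successes in 10 fair trials: C(10, 3) / 2**10.
pmf_3 = stats.binom.pmf(3, 10, 0.5)

# Probability of at most 3 successes.
cdf_3 = stats.binom.cdf(3, 10, 0.5)

# Frozen Poisson distribution with rate 4.
pois = stats.poisson(4.0)
pois_mean = pois.mean()
counts = pois.rvs(size=100, random_state=0)
```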
Statistical functions¶
Several of these functions have a similar version in scipy.stats.mstats that works for masked arrays.
describe(a[, axis, ddof, bias, nan_policy]) | Computes several descriptive statistics of the passed array. |
gmean(a[, axis, dtype]) | Compute the geometric mean along the specified axis. |
hmean(a[, axis, dtype]) | Calculates the harmonic mean along the specified axis. |
kurtosis(a[, axis, fisher, bias, nan_policy]) | Computes the kurtosis (Fisher or Pearson) of a dataset. |
kurtosistest(a[, axis, nan_policy]) | Tests whether a dataset has normal kurtosis. This function tests the null hypothesis that the kurtosis of the population from which the sample was drawn is that of the normal distribution: kurtosis = 3(n-1)/(n+1). |
mode(a[, axis, nan_policy]) | Returns an array of the modal (most common) value in the passed array. |
moment(a[, moment, axis, nan_policy]) | Calculates the nth moment about the mean for a sample. |
normaltest(a[, axis, nan_policy]) | Tests whether a sample differs from a normal distribution. |
skew(a[, axis, bias, nan_policy]) | Computes the skewness of a data set. |
skewtest(a[, axis, nan_policy]) | Tests whether the skew is different from the normal distribution. |
kstat(data[, n]) | Return the nth k-statistic (1<=n<=4 so far). |
kstatvar(data[, n]) | Returns an unbiased estimator of the variance of the k-statistic. |
tmean(a[, limits, inclusive, axis]) | Compute the trimmed mean. |
tvar(a[, limits, inclusive, axis, ddof]) | Compute the trimmed variance. This function computes the sample variance of an array of values, while ignoring values which are outside of given limits. |
tmin(a[, lowerlimit, axis, inclusive, ...]) | Compute the trimmed minimum. This function finds the minimum value of an array a along the specified axis, but only considering values greater than a specified lower limit. |
tmax(a[, upperlimit, axis, inclusive, ...]) | Compute the trimmed maximum. This function computes the maximum value of an array along a given axis, while ignoring values larger than a specified upper limit. |
tstd(a[, limits, inclusive, axis, ddof]) | Compute the trimmed sample standard deviation. This function finds the sample standard deviation of given values, ignoring values outside the given limits. |
tsem(a[, limits, inclusive, axis, ddof]) | Compute the trimmed standard error of the mean. |
nanmean(*args, **kwds) | nanmean is deprecated! |
nanstd(*args, **kwds) | nanstd is deprecated! |
nanmedian(*args, **kwds) | nanmedian is deprecated! |
variation(a[, axis, nan_policy]) | Computes the coefficient of variation, the ratio of the biased standard deviation to the mean. |
find_repeats(arr) | Find repeats and repeat counts. |
trim_mean(a, proportiontocut[, axis]) | Return mean of array after trimming distribution from both tails. |
cumfreq(a[, numbins, defaultreallimits, weights]) | Returns a cumulative frequency histogram, using the histogram function. |
histogram2(*args, **kwds) | histogram2 is deprecated! |
histogram(*args, **kwds) | histogram is deprecated! |
itemfreq(a) | Returns a 2-D array of item frequencies. |
percentileofscore(a, score[, kind]) | The percentile rank of a score relative to a list of scores. |
scoreatpercentile(a, per[, limit, ...]) | Calculate the score at a given percentile of the input sequence. |
relfreq(a[, numbins, defaultreallimits, weights]) | Returns a relative frequency histogram, using the histogram function. |
binned_statistic(x, values[, statistic, ...]) | Compute a binned statistic for one or more sets of data. |
binned_statistic_2d(x, y, values[, ...]) | Compute a bidimensional binned statistic for one or more sets of data. |
binned_statistic_dd(sample, values[, ...]) | Compute a multidimensional binned statistic for a set of data. |
obrientransform(*args) | Computes the O’Brien transform on input data (any number of arrays). |
signaltonoise(*args, **kwds) | signaltonoise is deprecated! |
bayes_mvs(data[, alpha]) | Bayesian confidence intervals for the mean, var, and std. |
mvsdist(data) | ‘Frozen’ distributions for mean, variance, and standard deviation of data. |
sem(a[, axis, ddof, nan_policy]) | Calculates the standard error of the mean (or standard error of measurement) of the values in the input array. |
zmap(scores, compare[, axis, ddof]) | Calculates the relative z-scores. |
zscore(a[, axis, ddof]) | Calculates the z score of each value in the sample, relative to the sample mean and standard deviation. |
sigmaclip(a[, low, high]) | Iterative sigma-clipping of array elements. |
threshold(*args, **kwds) | threshold is deprecated! |
trimboth(a, proportiontocut[, axis]) | Slices off a proportion of items from both ends of an array. |
trim1(a, proportiontocut[, tail, axis]) | Slices off a proportion from ONE end of the passed array distribution. |
f_oneway(*args) | Performs a 1-way ANOVA. |
pearsonr(x, y) | Calculates a Pearson correlation coefficient and the p-value for testing non-correlation. |
spearmanr(a[, b, axis, nan_policy]) | Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation. |
pointbiserialr(x, y) | Calculates a point biserial correlation coefficient and its p-value. |
kendalltau(x, y[, initial_lexsort, nan_policy]) | Calculates Kendall’s tau, a correlation measure for ordinal data. |
linregress(x[, y]) | Calculate a linear least-squares regression for two sets of measurements. |
theilslopes(y[, x, alpha]) | Computes the Theil-Sen estimator for a set of points (x, y). |
f_value(*args, **kwds) | f_value is deprecated! |
ttest_1samp(a, popmean[, axis, nan_policy]) | Calculates the T-test for the mean of ONE group of scores. |
ttest_ind(a, b[, axis, equal_var, nan_policy]) | Calculates the T-test for the means of TWO INDEPENDENT samples of scores. |
ttest_ind_from_stats(mean1, std1, nobs1, ...) | T-test for means of two independent samples from descriptive statistics. |
ttest_rel(a, b[, axis, nan_policy]) | Calculates the T-test on TWO RELATED samples of scores, a and b. |
kstest(rvs, cdf[, args, N, alternative, mode]) | Perform the Kolmogorov-Smirnov test for goodness of fit. |
chisquare(f_obs[, f_exp, ddof, axis]) | Calculates a one-way chi square test. |
power_divergence(f_obs[, f_exp, ddof, axis, ...]) | Cressie-Read power divergence statistic and goodness of fit test. |
ks_2samp(data1, data2) | Computes the Kolmogorov-Smirnov statistic on 2 samples. |
mannwhitneyu(x, y[, use_continuity, alternative]) | Computes the Mann-Whitney rank test on samples x and y. |
tiecorrect(rankvals) | Tie correction factor for ties in the Mann-Whitney U and Kruskal-Wallis H tests. |
rankdata(a[, method]) | Assign ranks to data, dealing with ties appropriately. |
ranksums(x, y) | Compute the Wilcoxon rank-sum statistic for two samples. |
wilcoxon(x[, y, zero_method, correction]) | Calculate the Wilcoxon signed-rank test. |
kruskal(*args, **kwargs) | Compute the Kruskal-Wallis H-test for independent samples. The Kruskal-Wallis H-test tests the null hypothesis that the population medians of all of the groups are equal. |
friedmanchisquare(*args) | Computes the Friedman test for repeated measurements. The Friedman test tests the null hypothesis that repeated measurements of the same individuals have the same distribution. |
combine_pvalues(pvalues[, method, weights]) | Methods for combining the p-values of independent tests bearing upon the same hypothesis. |
ss(*args, **kwds) | ss is deprecated! |
square_of_sums(*args, **kwds) | square_of_sums is deprecated! |
jarque_bera(x) | Perform the Jarque-Bera goodness of fit test on sample data. |
ansari(x, y) | Perform the Ansari-Bradley test for equal scale parameters. The Ansari-Bradley test is a non-parametric test for the equality of the scale parameter of the distributions from which two samples were drawn. |
bartlett(*args) | Perform Bartlett’s test for equal variances. Bartlett’s test tests the null hypothesis that all input samples are from populations with equal variances. |
levene(*args, **kwds) | Perform Levene test for equal variances. |
shapiro(x[, a, reta]) | Perform the Shapiro-Wilk test for normality. |
anderson(x[, dist]) | Anderson-Darling test for data coming from a particular distribution. The Anderson-Darling test is a modification of the Kolmogorov-Smirnov test kstest for the null hypothesis that a sample is drawn from a population that follows a particular distribution. |
anderson_ksamp(samples[, midrank]) | The Anderson-Darling test for k-samples. |
binom_test(x[, n, p, alternative]) | Perform a test that the probability of success is p. |
fligner(*args, **kwds) | Perform Fligner-Killeen test for equality of variance. |
median_test(*args, **kwds) | Mood’s median test. |
mood(x, y[, axis]) | Perform Mood’s test for equal scale parameters. |
boxcox(x[, lmbda, alpha]) | Return a positive dataset transformed by a Box-Cox power transformation. |
boxcox_normmax(x[, brack, method]) | Compute optimal Box-Cox transform parameter for input data. |
boxcox_llf(lmb, data) | The boxcox log-likelihood function. |
entropy(pk[, qk, base]) | Calculate the entropy of a distribution for given probability values. |
chisqprob(*args, **kwds) | chisqprob is deprecated! |
betai(*args, **kwds) | betai is deprecated! |
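A short sketch of how a few of the functions above might be combined. The data are synthetic and the seed is arbitrary; only `describe`, `zscore`, `linregress`, and `ttest_ind` from the table are used.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0  # exactly linear in x

summary = stats.describe(x)  # nobs, minmax, mean, variance, skewness, kurtosis
z = stats.zscore(x)          # standardized values (ddof=0 by default)
fit = stats.linregress(x, y) # slope, intercept, rvalue, pvalue, stderr

# Two synthetic groups whose true means differ by one standard deviation.
rng = np.random.RandomState(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=1.0, scale=1.0, size=50)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
```

A small `p_value` here would suggest that the two group means differ; the threshold for "small" is a choice left to the analyst.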
Circular statistical functions¶
circmean(samples[, high, low, axis]) | Compute the circular mean for samples in a range. |
circvar(samples[, high, low, axis]) | Compute the circular variance for samples assumed to be in a range. |
circstd(samples[, high, low, axis]) | Compute the circular standard deviation for samples assumed to be in the range [low to high]. |
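To illustrate why circular statistics differ from ordinary ones, consider two compass bearings on either side of north. This is a sketch with made-up angles in degrees.

```python
import numpy as np
from scipy import stats

# Two bearings in degrees, 5 degrees west and 15 degrees east of north.
angles = np.array([355.0, 15.0])

arithmetic_mean = angles.mean()                        # ignores wrap-around
circular_mean = stats.circmean(angles, high=360.0, low=0.0)
circular_std = stats.circstd(angles, high=360.0, low=0.0)
```

The arithmetic mean points almost due south (185 degrees), while the circular mean lies between the two bearings, close to north.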
Contingency table functions¶
chi2_contingency(observed[, correction, lambda_]) | Chi-square test of independence of variables in a contingency table. |
contingency.expected_freq(observed) | Compute the expected frequencies from a contingency table. |
contingency.margins(a) | Return a list of the marginal sums of the array a. |
fisher_exact(table[, alternative]) | Performs a Fisher exact test on a 2x2 contingency table. |
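A minimal sketch of the contingency table functions on a hypothetical 2x2 table of counts:

```python
import numpy as np
from scipy.stats import chi2_contingency, contingency, fisher_exact

# Hypothetical observed counts (rows: groups, columns: outcomes).
observed = np.array([[10, 20],
                     [20, 10]])

# Chi-square test of independence; also returns the expected frequencies.
chi2, p, dof, expected = chi2_contingency(observed)

# The expected frequencies can also be computed on their own.
expected_direct = contingency.expected_freq(observed)

# Fisher's exact test, intended for 2x2 tables.
odds_ratio, p_exact = fisher_exact(observed)
```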
Plot-tests¶
ppcc_max(x[, brack, dist]) | Calculate the shape parameter that maximizes the PPCC. The probability plot correlation coefficient (PPCC) plot can be used to determine the optimal shape parameter for a one-parameter family of distributions. |
ppcc_plot(x, a, b[, dist, plot, N]) | Calculate and optionally plot probability plot correlation coefficient. |
probplot(x[, sparams, dist, fit, plot]) | Calculate quantiles for a probability plot, and optionally show the plot. |
boxcox_normplot(x, la, lb[, plot, N]) | Compute parameters for a Box-Cox normality plot, optionally show it. |
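These functions can be used without any plotting backend. As a sketch, `probplot` called without a `plot` argument returns the quantile pairs and a least-squares fit; the sample below is synthetic and the seed is arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
sample = rng.normal(loc=0.0, scale=1.0, size=200)

# osm: theoretical quantiles, osr: ordered sample values.
# slope, intercept, r describe the straight-line fit of osr on osm.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
```

A value of `r` close to 1 suggests that the sample is consistent with the chosen distribution.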
Masked statistics functions¶
- Statistical functions for masked arrays (scipy.stats.mstats)
- scipy.stats.mstats.argstoarray
- scipy.stats.mstats.betai
- scipy.stats.mstats.chisquare
- scipy.stats.mstats.count_tied_groups
- scipy.stats.mstats.describe
- scipy.stats.mstats.f_oneway
- scipy.stats.mstats.f_value_wilks_lambda
- scipy.stats.mstats.find_repeats
- scipy.stats.mstats.friedmanchisquare
- scipy.stats.mstats.kendalltau
- scipy.stats.mstats.kendalltau_seasonal
- scipy.stats.mstats.kruskalwallis
- scipy.stats.mstats.ks_twosamp
- scipy.stats.mstats.kurtosis
- scipy.stats.mstats.kurtosistest
- scipy.stats.mstats.linregress
- scipy.stats.mstats.mannwhitneyu
- scipy.stats.mstats.plotting_positions
- scipy.stats.mstats.mode
- scipy.stats.mstats.moment
- scipy.stats.mstats.mquantiles
- scipy.stats.mstats.msign
- scipy.stats.mstats.normaltest
- scipy.stats.mstats.obrientransform
- scipy.stats.mstats.pearsonr
- scipy.stats.mstats.pointbiserialr
- scipy.stats.mstats.rankdata
- scipy.stats.mstats.scoreatpercentile
- scipy.stats.mstats.sem
- scipy.stats.mstats.signaltonoise
- scipy.stats.mstats.skew
- scipy.stats.mstats.skewtest
- scipy.stats.mstats.spearmanr
- scipy.stats.mstats.theilslopes
- scipy.stats.mstats.threshold
- scipy.stats.mstats.tmax
- scipy.stats.mstats.tmean
- scipy.stats.mstats.tmin
- scipy.stats.mstats.trim
- scipy.stats.mstats.trima
- scipy.stats.mstats.trimboth
- scipy.stats.mstats.trimmed_stde
- scipy.stats.mstats.trimr
- scipy.stats.mstats.trimtail
- scipy.stats.mstats.tsem
- scipy.stats.mstats.ttest_onesamp
- scipy.stats.mstats.ttest_ind
- scipy.stats.mstats.ttest_rel
- scipy.stats.mstats.tvar
- scipy.stats.mstats.variation
- scipy.stats.mstats.winsorize
- scipy.stats.mstats.zmap
- scipy.stats.mstats.zscore
- scipy.stats.mstats.compare_medians_ms
- scipy.stats.mstats.gmean
- scipy.stats.mstats.hdmedian
- scipy.stats.mstats.hdquantiles
- scipy.stats.mstats.hdquantiles_sd
- scipy.stats.mstats.hmean
- scipy.stats.mstats.idealfourths
- scipy.stats.mstats.kruskal
- scipy.stats.mstats.ks_2samp
- scipy.stats.mstats.median_cihs
- scipy.stats.mstats.meppf
- scipy.stats.mstats.mjci
- scipy.stats.mstats.mquantiles_cimj
- scipy.stats.mstats.rsh
- scipy.stats.mstats.sen_seasonal_slopes
- scipy.stats.mstats.trimmed_mean
- scipy.stats.mstats.trimmed_mean_ci
- scipy.stats.mstats.trimmed_std
- scipy.stats.mstats.trimmed_var
- scipy.stats.mstats.ttest_1samp
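A brief sketch of how the masked versions treat masked entries, using `mquantiles` from the list above. The data and the choice of which value to mask are made up for illustration.

```python
import numpy as np
from scipy.stats import mstats

# The last value is treated as invalid and masked out.
data = np.ma.masked_array([1.0, 2.0, 3.0, 4.0, 5.0, 1000.0],
                          mask=[False, False, False, False, False, True])

# Median of the unmasked values only.
median = mstats.mquantiles(data, prob=[0.5])
```

The masked entry does not contribute to the result, so the median is that of the five valid values.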
Univariate and multivariate kernel density estimation (scipy.stats.kde)¶
gaussian_kde(dataset[, bw_method]) | Representation of a kernel-density estimate using Gaussian kernels. |
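A minimal sketch of a one-dimensional kernel density estimate. The sample is synthetic, and the evaluation grid and integration bounds are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
data = rng.normal(loc=0.0, scale=1.0, size=300)

# Bandwidth is chosen automatically unless bw_method is given.
kde = stats.gaussian_kde(data)

grid = np.linspace(-4.0, 4.0, 9)
density = kde(grid)  # estimated density at each grid point

# Probability mass over a wide interval; expected to be close to 1.
total_mass = kde.integrate_box_1d(-10.0, 10.0)
```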
For many more statistics-related functions, install the software R and the interface package rpy.