Statistical functions (scipy.stats
)¶
This module contains a large number of probability distributions as well as a growing library of statistical functions.
Each univariate distribution is an instance of a subclass of rv_continuous
(rv_discrete
for discrete distributions):
rv_continuous ([momtype, a, b, xtol, …]) |
A generic continuous random variable class meant for subclassing. |
rv_discrete ([a, b, name, badvalue, …]) |
A generic discrete random variable class meant for subclassing. |
rv_histogram (histogram, *args, **kwargs) |
Generates a distribution given by a histogram. |
Continuous distributions¶
alpha |
An alpha continuous random variable. |
anglit |
An anglit continuous random variable. |
arcsine |
An arcsine continuous random variable. |
argus |
Argus distribution |
beta |
A beta continuous random variable. |
betaprime |
A beta prime continuous random variable. |
bradford |
A Bradford continuous random variable. |
burr |
A Burr (Type III) continuous random variable. |
burr12 |
A Burr (Type XII) continuous random variable. |
cauchy |
A Cauchy continuous random variable. |
chi |
A chi continuous random variable. |
chi2 |
A chi-squared continuous random variable. |
cosine |
A cosine continuous random variable. |
crystalball |
Crystalball distribution |
dgamma |
A double gamma continuous random variable. |
dweibull |
A double Weibull continuous random variable. |
erlang |
An Erlang continuous random variable. |
expon |
An exponential continuous random variable. |
exponnorm |
An exponentially modified Normal continuous random variable. |
exponweib |
An exponentiated Weibull continuous random variable. |
exponpow |
An exponential power continuous random variable. |
f |
An F continuous random variable. |
fatiguelife |
A fatigue-life (Birnbaum-Saunders) continuous random variable. |
fisk |
A Fisk continuous random variable. |
foldcauchy |
A folded Cauchy continuous random variable. |
foldnorm |
A folded normal continuous random variable. |
frechet_r |
A frechet_r continuous random variable. |
frechet_l |
A frechet_l continuous random variable. |
genlogistic |
A generalized logistic continuous random variable. |
gennorm |
A generalized normal continuous random variable. |
genpareto |
A generalized Pareto continuous random variable. |
genexpon |
A generalized exponential continuous random variable. |
genextreme |
A generalized extreme value continuous random variable. |
gausshyper |
A Gauss hypergeometric continuous random variable. |
gamma |
A gamma continuous random variable. |
gengamma |
A generalized gamma continuous random variable. |
genhalflogistic |
A generalized half-logistic continuous random variable. |
gilbrat |
A Gilbrat continuous random variable. |
gompertz |
A Gompertz (or truncated Gumbel) continuous random variable. |
gumbel_r |
A right-skewed Gumbel continuous random variable. |
gumbel_l |
A left-skewed Gumbel continuous random variable. |
halfcauchy |
A Half-Cauchy continuous random variable. |
halflogistic |
A half-logistic continuous random variable. |
halfnorm |
A half-normal continuous random variable. |
halfgennorm |
The upper half of a generalized normal continuous random variable. |
hypsecant |
A hyperbolic secant continuous random variable. |
invgamma |
An inverted gamma continuous random variable. |
invgauss |
An inverse Gaussian continuous random variable. |
invweibull |
An inverted Weibull continuous random variable. |
johnsonsb |
A Johnson SB continuous random variable. |
johnsonsu |
A Johnson SU continuous random variable. |
kappa4 |
Kappa 4 parameter distribution. |
kappa3 |
Kappa 3 parameter distribution. |
ksone |
General Kolmogorov-Smirnov one-sided test. |
kstwobign |
Kolmogorov-Smirnov two-sided test for large N. |
laplace |
A Laplace continuous random variable. |
levy |
A Levy continuous random variable. |
levy_l |
A left-skewed Levy continuous random variable. |
levy_stable |
A Levy-stable continuous random variable. |
logistic |
A logistic (or Sech-squared) continuous random variable. |
loggamma |
A log gamma continuous random variable. |
loglaplace |
A log-Laplace continuous random variable. |
lognorm |
A lognormal continuous random variable. |
lomax |
A Lomax (Pareto of the second kind) continuous random variable. |
maxwell |
A Maxwell continuous random variable. |
mielke |
A Mielke’s Beta-Kappa continuous random variable. |
moyal |
A Moyal continuous random variable. |
nakagami |
A Nakagami continuous random variable. |
ncx2 |
A non-central chi-squared continuous random variable. |
ncf |
A non-central F distribution continuous random variable. |
nct |
A non-central Student’s t continuous random variable. |
norm |
A normal continuous random variable. |
norminvgauss |
A Normal Inverse Gaussian continuous random variable. |
pareto |
A Pareto continuous random variable. |
pearson3 |
A pearson type III continuous random variable. |
powerlaw |
A power-function continuous random variable. |
powerlognorm |
A power log-normal continuous random variable. |
powernorm |
A power normal continuous random variable. |
rdist |
An R-distributed continuous random variable. |
reciprocal |
A reciprocal continuous random variable. |
rayleigh |
A Rayleigh continuous random variable. |
rice |
A Rice continuous random variable. |
recipinvgauss |
A reciprocal inverse Gaussian continuous random variable. |
semicircular |
A semicircular continuous random variable. |
skewnorm |
A skew-normal random variable. |
t |
A Student’s t continuous random variable. |
trapz |
A trapezoidal continuous random variable. |
triang |
A triangular continuous random variable. |
truncexpon |
A truncated exponential continuous random variable. |
truncnorm |
A truncated normal continuous random variable. |
tukeylambda |
A Tukey-Lamdba continuous random variable. |
uniform |
A uniform continuous random variable. |
vonmises |
A Von Mises continuous random variable. |
vonmises_line |
A Von Mises continuous random variable. |
wald |
A Wald continuous random variable. |
weibull_min |
Weibull minimum continuous random variable. |
weibull_max |
Weibull maximum continuous random variable. |
wrapcauchy |
A wrapped Cauchy continuous random variable. |
Multivariate distributions¶
multivariate_normal |
A multivariate normal random variable. |
matrix_normal |
A matrix normal random variable. |
dirichlet |
A Dirichlet random variable. |
wishart |
A Wishart random variable. |
invwishart |
An inverse Wishart random variable. |
multinomial |
A multinomial random variable. |
special_ortho_group |
A matrix-valued SO(N) random variable. |
ortho_group |
A matrix-valued O(N) random variable. |
unitary_group |
A matrix-valued U(N) random variable. |
random_correlation |
A random correlation matrix. |
Discrete distributions¶
bernoulli |
A Bernoulli discrete random variable. |
binom |
A binomial discrete random variable. |
boltzmann |
A Boltzmann (Truncated Discrete Exponential) random variable. |
dlaplace |
A Laplacian discrete random variable. |
geom |
A geometric discrete random variable. |
hypergeom |
A hypergeometric discrete random variable. |
logser |
A Logarithmic (Log-Series, Series) discrete random variable. |
nbinom |
A negative binomial discrete random variable. |
planck |
A Planck discrete exponential random variable. |
poisson |
A Poisson discrete random variable. |
randint |
A uniform discrete random variable. |
skellam |
A Skellam discrete random variable. |
zipf |
A Zipf discrete random variable. |
yulesimon |
A Yule-Simon discrete random variable. |
An overview of statistical functions is given below.
Several of these functions have a similar version in
scipy.stats.mstats
which work for masked arrays.
Summary statistics¶
describe (a[, axis, ddof, bias, nan_policy]) |
Compute several descriptive statistics of the passed array. |
gmean (a[, axis, dtype]) |
Compute the geometric mean along the specified axis. |
hmean (a[, axis, dtype]) |
Calculate the harmonic mean along the specified axis. |
kurtosis (a[, axis, fisher, bias, nan_policy]) |
Compute the kurtosis (Fisher or Pearson) of a dataset. |
mode (a[, axis, nan_policy]) |
Return an array of the modal (most common) value in the passed array. |
moment (a[, moment, axis, nan_policy]) |
Calculate the nth moment about the mean for a sample. |
skew (a[, axis, bias, nan_policy]) |
Compute the skewness of a data set. |
kstat (data[, n]) |
Return the nth k-statistic (1<=n<=4 so far). |
kstatvar (data[, n]) |
Returns an unbiased estimator of the variance of the k-statistic. |
tmean (a[, limits, inclusive, axis]) |
Compute the trimmed mean. |
tvar (a[, limits, inclusive, axis, ddof]) |
Compute the trimmed variance. |
tmin (a[, lowerlimit, axis, inclusive, …]) |
Compute the trimmed minimum. |
tmax (a[, upperlimit, axis, inclusive, …]) |
Compute the trimmed maximum. |
tstd (a[, limits, inclusive, axis, ddof]) |
Compute the trimmed sample standard deviation. |
tsem (a[, limits, inclusive, axis, ddof]) |
Compute the trimmed standard error of the mean. |
variation (a[, axis, nan_policy]) |
Compute the coefficient of variation, the ratio of the biased standard deviation to the mean. |
find_repeats (arr) |
Find repeats and repeat counts. |
trim_mean (a, proportiontocut[, axis]) |
Return mean of array after trimming distribution from both tails. |
iqr (x[, axis, rng, scale, nan_policy, …]) |
Compute the interquartile range of the data along the specified axis. |
sem (a[, axis, ddof, nan_policy]) |
Calculate the standard error of the mean (or standard error of measurement) of the values in the input array. |
bayes_mvs (data[, alpha]) |
Bayesian confidence intervals for the mean, var, and std. |
mvsdist (data) |
‘Frozen’ distributions for mean, variance, and standard deviation of data. |
entropy (pk[, qk, base]) |
Calculate the entropy of a distribution for given probability values. |
Frequency statistics¶
cumfreq (a[, numbins, defaultreallimits, weights]) |
Return a cumulative frequency histogram, using the histogram function. |
itemfreq (*args, **kwds) |
itemfreq is deprecated! itemfreq is deprecated and will be removed in a future version. |
percentileofscore (a, score[, kind]) |
The percentile rank of a score relative to a list of scores. |
scoreatpercentile (a, per[, limit, …]) |
Calculate the score at a given percentile of the input sequence. |
relfreq (a[, numbins, defaultreallimits, weights]) |
Return a relative frequency histogram, using the histogram function. |
binned_statistic (x, values[, statistic, …]) |
Compute a binned statistic for one or more sets of data. |
binned_statistic_2d (x, y, values[, …]) |
Compute a bidimensional binned statistic for one or more sets of data. |
binned_statistic_dd (sample, values[, …]) |
Compute a multidimensional binned statistic for a set of data. |
Correlation functions¶
f_oneway (*args) |
Performs a 1-way ANOVA. |
pearsonr (x, y) |
Calculate a Pearson correlation coefficient and the p-value for testing non-correlation. |
spearmanr (a[, b, axis, nan_policy]) |
Calculate a Spearman rank-order correlation coefficient and the p-value to test for non-correlation. |
pointbiserialr (x, y) |
Calculate a point biserial correlation coefficient and its p-value. |
kendalltau (x, y[, initial_lexsort, …]) |
Calculate Kendall’s tau, a correlation measure for ordinal data. |
weightedtau (x, y[, rank, weigher, additive]) |
Compute a weighted version of Kendall’s \(\tau\). |
linregress (x[, y]) |
Calculate a linear least-squares regression for two sets of measurements. |
siegelslopes (y[, x, method]) |
Computes the Siegel estimator for a set of points (x, y). |
theilslopes (y[, x, alpha]) |
Computes the Theil-Sen estimator for a set of points (x, y). |
Statistical tests¶
ttest_1samp (a, popmean[, axis, nan_policy]) |
Calculate the T-test for the mean of ONE group of scores. |
ttest_ind (a, b[, axis, equal_var, nan_policy]) |
Calculate the T-test for the means of two independent samples of scores. |
ttest_ind_from_stats (mean1, std1, nobs1, …) |
T-test for means of two independent samples from descriptive statistics. |
ttest_rel (a, b[, axis, nan_policy]) |
Calculate the T-test on TWO RELATED samples of scores, a and b. |
kstest (rvs, cdf[, args, N, alternative, mode]) |
Perform the Kolmogorov-Smirnov test for goodness of fit. |
chisquare (f_obs[, f_exp, ddof, axis]) |
Calculate a one-way chi square test. |
power_divergence (f_obs[, f_exp, ddof, axis, …]) |
Cressie-Read power divergence statistic and goodness of fit test. |
ks_2samp (data1, data2) |
Compute the Kolmogorov-Smirnov statistic on 2 samples. |
mannwhitneyu (x, y[, use_continuity, alternative]) |
Compute the Mann-Whitney rank test on samples x and y. |
tiecorrect (rankvals) |
Tie correction factor for ties in the Mann-Whitney U and Kruskal-Wallis H tests. |
rankdata (a[, method]) |
Assign ranks to data, dealing with ties appropriately. |
ranksums (x, y) |
Compute the Wilcoxon rank-sum statistic for two samples. |
wilcoxon (x[, y, zero_method, correction]) |
Calculate the Wilcoxon signed-rank test. |
kruskal (*args, **kwargs) |
Compute the Kruskal-Wallis H-test for independent samples |
friedmanchisquare (*args) |
Compute the Friedman test for repeated measurements |
brunnermunzel (x, y[, alternative, …]) |
Computes the Brunner-Munzel test on samples x and y |
combine_pvalues (pvalues[, method, weights]) |
Methods for combining the p-values of independent tests bearing upon the same hypothesis. |
jarque_bera (x) |
Perform the Jarque-Bera goodness of fit test on sample data. |
ansari (x, y) |
Perform the Ansari-Bradley test for equal scale parameters |
bartlett (*args) |
Perform Bartlett’s test for equal variances |
levene (*args, **kwds) |
Perform Levene test for equal variances. |
shapiro (x) |
Perform the Shapiro-Wilk test for normality. |
anderson (x[, dist]) |
Anderson-Darling test for data coming from a particular distribution |
anderson_ksamp (samples[, midrank]) |
The Anderson-Darling test for k-samples. |
binom_test (x[, n, p, alternative]) |
Perform a test that the probability of success is p. |
fligner (*args, **kwds) |
Perform Fligner-Killeen test for equality of variance. |
median_test (*args, **kwds) |
Mood’s median test. |
mood (x, y[, axis]) |
Perform Mood’s test for equal scale parameters. |
skewtest (a[, axis, nan_policy]) |
Test whether the skew is different from the normal distribution. |
kurtosistest (a[, axis, nan_policy]) |
Test whether a dataset has normal kurtosis. |
normaltest (a[, axis, nan_policy]) |
Test whether a sample differs from a normal distribution. |
Transformations¶
boxcox (x[, lmbda, alpha]) |
Return a positive dataset transformed by a Box-Cox power transformation. |
boxcox_normmax (x[, brack, method]) |
Compute optimal Box-Cox transform parameter for input data. |
boxcox_llf (lmb, data) |
The boxcox log-likelihood function. |
yeojohnson (x[, lmbda]) |
Return a dataset transformed by a Yeo-Johnson power transformation. |
yeojohnson_normmax (x[, brack]) |
Compute optimal Yeo-Johnson transform parameter for input data, using maximum likelihood estimation. |
yeojohnson_llf (lmb, data) |
The yeojohnson log-likelihood function. |
obrientransform (*args) |
Compute the O’Brien transform on input data (any number of arrays). |
sigmaclip (a[, low, high]) |
Iterative sigma-clipping of array elements. |
trimboth (a, proportiontocut[, axis]) |
Slices off a proportion of items from both ends of an array. |
trim1 (a, proportiontocut[, tail, axis]) |
Slices off a proportion from ONE end of the passed array distribution. |
zmap (scores, compare[, axis, ddof]) |
Calculate the relative z-scores. |
zscore (a[, axis, ddof]) |
Calculate the z score of each value in the sample, relative to the sample mean and standard deviation. |
Statistical distances¶
wasserstein_distance (u_values, v_values[, …]) |
Compute the first Wasserstein distance between two 1D distributions. |
energy_distance (u_values, v_values[, …]) |
Compute the energy distance between two 1D distributions. |
Random variate generation¶
rvs_ratio_uniforms (pdf, umax, vmin, vmax[, …]) |
Generate random samples from a probability density function using the ratio-of-uniforms method. |
Circular statistical functions¶
circmean (samples[, high, low, axis]) |
Compute the circular mean for samples in a range. |
circvar (samples[, high, low, axis]) |
Compute the circular variance for samples assumed to be in a range |
circstd (samples[, high, low, axis]) |
Compute the circular standard deviation for samples assumed to be in the range [low to high]. |
Contingency table functions¶
chi2_contingency (observed[, correction, lambda_]) |
Chi-square test of independence of variables in a contingency table. |
contingency.expected_freq (observed) |
Compute the expected frequencies from a contingency table. |
contingency.margins (a) |
Return a list of the marginal sums of the array a. |
fisher_exact (table[, alternative]) |
Performs a Fisher exact test on a 2x2 contingency table. |
Plot-tests¶
ppcc_max (x[, brack, dist]) |
Calculate the shape parameter that maximizes the PPCC |
ppcc_plot (x, a, b[, dist, plot, N]) |
Calculate and optionally plot probability plot correlation coefficient. |
probplot (x[, sparams, dist, fit, plot, rvalue]) |
Calculate quantiles for a probability plot, and optionally show the plot. |
boxcox_normplot (x, la, lb[, plot, N]) |
Compute parameters for a Box-Cox normality plot, optionally show it. |
yeojohnson_normplot (x, la, lb[, plot, N]) |
Compute parameters for a Yeo-Johnson normality plot, optionally show it. |
Masked statistics functions¶
- Statistical functions for masked arrays (
scipy.stats.mstats
)- Summary statistics
- scipy.stats.mstats.describe
- scipy.stats.mstats.gmean
- scipy.stats.mstats.hmean
- scipy.stats.mstats.kurtosis
- scipy.stats.mstats.mode
- scipy.stats.mstats.mquantiles
- scipy.stats.mstats.hdmedian
- scipy.stats.mstats.hdquantiles
- scipy.stats.mstats.hdquantiles_sd
- scipy.stats.mstats.idealfourths
- scipy.stats.mstats.plotting_positions
- scipy.stats.mstats.meppf
- scipy.stats.mstats.moment
- scipy.stats.mstats.skew
- scipy.stats.mstats.tmean
- scipy.stats.mstats.tvar
- scipy.stats.mstats.tmin
- scipy.stats.mstats.tmax
- scipy.stats.mstats.tsem
- scipy.stats.mstats.variation
- scipy.stats.mstats.find_repeats
- scipy.stats.mstats.sem
- scipy.stats.mstats.trimmed_mean
- scipy.stats.mstats.trimmed_mean_ci
- scipy.stats.mstats.trimmed_std
- scipy.stats.mstats.trimmed_var
- Frequency statistics
- Correlation functions
- scipy.stats.mstats.f_oneway
- scipy.stats.mstats.pearsonr
- scipy.stats.mstats.spearmanr
- scipy.stats.mstats.pointbiserialr
- scipy.stats.mstats.kendalltau
- scipy.stats.mstats.kendalltau_seasonal
- scipy.stats.mstats.linregress
- scipy.stats.mstats.siegelslopes
- scipy.stats.mstats.theilslopes
- scipy.stats.mstats.sen_seasonal_slopes
- Statistical tests
- scipy.stats.mstats.ttest_1samp
- scipy.stats.mstats.ttest_onesamp
- scipy.stats.mstats.ttest_ind
- scipy.stats.mstats.ttest_rel
- scipy.stats.mstats.chisquare
- scipy.stats.mstats.ks_2samp
- scipy.stats.mstats.ks_twosamp
- scipy.stats.mstats.mannwhitneyu
- scipy.stats.mstats.rankdata
- scipy.stats.mstats.kruskal
- scipy.stats.mstats.kruskalwallis
- scipy.stats.mstats.friedmanchisquare
- scipy.stats.mstats.brunnermunzel
- scipy.stats.mstats.skewtest
- scipy.stats.mstats.kurtosistest
- scipy.stats.mstats.normaltest
- Transformations
- Other
- Summary statistics
Univariate and multivariate kernel density estimation (scipy.stats.kde
)¶
gaussian_kde (dataset[, bw_method, weights]) |
Representation of a kernel-density estimate using Gaussian kernels. |
For many more stat related functions install the software R and the interface package rpy.