Statistical functions for masked arrays (scipy.stats.mstats)#

This module contains a large number of statistical functions that can be used with masked arrays.

Most of these functions are similar to those in scipy.stats, but their API or the algorithm they use may differ slightly. Because this package is relatively new, some API changes are still possible.
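As a minimal sketch of how masked entries are handled, the example below masks a sentinel value and passes the array to describe. The sentinel -999.0 and the variable names are assumptions made for illustration only.

```python
import numpy as np
from scipy.stats import mstats

# Hypothetical readings in which -999.0 marks a missing value.
raw = np.array([1.0, 2.0, 3.0, 4.0, -999.0])
readings = np.ma.masked_values(raw, -999.0)

summary = mstats.describe(readings)
print(summary.nobs)    # number of unmasked observations
print(summary.minmax)  # minimum and maximum of the unmasked values
print(summary.mean)    # mean of the unmasked values
```

The masked entry is excluded from the observation count, the extrema, and the mean.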

Summary statistics#

describe(a[, axis, ddof, bias])

Computes several descriptive statistics of the passed array.

gmean(a[, axis, dtype, weights, nan_policy, ...])

Compute the weighted geometric mean along the specified axis.

hmean(a[, axis, dtype, weights, nan_policy, ...])

Calculate the weighted harmonic mean along the specified axis.

kurtosis(a[, axis, fisher, bias])

Computes the kurtosis (Fisher or Pearson) of a dataset.
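The two definitions differ by a constant: the Fisher (excess) kurtosis is the Pearson kurtosis minus 3. A short sketch with made-up data, where the last value is masked:

```python
import numpy as np
from scipy.stats import mstats

# Illustrative sample; the masked last value should be ignored.
sample = np.ma.array([2.0, 3.0, 5.0, 7.0, 11.0, 500.0],
                     mask=[0, 0, 0, 0, 0, 1])

excess = mstats.kurtosis(sample, fisher=True)    # Fisher definition
pearson = mstats.kurtosis(sample, fisher=False)  # Pearson definition
difference = float(pearson) - float(excess)
print(difference)  # the two definitions differ by 3
```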

mode(a[, axis])

Returns an array of the modal (most common) value in the passed array.

mquantiles(a[, prob, alphap, betap, axis, limit])

Computes empirical quantiles for a data array.
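The alphap and betap parameters select the plotting-position convention. The sketch below, on illustrative data, compares the defaults (0.4, 0.4) with alphap=1, betap=1, which should agree with NumPy's default linear interpolation on the unmasked values.

```python
import numpy as np
from scipy.stats import mstats

# Values 1..9 plus one masked entry that should not affect the quantiles.
data = np.ma.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000],
                   mask=[0, 0, 0, 0, 0, 0, 0, 0, 0, 1], dtype=float)
probs = [0.25, 0.5, 0.75]

q_default = mstats.mquantiles(data, prob=probs)  # alphap = betap = 0.4
q_linear = mstats.mquantiles(data, prob=probs, alphap=1, betap=1)
q_numpy = np.percentile(data.compressed(), [25, 50, 75])

print(q_default)
print(q_linear)
print(q_numpy)
```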

hdmedian(data[, axis, var])

Returns the Harrell-Davis estimate of the median along the given axis.

hdquantiles(data[, prob, axis, var])

Computes quantile estimates with the Harrell-Davis method.

hdquantiles_sd(data[, prob, axis])

The standard error of the Harrell-Davis quantile estimates by jackknife.

idealfourths(data[, axis])

Returns an estimate of the lower and upper quartiles.

plotting_positions(data[, alpha, beta])

Returns plotting positions (or empirical percentile points) for the data.

meppf(data[, alpha, beta])

Returns plotting positions (or empirical percentile points) for the data. Alias of plotting_positions.

moment(a[, moment, axis])

Calculates the nth moment about the mean for a sample.

skew(a[, axis, bias])

Computes the skewness of a data set.

tmean(a[, limits, inclusive, axis])

Compute the trimmed mean.
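For the t* functions, limits are given as absolute values rather than proportions. A minimal sketch with invented data, in which values outside [0, 10] are excluded from the mean:

```python
import numpy as np
from scipy.stats import mstats

values = np.ma.array([1.0, 2.0, 3.0, 4.0, 100.0])

plain_mean = values.mean()                          # pulled up by the outlier
trimmed = mstats.tmean(values, limits=(0.0, 10.0))  # keeps 0 <= value <= 10

print(plain_mean)
print(trimmed)
```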

tvar(a[, limits, inclusive, axis, ddof])

Compute the trimmed variance.

tmin(a[, lowerlimit, axis, inclusive])

Compute the trimmed minimum.

tmax(a[, upperlimit, axis, inclusive])

Compute the trimmed maximum.

tsem(a[, limits, inclusive, axis, ddof])

Compute the trimmed standard error of the mean.

variation(a[, axis, ddof])

Compute the coefficient of variation.

find_repeats(arr)

Find repeats in arr and return a tuple (repeats, repeat_count).

sem(a[, axis, ddof])

Calculates the standard error of the mean of the input array.
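A quick consistency check, sketched with made-up measurements: the masked result should match scipy.stats.sem applied to the unmasked values only.

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats

measurements = np.ma.array([2.0, 4.0, 4.0, 6.0, 50.0],
                           mask=[0, 0, 0, 0, 1])

masked_sem = mstats.sem(measurements)             # ignores the masked 50.0
plain_sem = stats.sem(measurements.compressed())  # same data, mask removed

print(masked_sem, plain_sem)
```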

trimmed_mean(a[, limits, inclusive, ...])

Returns the trimmed mean of the data along the given axis.

trimmed_mean_ci(data[, limits, inclusive, ...])

Selected confidence interval of the trimmed mean along the given axis.

trimmed_std(a[, limits, inclusive, ...])

Returns the trimmed standard deviation of the data along the given axis.

trimmed_var(a[, limits, inclusive, ...])

Returns the trimmed variance of the data along the given axis.

Frequency statistics#

scoreatpercentile(data, per[, limit, ...])

Calculate the score at the given 'per' percentile of the sequence data.

Correlation functions#

f_oneway(*args)

Performs a 1-way ANOVA, returning an F-value and probability given any number of groups.

pearsonr(x, y)

Pearson correlation coefficient and p-value for testing non-correlation.
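When either member of a pair is masked, the whole pair is dropped. The sketch below uses illustrative data and compares the result against scipy.stats.pearsonr on the valid pairs.

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats

x = np.ma.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.ma.array([1.2, 1.9, 3.2, 3.8, 5.1, -40.0],
                mask=[0, 0, 0, 0, 0, 1])

r_masked, p_masked = mstats.pearsonr(x, y)

# Reference: drop the pair whose y value is masked, then use scipy.stats.
valid = ~np.ma.getmaskarray(y)
r_plain, p_plain = stats.pearsonr(x.data[valid], y.data[valid])

print(r_masked, r_plain)
```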

spearmanr(x[, y, use_ties, axis, ...])

Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.

pointbiserialr(x, y)

Calculates a point biserial correlation coefficient and its p-value.

kendalltau(x, y[, use_ties, use_missing, ...])

Computes Kendall's rank correlation tau on two variables x and y.

kendalltau_seasonal(x)

Computes a multivariate Kendall's rank correlation tau, for seasonal data.

linregress(x[, y])

Calculates a linear least-squares regression for two sets of measurements.
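A small sketch with invented points on the line y = 2x + 1, plus one masked outlier that should not influence the fit:

```python
import numpy as np
from scipy.stats import mstats

x = np.ma.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.ma.array([1.0, 3.0, 5.0, 7.0, 99.0], mask=[0, 0, 0, 0, 1])

fit = mstats.linregress(x, y)
print(fit.slope, fit.intercept)  # close to 2 and 1
```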

siegelslopes(y[, x, method])

Computes the Siegel estimator for a set of points (x, y).

theilslopes(y[, x, alpha, method])

Computes the Theil-Sen estimator for a set of points (x, y).
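The Theil-Sen slope is the median of all pairwise slopes, so a single gross outlier has little effect even when it is left unmasked. A sketch on synthetic data (note that y is the first argument):

```python
import numpy as np
from scipy.stats import mstats

x = np.arange(7, dtype=float)
y = 2.0 * x + 1.0
y[3] = 100.0  # a single gross outlier, deliberately left unmasked

result = mstats.theilslopes(y, x)  # argument order is (y, x)
slope = result[0]                  # median of the pairwise slopes
print(slope)
```

Here 15 of the 21 pairwise slopes equal 2, so the median slope is unaffected by the outlier.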

sen_seasonal_slopes(x)

Computes seasonal Theil-Sen and Kendall slope estimators.

Statistical tests#

ttest_1samp(a, popmean[, axis, alternative])

Calculates the T-test for the mean of ONE group of scores.
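As a sketch with made-up scores, the masked test can be checked against scipy.stats.ttest_1samp applied to the unmasked values:

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats

scores = np.ma.array([5.1, 4.9, 5.3, 5.0, 4.8, 42.0],
                     mask=[0, 0, 0, 0, 0, 1])

t_masked, p_masked = mstats.ttest_1samp(scores, popmean=5.0)
t_plain, p_plain = stats.ttest_1samp(scores.compressed(), popmean=5.0)

print(t_masked, p_masked)
print(t_plain, p_plain)
```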

ttest_onesamp(a, popmean[, axis, alternative])

Calculates the T-test for the mean of ONE group of scores. Alias of ttest_1samp.

ttest_ind(a, b[, axis, equal_var, alternative])

Calculates the T-test for the means of TWO INDEPENDENT samples of scores.

ttest_rel(a, b[, axis, alternative])

Calculates the T-test on TWO RELATED samples of scores, a and b.

chisquare(f_obs[, f_exp, ddof, axis])

Calculate a one-way chi-square test.

kstest(data1, data2[, args, alternative, method])

Performs the one-sample or two-sample Kolmogorov-Smirnov test.

ks_2samp(data1, data2[, alternative, method])

Computes the Kolmogorov-Smirnov test on two samples.

ks_1samp(x, cdf[, args, alternative, method])

Computes the Kolmogorov-Smirnov test on one sample of masked values.

ks_twosamp(data1, data2[, alternative, method])

Computes the Kolmogorov-Smirnov test on two samples. Alias of ks_2samp.

mannwhitneyu(x, y[, use_continuity])

Computes the Mann-Whitney statistic.

rankdata(data[, axis, use_missing])

Returns the rank (also known as order statistics) of each data point along the given axis.
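A brief sketch of how ties and masked entries are ranked, using illustrative data: tied values share the average of their ranks, and with the default use_missing=False a masked value receives rank 0.

```python
import numpy as np
from scipy.stats import mstats

data = np.ma.array([30.0, 10.0, -1.0, 10.0], mask=[0, 0, 1, 0])

ranks = mstats.rankdata(data)
print(ranks)  # the two 10.0 values share rank 1.5; the masked entry gets 0
```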

kruskal(*args)

Compute the Kruskal-Wallis H-test for independent samples.

kruskalwallis(*args)

Compute the Kruskal-Wallis H-test for independent samples. Alias of kruskal.

friedmanchisquare(*args)

Friedman Chi-Square is a non-parametric, one-way within-subjects ANOVA.

brunnermunzel(x, y[, alternative, distribution])

Computes the Brunner-Munzel test on samples x and y.

skewtest(a[, axis, alternative])

Tests whether the skewness of the sample differs from that of a normal distribution.

kurtosistest(a[, axis, alternative])

Tests whether a dataset has normal kurtosis.

normaltest(a[, axis])

Tests whether a sample differs from a normal distribution.

Transformations#

obrientransform(*args)

Computes a transform on input data (any number of columns).

trim(a[, limits, inclusive, relative, axis])

Trims an array by masking the data outside some given limits.
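The limits can be read either as absolute values (the default) or, with relative=True, as proportions of the data to mask at each end. A sketch on invented data where both calls mask the same two entries:

```python
import numpy as np
from scipy.stats import mstats

a = np.ma.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Absolute limits: mask everything outside the closed interval [2, 9].
by_value = mstats.trim(a, limits=(2, 9))

# Relative limits: mask the lowest 10% and the highest 10% of the values.
by_fraction = mstats.trim(a, limits=(0.1, 0.1), relative=True)

print(by_value)
print(by_fraction)
print(by_value.count())  # number of values left unmasked
```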

trima(a[, limits, inclusive])

Trims an array by masking the data outside some given limits.

trimmed_stde(a[, limits, inclusive, axis])

Returns the standard error of the trimmed mean along the given axis.

trimr(a[, limits, inclusive, axis])

Trims an array by masking some proportion of the data on each end.

trimtail(data[, proportiontocut, tail, ...])

Trims the data by masking values from one tail.

trimboth(data[, proportiontocut, inclusive, ...])

Trims the smallest and largest data values.

winsorize(a[, limits, inclusive, inplace, ...])

Returns a Winsorized version of the input array.
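Unlike trimming, Winsorizing replaces the extreme values with the nearest retained value instead of masking them. A minimal sketch on illustrative data, clipping 10% at each end:

```python
import numpy as np
from scipy.stats import mstats

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Replace the lowest 10% and the highest 10% with the nearest kept value.
winsorized = mstats.winsorize(a, limits=(0.1, 0.1))

print(winsorized)  # 1 is replaced by 2, and 100 by 9
print(a)           # the input is unchanged because inplace defaults to False
```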

zmap(scores, compare[, axis, ddof, nan_policy])

Calculate the relative z-scores.

zscore(a[, axis, ddof, nan_policy])

Compute the z score.

Other#

argstoarray(*args)

Constructs a 2D array from a group of sequences.
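When the sequences have different lengths, the shorter rows are padded with masked entries. A sketch with two made-up groups:

```python
import numpy as np
from scipy.stats import mstats

group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0]

table = mstats.argstoarray(group_a, group_b)

print(table.shape)          # one row per input sequence
print(table.count(axis=1))  # unmasked entries in each row
print(table.mask)
```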

count_tied_groups(x[, use_missing])

Counts the number of tied values.

msign(x)

Returns the sign of x, or 0 if x is masked.

compare_medians_ms(group_1, group_2[, axis])

Compares the medians from two independent groups along the given axis.

median_cihs(data[, alpha, axis])

Computes the alpha-level confidence interval for the median of the data.

mjci(data[, prob, axis])

Returns the Maritz-Jarrett estimators of the standard error of selected experimental quantiles of the data.

mquantiles_cimj(data[, prob, alpha, axis])

Computes the alpha confidence interval for the selected quantiles of the data, with Maritz-Jarrett estimators.

rsh(data[, points])

Evaluates Rosenblatt's shifted histogram estimators for each data point.