Statistical functions for masked arrays (scipy.stats.mstats)

This module contains a large number of statistical functions that can be used with masked arrays.

Most of these functions are similar to those in scipy.stats but might have small differences in the API or in the algorithm used. Since this is a relatively new package, some API changes are still possible.

argstoarray (*args) Constructs a 2D array from a sequence of sequences. Sequences are filled with missing values to match the length of the longest sequence.
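The padding behaviour described for argstoarray can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper name pad_to_rectangle is hypothetical, and None stands in for a masked entry.

```python
def pad_to_rectangle(*sequences):
    """Pad every sequence with None (a stand-in for a masked value)
    so that all rows match the length of the longest sequence."""
    width = max((len(seq) for seq in sequences), default=0)
    return [list(seq) + [None] * (width - len(seq)) for seq in sequences]


rows = pad_to_rectangle([1, 2, 3], [4], [5, 6])
for row in rows:
    print(row)
```

Each row of the result has the same length, so the rows can be stacked into a 2D structure.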
betai (a, b, x) Returns the incomplete beta function.
chisquare (f_obs[, f_exp]) Calculates a one-way chi square for array of observed frequencies and returns the result. If no expected frequencies are given, the total N is assumed to be equally distributed across all groups.
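As a rough illustration of the chisquare entry, the sketch below computes only the test statistic, spreading the total N equally across groups when no expected frequencies are given. The helper name chisquare_statistic is hypothetical, and the p-value is deliberately left out.

```python
def chisquare_statistic(f_obs, f_exp=None):
    """One-way chi-square statistic: sum of (observed - expected)^2 / expected."""
    total = sum(f_obs)
    if f_exp is None:
        # No expected frequencies: assume N is spread equally over all groups.
        f_exp = [total / len(f_obs)] * len(f_obs)
    return sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))


stat = chisquare_statistic([10, 20, 30])
print(stat)
```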
count_tied_groups (x[, use_missing]) Counts the number of tied values in x, and returns a dictionary (number of ties: number of groups).
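The dictionary returned by count_tied_groups maps a tie size to the number of groups with that many tied values. A minimal sketch of that bookkeeping, using a hypothetical helper tied_group_sizes and None as a stand-in for masked values (which are simply skipped here):

```python
from collections import Counter


def tied_group_sizes(values):
    """Return {number of tied values: number of groups with that many ties}."""
    occurrences = Counter(v for v in values if v is not None)
    # Values that occur only once are not ties and are left out.
    return dict(Counter(c for c in occurrences.values() if c > 1))


ties = tied_group_sizes([0, 0, 0, 2, 2, 2, 3, 3, 4, 5, 6])
print(ties)
```

Here 0 and 2 each occur three times and 3 occurs twice, so there are two groups of three ties and one group of two.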
describe (a[, axis]) Computes several descriptive statistics of the passed array.
f_oneway (*args) Performs a 1-way ANOVA, returning an F-value and probability given any number of groups. From Heiman, pp.394-7.
f_value_wilks_lambda (ER, EF, dfnum, dfden, ...) Calculation of Wilks lambda F-statistic for multivariate data, per Maxwell & Delaney p.657.
find_repeats (arr) Find repeats in arr and return a tuple (repeats, repeat_count). Masked values are discarded.
friedmanchisquare (*args) Friedman Chi-Square is a non-parametric, one-way within-subjects ANOVA. This function calculates the Friedman Chi-square test for repeated measures and returns the result, along with the associated probability value.
gmean (a[, axis]) Calculates the geometric mean of the values in the passed array.
hmean (a[, axis]) Calculates the harmonic mean of the values in the passed array.
kendalltau (x, y[, use_ties, use_missing]) Computes Kendall’s rank correlation tau on two variables x and y.
kendalltau_seasonal (x) Computes a multivariate extension of Kendall’s rank correlation tau, designed for seasonal data.
kruskalwallis (*args) The Kruskal-Wallis H-test is a non-parametric ANOVA for 2 or more groups, requiring at least 5 subjects in each group. This function calculates the Kruskal-Wallis H and associated p-value for 2 or more independent samples.
ks_twosamp (data1, data2[, alternative]) Computes the Kolmogorov-Smirnov test on two samples. Missing values are discarded.
kurtosis (a[, axis, fisher, bias]) Computes the kurtosis (Fisher or Pearson) of a dataset.
kurtosistest (a[, axis]) Tests whether a dataset has normal kurtosis (i.e., kurtosis=3(n-1)/(n+1)).
linregress (*args) Calculates a regression line on two arrays, x and y, corresponding to x,y pairs. If a single 2D array is passed, linregress finds dim with 2 levels and splits data into x,y pairs along that dim.
mannwhitneyu (x, y[, use_continuity]) Computes the Mann-Whitney on samples x and y. Missing values in x and/or y are discarded.
mode (a[, axis]) Returns an array of the modal (most common) value in the passed array.
moment (a[, moment, axis]) Calculates the nth moment about the mean for a sample.
mquantiles (data[, prob, alphap, betap, axis, limit]) Computes empirical quantiles for a 1xN data array. Sample quantiles are defined by Q(p) = (1-g)*x[i] + g*x[i+1], where x[j] is the jth order statistic, i = floor(n*p + m), m = alphap + p*(1 - alphap - betap), and g = n*p + m - i.
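The quantile definition given for mquantiles can be traced with a short pure-Python sketch. It is an illustration under stated assumptions, not the library code: the helper name empirical_quantile is hypothetical, the defaults alphap = betap = 0.4 are assumed, None stands in for masked values, and the index is clamped so that both order statistics exist at the extremes.

```python
import math


def empirical_quantile(data, p, alphap=0.4, betap=0.4):
    """Q(p) = (1-g)*x[i] + g*x[i+1] with 1-based order statistics x[j]."""
    x = sorted(v for v in data if v is not None)  # drop "masked" entries
    n = len(x)
    if n == 0:
        raise ValueError("no unmasked data")
    if n == 1:
        return float(x[0])
    m = alphap + p * (1.0 - alphap - betap)
    position = n * p + m
    # Clamp i to [1, n-1] so x[i] and x[i+1] both exist (assumption).
    i = math.floor(min(max(position, 1.0), n - 1.0))
    g = min(max(position - i, 0.0), 1.0)
    # Shift by one: x[i] in 1-based notation is x[i - 1] in a Python list.
    return (1.0 - g) * x[i - 1] + g * x[i]


print(empirical_quantile([3, None, 1, 2, 4], 0.5))
```

With alphap = betap = 1 the same formula reduces to plain linear interpolation between the sample minimum and maximum.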
msign (x) Returns the sign of x, or 0 if x is masked.
normaltest (a[, axis]) Tests whether skew and/or kurtosis of dataset differs from normal curve.
obrientransform (*args) Computes a transform on input data (any number of columns). Used to test for homogeneity of variance prior to running one-way stats. Each array in *args is one level of a factor. If an F_oneway() run on the transformed data is found significant, the variances are unequal. From Maxwell and Delaney, p.112.
pearsonr (x, y) Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.
plotting_positions (data[, alpha, beta]) Returns the plotting positions (or empirical percentile points) for the data. Plotting positions are defined as (i-alpha)/(n+1-alpha-beta), where i is the rank order statistic, n is the number of unmasked values along the given axis, and alpha and beta are two parameters.
pointbiserialr (x, y) Calculates a point biserial correlation coefficient and the associated p-value.
rankdata (data[, axis, use_missing]) Returns the rank (also known as order statistics) of each data point along the given axis.
samplestd (data[, axis]) Returns a biased estimate of the standard deviation of the data, as the square root of the average squared deviations from the mean.
samplevar (data[, axis]) Returns a biased estimate of the variance of the data, as the average of the squared deviations from the mean.
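The difference between the biased estimates (samplevar, samplestd, dividing by N) and the N-1 convention used by the var and std entries can be seen in a small sketch. The helper name variance and its ddof parameter are hypothetical, and None stands in for masked values.

```python
import math


def variance(data, ddof=0):
    """Average squared deviation from the mean over the unmasked values.
    ddof=0 divides by N (biased); ddof=1 divides by N-1."""
    kept = [v for v in data if v is not None]
    mean = sum(kept) / len(kept)
    return sum((v - mean) ** 2 for v in kept) / (len(kept) - ddof)


scores = [2, 4, 4, None, 4, 5, 5, 7, 9]
biased = variance(scores)            # divides by N = 8
unbiased = variance(scores, ddof=1)  # divides by N - 1 = 7
biased_std = math.sqrt(biased)
print(biased, unbiased, biased_std)
```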
scoreatpercentile (data, per[, limit, alphap, ...]) Calculates the score at the given ‘per’ percentile of the sequence data. For example, the score at per=50 is the median.
sem (a[, axis]) Returns the standard error of the mean (i.e., using N) of the values in the passed array. Axis can equal None (ravel array first), or an integer (the axis over which to operate).
signaltonoise (data[, axis]) Calculates the signal-to-noise ratio, as the ratio of the mean over standard deviation along the given axis.
skew (a[, axis, bias]) Computes the skewness of a data set.
skewtest (a[, axis]) Tests whether the skew is significantly different from a normal distribution.
spearmanr (x, y[, use_ties]) Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.
std (a[, axis]) Returns the estimated population standard deviation of the values in the passed array (i.e., N-1). Axis can equal None (ravel array first), or an integer (the axis over which to operate).
stderr (a[, axis]) Returns the estimated population standard error of the values in the passed array (i.e., N-1). Axis can equal None (ravel array first), or an integer (the axis over which to operate).
theilslopes (y[, x, alpha]) Computes the Theil slope over the dataset (x,y), as the median of all slopes between paired values.
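The Theil slope is described as the median of all slopes between paired values, which is short enough to sketch directly. The helper name theil_slope is hypothetical, None stands in for masked values, and the confidence interval controlled by alpha is not computed here.

```python
from itertools import combinations
from statistics import median


def theil_slope(y, x=None):
    """Median of the slopes between every pair of (x, y) points."""
    if x is None:
        x = range(len(y))  # assumption: default to positions 0, 1, 2, ...
    points = [(xi, yi) for xi, yi in zip(x, y)
              if xi is not None and yi is not None]
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(points, 2)
              if x2 != x1]
    return median(slopes)


# The last point is an outlier; the median of pairwise slopes ignores it.
slope = theil_slope([0, 2, 4, 6, 50])
print(slope)
```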
threshold (a[, threshmin, threshmax, ...]) Clip array to a given value.
tmax (a, upperlimit[, axis, inclusive]) Returns the maximum value of a, along axis, including only values less than (or equal to, if inclusive is True) upperlimit. If the limit is set to None, all values in the array are used.
tmean (a[, limits, inclusive]) Returns the arithmetic mean of all values in an array, ignoring values strictly outside given limits.
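The limits and inclusive arguments shared by tmean, tvar and tsem can be illustrated with a sketch of a trimmed mean. The helper name trimmed_mean is hypothetical, None stands in for masked values, and the default inclusive=(True, True) is an assumption.

```python
def trimmed_mean(data, limits=None, inclusive=(True, True)):
    """Mean of the values inside limits; either limit may be None (open)."""
    lower, upper = limits if limits is not None else (None, None)

    def keep(v):
        if v is None:
            return False
        if lower is not None and (v < lower or (v == lower and not inclusive[0])):
            return False
        if upper is not None and (v > upper or (v == upper and not inclusive[1])):
            return False
        return True

    kept = [v for v in data if keep(v)]
    return sum(kept) / len(kept)


data = list(range(1, 11))
closed = trimmed_mean(data, (3, 8))                  # keeps 3..8
half_open = trimmed_mean(data, (3, 8), (False, True))  # keeps 4..8
print(closed, half_open)
```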
tmin (a[, lowerlimit, axis, ...]) Returns the minimum value of a, along axis, including only values greater than (or equal to, if inclusive is True) lowerlimit. If the limit is set to None, all values in the array are used.
trim (a[, limits, inclusive, ...]) Trims an array by masking the data outside some given limits. Returns a masked version of the input array.
trima (a[, limits, inclusive]) Trims an array by masking the data outside some given limits. Returns a masked version of the input array.
trimboth (data[, proportiontocut, ...]) Trims the data by masking the int(proportiontocut*n) smallest and int(proportiontocut*n) largest values of data along the given axis, where n is the number of unmasked values before trimming.
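The counting rule for trimboth, masking int(proportiontocut*n) values at each end where n counts only the unmasked values, can be sketched as follows. The helper name trim_both is hypothetical, None stands in for masked values, and the default proportiontocut=0.2 is an assumption.

```python
def trim_both(data, proportiontocut=0.2):
    """Mask (set to None) the k smallest and k largest unmasked values,
    with k = int(proportiontocut * n); the original order is preserved."""
    unmasked = [i for i, v in enumerate(data) if v is not None]
    n = len(unmasked)
    k = int(proportiontocut * n)
    order = sorted(unmasked, key=lambda i: data[i])
    to_mask = set(order[:k]) | set(order[n - k:])
    return [None if i in to_mask else v for i, v in enumerate(data)]


trimmed = trim_both([7, 1, 10, 3, None, 5, 2, 9, 4, 8, 6])
print(trimmed)
```

The already-masked entry does not count toward n, so ten values remain and two are masked at each end.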
trimmed_stde (a[, limits, ...]) Returns the standard error of the trimmed mean of the data along the given axis. Parameters: a (sequence): input array. limits ({(0.1, 0.1), tuple of float}, optional): tuple (lower percentage, upper percentage) to cut on each side of the array, with respect to the number of unmasked data. With n the number of unmasked data before trimming, the (n*limits[0])th smallest data and the (n*limits[1])th largest data are masked, and the total number of unmasked data after trimming is n*(1.-sum(limits)). In each case, the value of one limit can be set to None to indicate an open interval. If limits is None, no trimming is performed. inclusive ({(True, True) tuple}, optional): tuple indicating whether the number of data being masked on each side should be rounded (True) or truncated (False). axis ({None, integer}, optional): axis along which to trim.
trimr (a[, limits, inclusive, ...]) Trims an array by masking some proportion of the data on each end. Returns a masked version of the input array.
trimtail (data[, proportiontocut, ...]) Trims the data by masking int(proportiontocut*n) values from ONE tail of the data along the given axis, where n is the number of unmasked values.
tsem (a[, limits, inclusive]) Returns the standard error of the mean for the values in an array (i.e., using N for the denominator), ignoring values strictly outside the sequence passed to ‘limits’. Note: either limit in the sequence, or the value of limits itself, can be set to None. The inclusive list/tuple determines whether the lower and upper limiting bounds (respectively) are open/exclusive (0) or closed/inclusive (1).
ttest_onesamp (a, popmean) Calculates the T-test for the mean of ONE group of scores a.
ttest_ind (a, b[, axis]) Calculates the T-test for the means of TWO INDEPENDENT samples of scores.
ttest_rel (a, b[, axis]) Calculates the T-test on TWO RELATED samples of scores, a and b.
tvar (a[, limits, inclusive]) Returns the sample variance of values in an array (i.e., using N-1), ignoring values strictly outside the sequence passed to ‘limits’. Note: either limit in the sequence, or the value of limits itself, can be set to None. The inclusive list/tuple determines whether the lower and upper limiting bounds (respectively) are open/exclusive (0) or closed/inclusive (1).
var (a[, axis]) Returns the estimated population variance of the values in the passed array (i.e., N-1). Axis can equal None (ravel array first), or an integer (the axis over which to operate).
variation (a[, axis]) Computes the coefficient of variation, the ratio of the biased standard deviation to the mean.
winsorize (a[, limits, inclusive, ...]) Returns a Winsorized version of the input array.
z (a, score) Returns the z-score of a given input score, given the array from which that score came. Not appropriate for population calculations, nor for arrays > 1D.
zmap (scores, compare[, axis]) Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0 of the compare array.
zs (a) Returns a 1D array of z-scores, one for each score in the passed array, computed relative to the passed array.