SciPy

scipy.stats.mstats.mquantiles

scipy.stats.mstats.mquantiles(a, prob=[0.25, 0.5, 0.75], alphap=0.4, betap=0.4, axis=None, limit=())[source]

Computes empirical quantiles for a data array.

Samples quantile are defined by Q(p) = (1-gamma)*x[j] + gamma*x[j+1], where x[j] is the j-th order statistic, and gamma is a function of j = floor(n*p + m), m = alphap + p*(1 - alphap - betap) and g = n*p + m - j.

Reinterpreting the above equations to compare to R lead to the equation: p(k) = (k - alphap)/(n + 1 - alphap - betap)

Typical values of (alphap,betap) are:
  • (0,1) : p(k) = k/n : linear interpolation of cdf (R type 4)
  • (.5,.5) : p(k) = (k - 1/2.)/n : piecewise linear function (R type 5)
  • (0,0) : p(k) = k/(n+1) : (R type 6)
  • (1,1) : p(k) = (k-1)/(n-1): p(k) = mode[F(x[k])]. (R type 7, R default)
  • (1/3,1/3): p(k) = (k-1/3)/(n+1/3): Then p(k) ~ median[F(x[k])]. The resulting quantile estimates are approximately median-unbiased regardless of the distribution of x. (R type 8)
  • (3/8,3/8): p(k) = (k-3/8)/(n+1/4): Blom. The resulting quantile estimates are approximately unbiased if x is normally distributed (R type 9)
  • (.4,.4) : approximately quantile unbiased (Cunnane)
  • (.35,.35): APL, used with PWM
Parameters:

a : array_like

Input data, as a sequence or array of dimension at most 2.

prob : array_like, optional

List of quantiles to compute.

alphap : float, optional

Plotting positions parameter, default is 0.4.

betap : float, optional

Plotting positions parameter, default is 0.4.

axis : int, optional

Axis along which to perform the trimming. If None (default), the input array is first flattened.

limit : tuple

Tuple of (lower, upper) values. Values of a outside this open interval are ignored.

Returns:

mquantiles : MaskedArray

An array containing the calculated quantiles.

Notes

This formulation is very similar to R except the calculation of m from alphap and betap, where in R m is defined with each type.

References

[R245]R statistical software at http://www.r-project.org/

Examples

>>> from scipy.stats.mstats import mquantiles
>>> a = np.array([6., 47., 49., 15., 42., 41., 7., 39., 43., 40., 36.])
>>> mquantiles(a)
array([ 19.2,  40. ,  42.8])

Using a 2D array, specifying axis and limit.

>>> data = np.array([[   6.,    7.,    1.],
                     [  47.,   15.,    2.],
                     [  49.,   36.,    3.],
                     [  15.,   39.,    4.],
                     [  42.,   40., -999.],
                     [  41.,   41., -999.],
                     [   7., -999., -999.],
                     [  39., -999., -999.],
                     [  43., -999., -999.],
                     [  40., -999., -999.],
                     [  36., -999., -999.]])
>>> mquantiles(data, axis=0, limit=(0, 50))
array([[ 19.2 ,  14.6 ,   1.45],
       [ 40.  ,  37.5 ,   2.5 ],
       [ 42.8 ,  40.05,   3.55]])
>>> data[:, 2] = -999.
>>> mquantiles(data, axis=0, limit=(0, 50))
masked_array(data =
 [[19.2 14.6 --]
 [40.0 37.5 --]
 [42.8 40.05 --]],
             mask =
 [[False False  True]
  [False False  True]
  [False False  True]],
       fill_value = 1e+20)