scipy.stats.spearmanr

scipy.stats.spearmanr(a, b=None, axis=0, nan_policy='propagate', alternative='two-sided')

Calculate a Spearman correlation coefficient with associated p-value.

The Spearman rank-order correlation coefficient is a nonparametric measure of the monotonicity of the relationship between two datasets. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact monotonic relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
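As a rough illustration of what "monotonicity" means here, the sketch below compares Spearman's coefficient with Pearson's on data that is strictly increasing but far from linear. The data (`x`, `y`) and the variable names are made up for the illustration; the check that Spearman's coefficient matches Pearson's coefficient computed on ranks reflects the usual definition of the statistic, not anything specific stated on this page.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11, dtype=float)
y = np.exp(x)  # strictly increasing in x, but strongly nonlinear

rho = stats.spearmanr(x, y).statistic      # rank-based correlation
r_linear = stats.pearsonr(x, y).statistic  # linear correlation

# Spearman's coefficient is Pearson's coefficient applied to the ranks.
r_on_ranks = stats.pearsonr(stats.rankdata(x), stats.rankdata(y)).statistic

print(rho, r_linear, r_on_ranks)
```

The rank-based coefficient reaches its maximum because the ordering of `y` follows the ordering of `x` exactly, while the linear coefficient stays below 1.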

The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Spearman correlation at least as extreme as the one computed from these datasets. Although calculation of the p-value does not make strong assumptions about the distributions underlying the samples, it is only accurate for very large samples (>500 observations). For smaller sample sizes, consider a permutation test (see Examples section below).

Parameters:

a, b : 1D or 2D array_like, b is optional

One or two 1-D or 2-D arrays containing multiple variables and observations. When these are 1-D, each represents a vector of observations of a single variable. For the behavior in the 2-D case, see under axis, below. Both arrays need to have the same length in the axis dimension.

axis : int or None, optional

If axis=0 (default), then each column represents a variable, with observations in the rows. If axis=1, the relationship is transposed: each row represents a variable, while the columns contain observations. If axis=None, then both arrays will be raveled.
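A small sketch of the two 2-D layouts, assuming a made-up array `data` with 50 observations of 3 variables (the seed is arbitrary and only there for repeatability). Transposing the input and switching `axis` should describe the same variables, so both calls are expected to give the same correlation matrix.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)       # arbitrary seed, for repeatability only
data = rng.standard_normal((50, 3))  # 50 observations (rows) of 3 variables (columns)

by_columns = stats.spearmanr(data, axis=0).statistic  # columns are variables
by_rows = stats.spearmanr(data.T, axis=1).statistic   # rows are variables

print(by_columns.shape)
print(np.allclose(by_columns, by_rows))
```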

nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional

Defines how to handle input that contains nan. The following options are available (default is ‘propagate’):

‘propagate’: returns nan
‘raise’: throws an error
‘omit’: performs the calculations ignoring nan values
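The three policies can be compared side by side on a toy input with one missing value. The data below is invented for the illustration, and catching `ValueError` for ‘raise’ is an assumption about the exception type rather than something this page states.

```python
import numpy as np
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, np.nan]  # last observation is missing
y = [2.0, 1.0, 4.0, 3.0, 5.0]

propagated = stats.spearmanr(x, y, nan_policy='propagate')
omitted = stats.spearmanr(x, y, nan_policy='omit')

# Assumption: the error raised for 'raise' is a ValueError.
try:
    stats.spearmanr(x, y, nan_policy='raise')
    raised = False
except ValueError:
    raised = True

print(propagated.statistic, omitted.statistic, raised)
```

With ‘omit’, the pair containing nan is dropped, so the result should match a call on the first four observations only.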

alternative : {‘two-sided’, ‘less’, ‘greater’}, optional

Defines the alternative hypothesis. Default is ‘two-sided’. The following options are available:

‘two-sided’: the correlation is nonzero
‘less’: the correlation is negative (less than zero)
‘greater’: the correlation is positive (greater than zero)

New in version 1.7.0.
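To see how the choice of alternative affects the p-value, the sketch below reuses the small dataset from the Examples section and requests all three alternatives. The statistic itself does not depend on `alternative`; only the p-value changes.

```python
import numpy as np
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 7]

two_sided = stats.spearmanr(x, y, alternative='two-sided').pvalue
greater = stats.spearmanr(x, y, alternative='greater').pvalue
less = stats.spearmanr(x, y, alternative='less').pvalue

print(two_sided, greater, less)
```

Because the observed correlation is positive here, the ‘greater’ p-value comes out as half of the two-sided one, and the ‘less’ p-value is its complement.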

Returns:

res : SignificanceResult

An object containing attributes:

statistic : float or ndarray (2-D square)
    Spearman correlation matrix, or the correlation coefficient if only 2 variables are given as parameters. The correlation matrix is square, with side length equal to the total number of variables (columns or rows) in a and b combined.
pvalue : float
    The p-value for a hypothesis test whose null hypothesis is that the two samples have no ordinal correlation. See alternative above for alternative hypotheses. pvalue has the same shape as statistic.

Warns:

ConstantInputWarning: Raised if an input is a constant array. The correlation coefficient is not defined in this case, so np.nan is returned.
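A minimal sketch of the constant-input case, using made-up data in which the second input never changes. It assumes the warning class is reachable as `stats.ConstantInputWarning`; the standard-library `warnings` module is used to capture it instead of letting it print.

```python
import warnings

import numpy as np
from scipy import stats

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not suppressed
    res = stats.spearmanr([1, 2, 3, 4], [7, 7, 7, 7])  # second input is constant

# Assumption: the warning category is exposed as stats.ConstantInputWarning.
is_constant_warning = any(
    issubclass(w.category, stats.ConstantInputWarning) for w in caught
)

print(is_constant_warning, res.statistic)
```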

References

[1]

Zwillinger, D. and Kokoska, S. (2000). CRC Standard Probability and Statistics Tables and Formulae. Chapman & Hall: New York. 2000. Section 14.7

[2]

Kendall, M. G. and Stuart, A. (1973). The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Griffin. 1973. Section 31.18

Examples

>>> import numpy as np
>>> from scipy import stats
>>> res = stats.spearmanr([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
>>> res.statistic
0.8207826816681233
>>> res.pvalue
0.08858700531354381
>>> rng = np.random.default_rng()
>>> x2n = rng.standard_normal((100, 2))
>>> y2n = rng.standard_normal((100, 2))
>>> res = stats.spearmanr(x2n)
>>> res.statistic, res.pvalue
(-0.07960396039603959, 0.4311168705769747)
>>> res = stats.spearmanr(x2n[:, 0], x2n[:, 1])
>>> res.statistic, res.pvalue
(-0.07960396039603959, 0.4311168705769747)
>>> res = stats.spearmanr(x2n, y2n)
>>> res.statistic
array([[ 1.        , -0.07960396, -0.08314431,  0.09662166],
       [-0.07960396,  1.        , -0.14448245,  0.16738074],
       [-0.08314431, -0.14448245,  1.        ,  0.03234323],
       [ 0.09662166,  0.16738074,  0.03234323,  1.        ]])
>>> res.pvalue
array([[0.        , 0.43111687, 0.41084066, 0.33891628],
       [0.43111687, 0.        , 0.15151618, 0.09600687],
       [0.41084066, 0.15151618, 0.        , 0.74938561],
       [0.33891628, 0.09600687, 0.74938561, 0.        ]])
>>> res = stats.spearmanr(x2n.T, y2n.T, axis=1)
>>> res.statistic
array([[ 1.        , -0.07960396, -0.08314431,  0.09662166],
       [-0.07960396,  1.        , -0.14448245,  0.16738074],
       [-0.08314431, -0.14448245,  1.        ,  0.03234323],
       [ 0.09662166,  0.16738074,  0.03234323,  1.        ]])
>>> res = stats.spearmanr(x2n, y2n, axis=None)
>>> res.statistic, res.pvalue
(0.044981624540613524, 0.5270803651336189)
>>> res = stats.spearmanr(x2n.ravel(), y2n.ravel())
>>> res.statistic, res.pvalue
(0.044981624540613524, 0.5270803651336189)

>>> rng = np.random.default_rng()
>>> xint = rng.integers(10, size=(100, 2))
>>> res = stats.spearmanr(xint)
>>> res.statistic, res.pvalue
(0.09800224850707953, 0.3320271757932076)

For small samples, consider performing a permutation test instead of relying on the asymptotic p-value. Note that to calculate the null distribution of the statistic (for all possible pairings between observations in samples x and y), only one of the two inputs needs to be permuted.

>>> x = [1.76405235, 0.40015721, 0.97873798,
...      2.2408932, 1.86755799, -0.97727788]
>>> y = [2.71414076, 0.2488, 0.87551913,
...      2.6514917, 2.01160156, 0.47699563]
>>> def statistic(x):  # permute only `x`
...     return stats.spearmanr(x, y).statistic
>>> res_exact = stats.permutation_test((x,), statistic,
...                                    permutation_type='pairings')
>>> res_asymptotic = stats.spearmanr(x, y)
>>> res_exact.pvalue, res_asymptotic.pvalue  # asymptotic pvalue is too low
(0.10277777777777777, 0.07239650145772594)