scipy.stats.spearmanr

scipy.stats.spearmanr(a, b=None, axis=0)

Calculates a Spearman rank-order correlation coefficient and the p-value to test for non-correlation.

The Spearman correlation is a nonparametric measure of the linear relationship between two datasets. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.

The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Spearman correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are probably reasonable for datasets larger than 500 or so.

spearmanr currently does not do any tie correction, and is only correct if there are no ties in the data.

Parameters :

a, b : 1D or 2D array_like, b is optional

One or two 1-D or 2-D arrays containing multiple variables and observations. Each column of m represents a variable, and each row entry a single observation of those variables. Also see axis below. Both arrays need to have the same length in the axis dimension.

axis : int or None, optional

If axis=0 (default), then each column represents a variable, with observations in the rows. If axis=0, the relationship is transposed: each row represents a variable, while the columns contain observations. If axis=None, then both arrays will be raveled

Returns :

rho: float or array (2D square) :

Spearman correlation matrix or correlation coefficient (if only 2 variables are given as parameters. Correlation matrix is square with length equal to total number of variables (columns or rows) in a and b combined

p-value : float

The two-sided p-value for a hypothesis test whose null hypothesis is that two sets of data are uncorrelated, has same dimension as rho

Notes

changes in scipy 0.8: rewrite to add tie-handling, and axis

References

[CRCProbStat2000] Section 14.7

[CRCProbStat2000](1, 2) Zwillinger, D. and Kokoska, S. (2000). CRC Standard Probablity and Statistics Tables and Formulae. Chapman & Hall: New York. 2000.

Examples

>>> spearmanr([1,2,3,4,5],[5,6,7,8,7])
(0.82078268166812329, 0.088587005313543798)
>>> np.random.seed(1234321)
>>> x2n=np.random.randn(100,2)
>>> y2n=np.random.randn(100,2)
>>> spearmanr(x2n)
(0.059969996999699973, 0.55338590803773591)
>>> spearmanr(x2n[:,0], x2n[:,1])
(0.059969996999699973, 0.55338590803773591)
>>> rho, pval = spearmanr(x2n,y2n)
>>> rho
array([[ 1.        ,  0.05997   ,  0.18569457,  0.06258626],
       [ 0.05997   ,  1.        ,  0.110003  ,  0.02534653],
       [ 0.18569457,  0.110003  ,  1.        ,  0.03488749],
       [ 0.06258626,  0.02534653,  0.03488749,  1.        ]])
>>> pval
array([[ 0.        ,  0.55338591,  0.06435364,  0.53617935],
       [ 0.55338591,  0.        ,  0.27592895,  0.80234077],
       [ 0.06435364,  0.27592895,  0.        ,  0.73039992],
       [ 0.53617935,  0.80234077,  0.73039992,  0.        ]])
>>> rho, pval = spearmanr(x2n.T, y2n.T, axis=1)
>>> rho
array([[ 1.        ,  0.05997   ,  0.18569457,  0.06258626],
       [ 0.05997   ,  1.        ,  0.110003  ,  0.02534653],
       [ 0.18569457,  0.110003  ,  1.        ,  0.03488749],
       [ 0.06258626,  0.02534653,  0.03488749,  1.        ]])
>>> spearmanr(x2n, y2n, axis=None)
(0.10816770419260482, 0.1273562188027364)
>>> spearmanr(x2n.ravel(), y2n.ravel())
(0.10816770419260482, 0.1273562188027364)
>>> xint = np.random.randint(10,size=(100,2))
>>> spearmanr(xint)
(0.052760927029710199, 0.60213045837062351)

Previous topic

scipy.stats.pearsonr

Next topic

scipy.stats.pointbiserialr

This Page