scipy.stats.kendalltau

scipy.stats.kendalltau(x, y, initial_lexsort=None, nan_policy='propagate', method='auto', variant='b', alternative='two-sided')

Calculate Kendall’s tau, a correlation measure for ordinal data.

Kendall’s tau is a measure of the correspondence between two rankings. Values close to 1 indicate strong agreement, and values close to -1 indicate strong disagreement. This implements two variants of Kendall’s tau: tau-b (the default) and tau-c (also known as Stuart’s tau-c). These differ only in how they are normalized to lie within the range -1 to 1; the hypothesis tests (their p-values) are identical. Kendall’s original tau-a is not implemented separately because both tau-b and tau-c reduce to tau-a in the absence of ties.

Parameters

x, y : array_like

Arrays of rankings, of the same shape. If arrays are not 1-D, they will be flattened to 1-D.

initial_lexsort : bool, optional

Unused (deprecated).

nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional

Defines how to handle input that contains nan. The following options are available (default is ‘propagate’):

‘propagate’: returns nan

‘raise’: throws an error

‘omit’: performs the calculations ignoring nan values
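The three nan_policy options can be compared on a small input. This is a minimal sketch with made-up data (the arrays `x` and `y` are illustrative, not from the original documentation); it assumes that ‘raise’ signals the problem with a ValueError.

```python
import math

import numpy as np
from scipy import stats

# Illustrative data: the last observation of x is missing.
x = [1.0, 2.0, 3.0, 4.0, np.nan]
y = [1.0, 3.0, 2.0, 4.0, 5.0]

# 'propagate' (the default): the nan carries through to the result.
tau_prop, p_prop = stats.kendalltau(x, y, nan_policy='propagate')

# 'omit': the observation containing nan is ignored, so tau is computed
# from the four remaining pairs of values.
tau_omit, p_omit = stats.kendalltau(x, y, nan_policy='omit')

# 'raise': an error is raised instead of returning a result
# (assumed here to be a ValueError).
try:
    stats.kendalltau(x, y, nan_policy='raise')
    raised = False
except ValueError:
    raised = True

print(math.isnan(tau_prop), tau_omit, raised)
```

With the nan observation dropped, five of the six remaining pairs are concordant and one is discordant, so the ‘omit’ result is (5 - 1) / 6.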

method : {‘auto’, ‘asymptotic’, ‘exact’}, optional

Defines which method is used to calculate the p-value [5]. The following options are available (default is ‘auto’):

‘auto’: selects the appropriate method based on a trade-off between speed and accuracy

‘asymptotic’: uses a normal approximation valid for large samples

‘exact’: computes the exact p-value, but can only be used if no ties are present. As the sample size increases, the ‘exact’ computation time may grow and the result may lose some precision.
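As a rough illustration of the difference between the methods, the sketch below runs ‘exact’ and ‘asymptotic’ on a small tie-free sample. The data are invented for the example; the tau statistic itself does not depend on the method, only the p-value does.

```python
from scipy import stats

# Illustrative, tie-free data: adjacent elements of y are swapped,
# giving 4 discordant pairs out of 28.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

tau_exact, p_exact = stats.kendalltau(x, y, method='exact')
tau_asym, p_asym = stats.kendalltau(x, y, method='asymptotic')

# Same statistic, (generally) different p-values for a sample this small.
print(tau_exact, tau_asym)
print(p_exact, p_asym)

# With ties present, 'exact' cannot be used; this sketch assumes the
# failure surfaces as a ValueError.
try:
    stats.kendalltau([1, 1, 2, 3], [1, 2, 3, 4], method='exact')
    exact_with_ties_failed = False
except ValueError:
    exact_with_ties_failed = True
```

For samples this small the normal approximation can be noticeably off, which is one reason to let ‘auto’ pick the method unless there is a specific need.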

variant : {‘b’, ‘c’}, optional

Defines which variant of Kendall’s tau is returned. Default is ‘b’.
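A short sketch of how the two variants compare on data with ties. The variable names are chosen for the example; the expected values in the comments were worked out by hand from the tau-b and tau-c definitions (P = 2, Q = 6, one tie only in x, n = 5, m = 3).

```python
from scipy import stats

x1 = [12, 2, 1, 12, 2]
x2 = [1, 4, 7, 1, 0]

tau_b, p_b = stats.kendalltau(x1, x2, variant='b')
tau_c, p_c = stats.kendalltau(x1, x2, variant='c')

# The statistics differ only in normalization:
# tau_b = -4 / sqrt(9 * 8), tau_c = 2 * (-4) / (25 * 2 / 3) = -0.48
print(tau_b, tau_c)

# The hypothesis test is the same for both variants.
print(p_b, p_c)
```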

alternative : {‘two-sided’, ‘less’, ‘greater’}, optional

Defines the alternative hypothesis. Default is ‘two-sided’. The following options are available:

‘two-sided’: the rank correlation is nonzero
‘less’: the rank correlation is negative (less than zero)
‘greater’: the rank correlation is positive (greater than zero)
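The effect of the alternative can be seen on a positively associated sample. This is a sketch with invented data; only the p-value changes between the three calls, not the statistic.

```python
from scipy import stats

# Illustrative data with a clear positive association and no ties.
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 2, 5, 4, 7, 6]

tau, p_two = stats.kendalltau(x, y, alternative='two-sided')
_, p_greater = stats.kendalltau(x, y, alternative='greater')
_, p_less = stats.kendalltau(x, y, alternative='less')

# Because tau is positive, 'greater' gives the smallest p-value
# and 'less' gives a large one.
print(tau)
print(p_two, p_greater, p_less)
```

Choosing a one-sided alternative is appropriate only when the direction of the association was specified before looking at the data.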

Returns

correlation : float

The tau statistic.

pvalue : float

The p-value for a hypothesis test whose null hypothesis is an absence of association, tau = 0.

See also

spearmanr: Calculates a Spearman rank-order correlation coefficient.
theilslopes: Computes the Theil-Sen estimator for a set of points (x, y).
weightedtau: Computes a weighted version of Kendall’s tau.

Notes

The definition of Kendall’s tau that is used is [2]:

tau_b = (P - Q) / sqrt((P + Q + T) * (P + Q + U))

tau_c = 2 (P - Q) / (n**2 * (m - 1) / m)

where P is the number of concordant pairs, Q the number of discordant pairs, T the number of ties only in x, and U the number of ties only in y. If a tie occurs for the same pair in both x and y, it is not added to either T or U. n is the total number of samples, and m is the number of unique values in either x or y, whichever is smaller.
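The definitions above can be checked with a brute-force count over all pairs. The helper `kendall_counts` below is a hypothetical illustration written for this example, not part of SciPy, and its O(n**2) loop is not how the library computes the statistic.

```python
from itertools import combinations
from math import sqrt

from scipy import stats


def kendall_counts(x, y):
    """Return (P, Q, T, U) by examining every pair of observations."""
    P = Q = T = U = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0 and dy == 0:
            continue      # tied in both x and y: counted in neither T nor U
        elif dx == 0:
            T += 1        # tie only in x
        elif dy == 0:
            U += 1        # tie only in y
        elif dx * dy > 0:
            P += 1        # concordant
        else:
            Q += 1        # discordant
    return P, Q, T, U


# Illustrative data with ties in both variables.
x = [1, 1, 2, 2, 3, 3]
y = [1, 1, 1, 2, 2, 2]

P, Q, T, U = kendall_counts(x, y)
n = len(x)
m = min(len(set(x)), len(set(y)))

tau_b = (P - Q) / sqrt((P + Q + T) * (P + Q + U))
tau_c = 2 * (P - Q) / (n**2 * (m - 1) / m)

# Compare against the library results for both variants.
tau_b_scipy = stats.kendalltau(x, y, variant='b')[0]
tau_c_scipy = stats.kendalltau(x, y, variant='c')[0]
print((P, Q, T, U), tau_b, tau_b_scipy, tau_c, tau_c_scipy)
```

For this sample the counts are P = 8, Q = 0, T = 1, U = 4, so tau_b = 8 / sqrt(9 * 12) and tau_c = 16 / 18.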

References

[1] Maurice G. Kendall, “A New Measure of Rank Correlation”, Biometrika, Vol. 30, No. 1/2, pp. 81-93, 1938.
[2] Maurice G. Kendall, “The treatment of ties in ranking problems”, Biometrika, Vol. 33, No. 3, pp. 239-251, 1945.
[3] Gottfried E. Noether, “Elements of Nonparametric Statistics”, John Wiley & Sons, 1967.
[4] Peter M. Fenwick, “A new data structure for cumulative frequency tables”, Software: Practice and Experience, Vol. 24, No. 3, pp. 327-336, 1994.
[5] Maurice G. Kendall, “Rank Correlation Methods” (4th Edition), Charles Griffin & Co., 1970.

Examples

>>> from scipy import stats
>>> x1 = [12, 2, 1, 12, 2]
>>> x2 = [1, 4, 7, 1, 0]
>>> tau, p_value = stats.kendalltau(x1, x2)
>>> tau
-0.47140452079103173
>>> p_value
0.2827454599327748
