scipy.stats.wilcoxon
- scipy.stats.wilcoxon(x, y=None, zero_method='wilcox', correction=False, alternative='two-sided', mode='auto')
Calculate the Wilcoxon signed-rank test.
The Wilcoxon signed-rank test tests the null hypothesis that two related paired samples come from the same distribution. In particular, it tests whether the distribution of the differences x - y is symmetric about zero. It is a non-parametric version of the paired t-test.
- Parameters
- x : array_like
Either the first set of measurements (in which case y is the second set of measurements), or the differences between two sets of measurements (in which case y is not to be specified). Must be one-dimensional.
- y : array_like, optional
Either the second set of measurements (if x is the first set of measurements), or not specified (if x is the differences between two sets of measurements). Must be one-dimensional.
- zero_method : {“pratt”, “wilcox”, “zsplit”}, optional
The following options are available (default is “wilcox”):
“pratt”: Includes zero-differences in the ranking process, but drops the ranks of the zeros (more conservative); see [3].
“wilcox”: Discards all zero-differences, the default.
“zsplit”: Includes zero-differences in the ranking process and splits the zero rank between positive and negative ones.
- correction : bool, optional
If True, apply continuity correction by adjusting the Wilcoxon rank statistic by 0.5 towards the mean value when computing the z-statistic if a normal approximation is used. Default is False.
- alternative : {“two-sided”, “greater”, “less”}, optional
The alternative hypothesis to be tested, see Notes. Default is “two-sided”.
- mode : {“auto”, “exact”, “approx”}, optional
Method to calculate the p-value, see Notes. Default is “auto”.
- Returns
- statistic : float
If alternative is “two-sided”, the sum of the ranks of the differences above or below zero, whichever is smaller. Otherwise the sum of the ranks of the differences above zero.
- pvalue : float
The p-value for the test depending on alternative and mode.
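As a small sketch of the two calling conventions described under Parameters, the snippet below passes either both samples or only their differences and checks that the results agree. The arrays `before` and `after` are hypothetical values made up for illustration; they are not taken from any reference.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired measurements, made up for illustration only.
before = np.array([125.0, 130.5, 118.2, 140.1, 135.7, 128.4, 132.9, 121.3])
after = np.array([120.3, 128.9, 119.0, 133.6, 130.2, 126.1, 127.5, 118.8])

# Calling convention 1: pass both sets of measurements.
res_xy = wilcoxon(before, after)

# Calling convention 2: pass the differences only and leave y unspecified.
res_d = wilcoxon(before - after)

# Both conventions should describe the same test.
same_statistic = np.isclose(res_xy.statistic, res_d.statistic)
same_pvalue = np.isclose(res_xy.pvalue, res_d.pvalue)
print(same_statistic, same_pvalue)
```

Only one difference in this made-up data is negative, and it is the smallest in absolute value, so the two-sided statistic is the rank sum of that single observation.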
Notes
The test was introduced in [4]. Given n independent samples (xi, yi) from a bivariate distribution (i.e. paired samples), it computes the differences di = xi - yi. One assumption of the test is that the differences are symmetric, see [2]. The two-sided test has the null hypothesis that the median of the differences is zero against the alternative that it is different from zero. The one-sided test has the null hypothesis that the median is positive against the alternative that it is negative (alternative == 'less'), or vice versa (alternative == 'greater').
To derive the p-value, the exact distribution (mode == 'exact') can be used for sample sizes of up to 25. The default mode == 'auto' uses the exact distribution if there are at most 25 observations and no ties, otherwise a normal approximation is used (mode == 'approx').
The treatment of zero differences can be controlled by the parameter zero_method. If zero_method == 'pratt', the normal approximation is adjusted as in [5]. A typical rule for relying on the normal approximation is to require that n > 20 ([2], p. 383).
References
- 1
- 2
Conover, W.J., Practical Nonparametric Statistics, 1971.
- 3
Pratt, J.W., Remarks on Zeros and Ties in the Wilcoxon Signed Rank Procedures, Journal of the American Statistical Association, Vol. 54, 1959, pp. 655-667. DOI:10.1080/01621459.1959.10501526
- 4
Wilcoxon, F., Individual Comparisons by Ranking Methods, Biometrics Bulletin, Vol. 1, 1945, pp. 80-83. DOI:10.2307/3001968
- 5
Cureton, E.E., The Normal Approximation to the Signed-Rank Sampling Distribution When Zero Differences are Present, Journal of the American Statistical Association, Vol. 62, 1967, pp. 1068-1069. DOI:10.1080/01621459.1967.10500917
Examples
In [4], the differences in height between cross- and self-fertilized corn plants are given as follows:
>>> d = [6, 8, 14, 16, 23, 24, 28, 29, 41, -48, 49, 56, 60, -67, 75]
Cross-fertilized plants appear to be higher. To test the null hypothesis that there is no height difference, we can apply the two-sided test:
>>> from scipy.stats import wilcoxon
>>> w, p = wilcoxon(d)
>>> w, p
(24.0, 0.041259765625)
Hence, we would reject the null hypothesis at a significance level of 5%, concluding that there is a difference in height between the groups. To confirm that the median of the differences can be assumed to be positive, we use:
>>> w, p = wilcoxon(d, alternative='greater')
>>> w, p
(96.0, 0.0206298828125)
This shows that the null hypothesis that the median is negative can be rejected at a significance level of 5% in favor of the alternative that the median is greater than zero. The p-values above are exact. Using the normal approximation gives very similar values:
>>> w, p = wilcoxon(d, mode='approx')
>>> w, p
(24.0, 0.04088813291185591)
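The approximate p-value can also be reproduced by hand. The sketch below assumes the textbook normal approximation for data without zeros or ties and without continuity correction: under the null hypothesis the statistic has mean n(n+1)/4 and variance n(n+1)(2n+1)/24. The variable names are illustrative only and this is not a copy of the library's implementation.

```python
import math
from scipy.stats import norm

d = [6, 8, 14, 16, 23, 24, 28, 29, 41, -48, 49, 56, 60, -67, 75]
n = len(d)

# Two-sided statistic for this data: the smaller of the two rank sums.
t = 24.0

# Null mean and standard deviation of the signed-rank statistic
# (valid here because there are no zero differences and no ties).
mean_t = n * (n + 1) / 4
sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)

# Standardize and convert to a two-sided p-value via the normal tail.
z = (t - mean_t) / sd_t
p_approx = 2 * norm.sf(abs(z))
print(z, p_approx)  # p_approx should be close to the approximate p-value above
```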
Note that the statistic changed to 96 in the one-sided case (the sum of ranks of positive differences) whereas it is 24 in the two-sided case (the minimum of the sums of ranks above and below zero).
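The two statistics can be checked directly from the ranks. The following is a minimal sketch that ranks the absolute differences with `scipy.stats.rankdata` and sums the ranks by sign; it relies on this data set having no zeros and no ties, so it does not cover the `zero_method` handling.

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

d = np.array([6, 8, 14, 16, 23, 24, 28, 29, 41, -48, 49, 56, 60, -67, 75])

# Rank the absolute differences (no zeros or ties in this data set).
ranks = rankdata(np.abs(d))

r_plus = ranks[d > 0].sum()   # sum of ranks of the positive differences
r_minus = ranks[d < 0].sum()  # sum of ranks of the negative differences
print(r_plus, r_minus)        # 96.0 24.0

# Compare with the statistics reported by wilcoxon.
two_sided_stat = wilcoxon(d).statistic
greater_stat = wilcoxon(d, alternative='greater').statistic
print(two_sided_stat == min(r_plus, r_minus), greater_stat == r_plus)
```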