scipy.stats.combine_pvalues

scipy.stats.combine_pvalues(pvalues, method='fisher', weights=None, *, axis=0, nan_policy='propagate', keepdims=False)

Combine p-values from independent tests that bear upon the same hypothesis.

These methods are intended only for combining p-values from hypothesis tests based upon continuous distributions.

Each method assumes that under the null hypothesis, the p-values are sampled independently and uniformly from the interval [0, 1]. A test statistic (different for each method) is computed and a combined p-value is calculated based upon the distribution of this test statistic under the null hypothesis.

Parameters:

pvalues : array_like

Array of p-values assumed to come from independent tests based on continuous distributions.

method : {‘fisher’, ‘pearson’, ‘tippett’, ‘stouffer’, ‘mudholkar_george’}

Name of method to use to combine p-values.

The available methods are (see Notes for details):

‘fisher’: Fisher’s method (Fisher’s combined probability test)
‘pearson’: Pearson’s method
‘mudholkar_george’: Mudholkar’s and George’s method
‘tippett’: Tippett’s method
‘stouffer’: Stouffer’s Z-score method

weights : array_like, optional

Optional array of weights used only for Stouffer’s Z-score method. Ignored by other methods.

axis : int or None, default: 0

If an int, the axis of the input along which to compute the statistic. The statistic of each axis-slice (e.g. row) of the input will appear in a corresponding element of the output. If None, the input will be raveled before computing the statistic.

nan_policy : {‘propagate’, ‘omit’, ‘raise’}

Defines how to handle input NaNs.

propagate: if a NaN is present in the axis slice (e.g. row) along which the statistic is computed, the corresponding entry of the output will be NaN.
omit: NaNs will be omitted when performing the calculation. If insufficient data remains in the axis slice along which the statistic is computed, the corresponding entry of the output will be NaN.
raise: if a NaN is present, a ValueError will be raised.

keepdims : bool, default: False

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

Returns:

res : SignificanceResult

An object containing attributes:

statistic : float
    The statistic calculated by the specified method.
pvalue : float
    The combined p-value.
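As a brief sketch of how the returned object can be consumed: the fields can be read as attributes, and, on the assumption that SignificanceResult behaves like a two-element tuple, they can also be unpacked.

```python
from scipy.stats import combine_pvalues

res = combine_pvalues([0.1, 0.05, 0.02, 0.3])

# Attribute access.
print(res.statistic, res.pvalue)

# Tuple-style unpacking (assumes the result object is tuple-like).
statistic, pvalue = res
print(statistic == res.statistic, pvalue == res.pvalue)
```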

Notes

If this function is applied to tests with discrete statistics, such as any rank test or contingency-table test, it will yield systematically wrong results, e.g. Fisher’s method will systematically overestimate the p-value [1]. This problem becomes less severe for large sample sizes, when the discrete distributions become approximately continuous.

The differences between the methods can be best illustrated by their statistics and what aspects of a combination of p-values they emphasise when considering significance [2]. For example, methods emphasising large p-values are more sensitive to strong false and true negatives; conversely methods focussing on small p-values are sensitive to positives.

The statistic of Fisher’s method (also known as Fisher’s combined probability test) [3] is \(-2\sum_i \log(p_i)\), which is equivalent (as a test statistic) to the product of individual p-values: \(\prod_i p_i\). Under the null hypothesis, this statistic follows a \(\chi^2\) distribution with \(2k\) degrees of freedom, where \(k\) is the number of p-values. This method emphasises small p-values.
Pearson’s method uses \(-2\sum_i\log(1-p_i)\), which is equivalent to \(\prod_i \frac{1}{1-p_i}\) [2]. It thus emphasises large p-values.
Mudholkar and George compromise between Fisher’s and Pearson’s methods by averaging their statistics [4]. Their method emphasises extreme p-values, both close to 1 and 0.
Stouffer’s method [5] uses Z-scores and the statistic: \(\sum_i \Phi^{-1} (p_i)\), where \(\Phi\) is the CDF of the standard normal distribution. The advantage of this method is that it is straightforward to introduce weights, which can make Stouffer’s method more powerful than Fisher’s method when the p-values are from studies of different sizes [6] [7].
Tippett’s method uses the smallest p-value as a statistic. (Note that this minimum is not the combined p-value.)
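To make the link between a statistic and its combined p-value concrete, the sketch below recomputes Fisher’s and Tippett’s results by hand and compares them with the function’s output. It relies on two standard facts rather than on the library’s internals: Fisher’s statistic is referred to a \(\chi^2\) distribution with \(2k\) degrees of freedom, and the minimum of \(k\) independent uniform p-values falls below \(m\) with probability \(1-(1-m)^k\). The variable names are illustrative only.

```python
import numpy as np
from scipy.stats import chi2, combine_pvalues

pvalues = np.array([0.1, 0.05, 0.02, 0.3])
k = pvalues.size

# Fisher: -2 * sum(log p_i), compared against chi-squared with 2k degrees
# of freedom (upper tail).
fisher_stat = -2 * np.sum(np.log(pvalues))
fisher_p = chi2.sf(fisher_stat, 2 * k)

# Tippett: the statistic is the smallest p-value; the combined p-value is
# the probability that the minimum of k uniforms is at most that small.
tippett_stat = pvalues.min()
tippett_p = 1 - (1 - tippett_stat) ** k

ref_fisher = combine_pvalues(pvalues, method='fisher')
ref_tippett = combine_pvalues(pvalues, method='tippett')

print(np.isclose(fisher_stat, ref_fisher.statistic),
      np.isclose(fisher_p, ref_fisher.pvalue))
print(np.isclose(tippett_stat, ref_tippett.statistic),
      np.isclose(tippett_p, ref_tippett.pvalue))
```

For Tippett’s method, the combined p-value here (about 0.078) is larger than the smallest individual p-value (0.02).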

Fisher’s method may be extended to combine p-values from dependent tests [8]. Extensions such as Brown’s method and Kost’s method are not currently implemented.

Added in version 0.15.0.

Beginning in SciPy 1.9, np.matrix inputs (not recommended for new code) are converted to np.ndarray before the calculation is performed. In this case, the output will be a scalar or np.ndarray of appropriate shape rather than a 2D np.matrix. Similarly, while masked elements of masked arrays are ignored, the output will be a scalar or np.ndarray rather than a masked array with mask=False.

References

[1]

Kincaid, W. M., “The Combination of Tests Based on Discrete Distributions.” Journal of the American Statistical Association 57, no. 297 (1962), 10-19.

[2]

Heard, N. and Rubin-Delanchy, P. “Choosing between methods of combining p-values.” Biometrika 105.1 (2018): 239-246.

[3]

https://en.wikipedia.org/wiki/Fisher%27s_method

[4]

George, E. O., and G. S. Mudholkar. “On the convolution of logistic random variables.” Metrika 30.1 (1983): 1-13.

[5]

https://en.wikipedia.org/wiki/Fisher%27s_method#Relation_to_Stouffer.27s_Z-score_method

[6]

Whitlock, M. C. “Combining probability from independent tests: the weighted Z-method is superior to Fisher’s approach.” Journal of Evolutionary Biology 18, no. 5 (2005): 1368-1373.

[7]

Zaykin, Dmitri V. “Optimally weighted Z-test is a powerful method for combining probabilities in meta-analysis.” Journal of Evolutionary Biology 24, no. 8 (2011): 1836-1841.

[8]

https://en.wikipedia.org/wiki/Extensions_of_Fisher%27s_method

Examples

Suppose we wish to combine p-values from four independent tests of the same null hypothesis using Fisher’s method (default).

>>> from scipy.stats import combine_pvalues
>>> pvalues = [0.1, 0.05, 0.02, 0.3]
>>> combine_pvalues(pvalues)
SignificanceResult(statistic=20.828626352604235, pvalue=0.007616871850449092)

When the individual p-values carry different weights, consider Stouffer’s method.

>>> weights = [1, 2, 3, 4]
>>> res = combine_pvalues(pvalues, method='stouffer', weights=weights)
>>> res.pvalue
0.009578891494533616
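Because the methods emphasise different parts of the p-value range, it can be informative to run them side by side on the same input. The loop below is a small sketch of that comparison, reusing the p-values from the example above; the dictionary name is arbitrary.

```python
from scipy.stats import combine_pvalues

pvalues = [0.1, 0.05, 0.02, 0.3]
methods = ['fisher', 'pearson', 'mudholkar_george', 'tippett', 'stouffer']

# Combined p-value for each method, all on the same input.
combined = {m: combine_pvalues(pvalues, method=m).pvalue for m in methods}

for name, p in combined.items():
    print(f"{name:>16}: {p:.4f}")
```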