scipy.stats.fisher_exact
- scipy.stats.fisher_exact(table, alternative='two-sided')
Perform a Fisher exact test on a 2x2 contingency table.
- Parameters
- table : array_like of ints
A 2x2 contingency table. Elements must be non-negative integers.
- alternative : {‘two-sided’, ‘less’, ‘greater’}, optional
Defines the alternative hypothesis. The following options are available (default is ‘two-sided’):
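‘two-sided’: the p-value sums the probabilities of all tables that are no more probable than the input table
‘less’: one-sided; the p-value is the probability that the upper-left element of a random table is at most the observed one
‘greater’: one-sided; the p-value is the probability that the upper-left element of a random table is at least the observed one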
See the Notes for more details.
- Returns
- oddsratio : float
This is the prior odds ratio, not a posterior estimate.
- p_value : float
P-value, the probability of obtaining a distribution at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
See also
chi2_contingency
Chi-square test of independence of variables in a contingency table. This can be used as an alternative to fisher_exact when the numbers in the table are large.
barnard_exact
Barnard’s exact test, which is a more powerful alternative than Fisher’s exact test for 2x2 contingency tables.
boschloo_exact
Boschloo’s exact test, which is a more powerful alternative than Fisher’s exact test for 2x2 contingency tables.
Notes
Null hypothesis and p-values
The null hypothesis is that the input table is from the hypergeometric distribution with parameters (as used in hypergeom) M = a + b + c + d, n = a + b and N = a + c, where the input table is [[a, b], [c, d]]. This distribution has support max(0, N + n - M) <= x <= min(N, n), or, in terms of the values in the input table, max(0, a - d) <= x <= a + min(b, c). x can be interpreted as the upper-left element of a 2x2 table, so the tables in the distribution have form:

[  x           n - x     ]
[N - x    M - (n + N) + x]
For example, if:
table = [6  2]
        [1  4]

then the support is 2 <= x <= 7, and the tables in the distribution are:

[2 6]   [3 5]   [4 4]   [5 3]   [6 2]   [7 1]
[5 0]   [4 1]   [3 2]   [2 3]   [1 4]   [0 5]
The probability of each table is given by the hypergeometric distribution hypergeom.pmf(x, M, n, N). For this example, these are (rounded to three significant digits):

x  2       3      4      5      6       7
p  0.0163  0.163  0.408  0.326  0.0816  0.00466
These can be computed with:
>>> import numpy as np
>>> from scipy.stats import hypergeom
>>> table = np.array([[6, 2], [1, 4]])
>>> M = table.sum()
>>> n = table[0].sum()
>>> N = table[:, 0].sum()
>>> start, end = hypergeom.support(M, n, N)
>>> hypergeom.pmf(np.arange(start, end+1), M, n, N)
array([0.01631702, 0.16317016, 0.40792541, 0.32634033, 0.08158508,
       0.004662  ])
The two-sided p-value is the probability that, under the null hypothesis, a random table would have a probability equal to or less than the probability of the input table. For our example, the probability of the input table (where x = 6) is 0.0816. The x values where the probability does not exceed this are 2, 6 and 7, so the two-sided p-value is 0.0163 + 0.0816 + 0.00466 ~= 0.10256:

>>> from scipy.stats import fisher_exact
>>> oddsr, p = fisher_exact(table, alternative='two-sided')
>>> p
0.10256410256410257
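The two-sided rule described above can be reproduced by hand from the hypergeometric probabilities. The following is a minimal sketch, not the library's actual implementation; the variable names and the relative tolerance `rel_tol` are assumptions introduced here for illustration.

```python
import numpy as np
from scipy.stats import fisher_exact, hypergeom

table = np.array([[6, 2], [1, 4]])
M = table.sum()          # total count
n = table[0].sum()       # first-row total
N = table[:, 0].sum()    # first-column total

# Probabilities of every table in the support of the null distribution.
start, end = hypergeom.support(M, n, N)
x = np.arange(start, end + 1)
pmf = hypergeom.pmf(x, M, n, N)

# Probability of the observed table (x = a = 6).
p_obs = hypergeom.pmf(table[0, 0], M, n, N)

# Sum the probabilities that do not exceed p_obs. The small relative
# tolerance is an assumption of this sketch; it guards against ties that
# differ only by floating-point rounding.
rel_tol = 1e-7
p_manual = pmf[pmf <= p_obs * (1 + rel_tol)].sum()

_, p_scipy = fisher_exact(table, alternative='two-sided')
print(p_manual, p_scipy)
```

Both values should agree with the two-sided p-value of about 0.10256 quoted above, up to floating-point rounding.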
The one-sided p-value for alternative='greater' is the probability that a random table has x >= a, which in our example is x >= 6, or 0.0816 + 0.00466 ~= 0.08626:

>>> oddsr, p = fisher_exact(table, alternative='greater')
>>> p
0.08624708624708627
This is equivalent to computing the survival function of the distribution at x = 5 (one less than x from the input table, because we want to include the probability of x = 6 in the sum):

>>> hypergeom.sf(5, M, n, N)
0.08624708624708627
For alternative='less', the one-sided p-value is the probability that a random table has x <= a (i.e. x <= 6 in our example), or 0.0163 + 0.163 + 0.408 + 0.326 + 0.0816 ~= 0.9949:

>>> oddsr, p = fisher_exact(table, alternative='less')
>>> p
0.9953379953379957
This is equivalent to computing the cumulative distribution function of the distribution at x = 6:

>>> hypergeom.cdf(6, M, n, N)
0.9953379953379957
Odds ratio
The calculated odds ratio is different from the one R uses. This SciPy implementation returns the (more common) “unconditional Maximum Likelihood Estimate”, while R uses the “conditional Maximum Likelihood Estimate”.
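For a table [[a, b], [c, d]], the unconditional maximum likelihood estimate is the sample odds ratio (a * d) / (b * c). The sketch below compares that hand computation with the value returned by fisher_exact for the table used in the Notes; the name `oddsr_manual` is introduced here for illustration, and the formula assumes b * c is nonzero.

```python
import numpy as np
from scipy.stats import fisher_exact

table = np.array([[6, 2], [1, 4]])
(a, b), (c, d) = table

# Sample ("unconditional") odds ratio; this sketch assumes b * c != 0.
oddsr_manual = (a * d) / (b * c)

oddsr_scipy, _ = fisher_exact(table)
print(oddsr_manual, oddsr_scipy)
```

R's conditional estimate for the same table will generally differ from this value, so the two should not be compared directly.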
Examples
Say we spend a few days counting whales and sharks in the Atlantic and Indian oceans. In the Atlantic ocean we find 8 whales and 1 shark, in the Indian ocean 2 whales and 5 sharks. Then our contingency table is:
          Atlantic  Indian
whales    8         2
sharks    1         5
We use this table to find the p-value:
>>> from scipy.stats import fisher_exact
>>> oddsratio, pvalue = fisher_exact([[8, 2], [1, 5]])
>>> pvalue
0.0349...
The probability that we would observe this or an even more imbalanced ratio by chance is about 3.5%. A commonly used significance level is 5%; if we adopt that, we can conclude that our observed imbalance is statistically significant: whales prefer the Atlantic while sharks prefer the Indian ocean.
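If the question is specifically whether whales are over-represented in the Atlantic, a one-sided test may be more appropriate. As a minimal sketch, the following runs alternative='greater' on the same table and cross-checks the result against the hypergeometric survival function, using the relationship described in the Notes. The variable names are illustrative.

```python
from scipy.stats import fisher_exact, hypergeom

table = [[8, 2], [1, 5]]
a = table[0][0]                      # observed upper-left element
M = sum(sum(row) for row in table)   # total count
n = sum(table[0])                    # first-row total (whales)
N = table[0][0] + table[1][0]        # first-column total (Atlantic)

# One-sided p-value: probability that a random table has x >= a.
_, p_greater = fisher_exact(table, alternative='greater')

# Same quantity from the survival function, evaluated at a - 1 so that
# the probability of x = a is included in the sum.
p_sf = hypergeom.sf(a - 1, M, n, N)
print(p_greater, p_sf)
```

The two printed values should match up to floating-point rounding.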