scipy.stats.fisher_exact¶

scipy.stats.
fisher_exact
(table, alternative='twosided')[source]¶ Perform a Fisher exact test on a 2x2 contingency table.
 Parameters
 tablearray_like of ints
A 2x2 contingency table. Elements must be nonnegative integers.
 alternative{‘twosided’, ‘less’, ‘greater’}, optional
Defines the alternative hypothesis. The following options are available (default is ‘twosided’):
‘twosided’
‘less’: onesided
‘greater’: onesided
See the Notes for more details.
 Returns
 oddsratiofloat
This is prior odds ratio and not a posterior estimate.
 p_valuefloat
Pvalue, the probability of obtaining a distribution at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
See also
chi2_contingency
Chisquare test of independence of variables in a contingency table. This can be used as an alternative to
fisher_exact
when the numbers in the table are large.barnard_exact
Barnard’s exact test, which is a more powerful alternative than Fisher’s exact test for 2x2 contingency tables.
boschloo_exact
Boschloo’s exact test, which is a more powerful alternative than Fisher’s exact test for 2x2 contingency tables.
Notes
Null hypothesis and pvalues
The null hypothesis is that the input table is from the hypergeometric distribution with parameters (as used in
hypergeom
)M = a + b + c + d
,n = a + b
andN = a + c
, where the input table is[[a, b], [c, d]]
. This distribution has supportmax(0, N + n  M) <= x <= min(N, n)
, or, in terms of the values in the input table,min(0, a  d) <= x <= a + min(b, c)
.x
can be interpreted as the upperleft element of a 2x2 table, so the tables in the distribution have form:[ x n  x ] [N  x M  (n + N) + x]
For example, if:
table = [6 2] [1 4]
then the support is
2 <= x <= 7
, and the tables in the distribution are:[2 6] [3 5] [4 4] [5 3] [6 2] [7 1] [5 0] [4 1] [3 2] [2 3] [1 4] [0 5]
The probability of each table is given by the hypergeometric distribution
hypergeom.pmf(x, M, n, N)
. For this example, these are (rounded to three significant digits):x 2 3 4 5 6 7 p 0.0163 0.163 0.408 0.326 0.0816 0.00466
These can be computed with:
>>> from scipy.stats import hypergeom >>> table = np.array([[6, 2], [1, 4]]) >>> M = table.sum() >>> n = table[0].sum() >>> N = table[:, 0].sum() >>> start, end = hypergeom.support(M, n, N) >>> hypergeom.pmf(np.arange(start, end+1), M, n, N) array([0.01631702, 0.16317016, 0.40792541, 0.32634033, 0.08158508, 0.004662 ])
The twosided pvalue is the probability that, under the null hypothesis, a random table would have a probability equal to or less than the probability of the input table. For our example, the probability of the input table (where
x = 6
) is 0.0816. The x values where the probability does not exceed this are 2, 6 and 7, so the twosided pvalue is0.0163 + 0.0816 + 0.00466 ~= 0.10256
:>>> from scipy.stats import fisher_exact >>> oddsr, p = fisher_exact(table, alternative='twosided') >>> p 0.10256410256410257
The onesided pvalue for
alternative='greater'
is the probability that a random table hasx >= a
, which in our example isx >= 6
, or0.0816 + 0.00466 ~= 0.08626
:>>> oddsr, p = fisher_exact(table, alternative='greater') >>> p 0.08624708624708627
This is equivalent to computing the survival function of the distribution at
x = 5
(one less thanx
from the input table, because we want to include the probability ofx = 6
in the sum):>>> hypergeom.sf(5, M, n, N) 0.08624708624708627
For
alternative='less'
, the onesided pvalue is the probability that a random table hasx <= a
, (i.e.x <= 6
in our example), or0.0163 + 0.163 + 0.408 + 0.326 + 0.0816 ~= 0.9949
:>>> oddsr, p = fisher_exact(table, alternative='less') >>> p 0.9953379953379957
This is equivalent to computing the cumulative distribution function of the distribution at
x = 6
:>>> hypergeom.cdf(6, M, n, N) 0.9953379953379957
Odds ratio
The calculated odds ratio is different from the one R uses. This SciPy implementation returns the (more common) “unconditional Maximum Likelihood Estimate”, while R uses the “conditional Maximum Likelihood Estimate”.
Examples
Say we spend a few days counting whales and sharks in the Atlantic and Indian oceans. In the Atlantic ocean we find 8 whales and 1 shark, in the Indian ocean 2 whales and 5 sharks. Then our contingency table is:
Atlantic Indian whales 8 2 sharks 1 5
We use this table to find the pvalue:
>>> from scipy.stats import fisher_exact >>> oddsratio, pvalue = fisher_exact([[8, 2], [1, 5]]) >>> pvalue 0.0349...
The probability that we would observe this or an even more imbalanced ratio by chance is about 3.5%. A commonly used significance level is 5%–if we adopt that, we can therefore conclude that our observed imbalance is statistically significant; whales prefer the Atlantic while sharks prefer the Indian ocean.