scipy.stats.boschloo_exact
- scipy.stats.boschloo_exact(table, alternative='two-sided', n=32)
Perform Boschloo’s exact test on a 2x2 contingency table.
- Parameters
- table : array_like of ints
A 2x2 contingency table. Elements should be non-negative integers.
- alternative : {‘two-sided’, ‘less’, ‘greater’}, optional
Defines the null and alternative hypotheses. Default is ‘two-sided’. Please see explanations in the Notes section below.
- n : int, optional
Number of sampling points used in the construction of the sampling method. Note that this argument will automatically be converted to the next higher power of 2, since scipy.stats.qmc.Sobol is used to select sample points. Default is 32. Must be positive. In most cases, 32 points are enough to reach good precision. More points come at a performance cost.
- Returns
- ber : BoschlooExactResult
A result object with the following attributes.
- statistic : float
The statistic used in Boschloo’s test; that is, the p-value from Fisher’s exact test.
- pvalue : float
P-value, the probability of obtaining a result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
See also
chi2_contingency
Chi-square test of independence of variables in a contingency table.
fisher_exact
Fisher exact test on a 2x2 contingency table.
barnard_exact
Barnard’s exact test, which is a more powerful alternative to Fisher’s exact test for 2x2 contingency tables.
Notes
Boschloo’s test is an exact test used in the analysis of contingency tables. It examines the association of two categorical variables, and is a uniformly more powerful alternative to Fisher’s exact test for 2x2 contingency tables.
Boschloo’s exact test uses the p-value of Fisher’s exact test as a statistic, and Boschloo’s p-value is the probability under the null hypothesis of observing such an extreme value of this statistic.
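The relationship between the two tests can be checked numerically. The following is a minimal sketch, not part of the original documentation: it reuses the table from the Examples section and assumes a one-sided alternative, for which the statistic returned by boschloo_exact is expected to match the p-value of fisher_exact up to floating-point error.

```python
import numpy as np
from scipy import stats

# Table taken from the Examples section of this page.
table = [[74, 31], [43, 32]]

res = stats.boschloo_exact(table, alternative="greater")
# fisher_exact returns (odds ratio, p-value); only the p-value is needed here.
_, fisher_p = stats.fisher_exact(table, alternative="greater")

# Boschloo's statistic is Fisher's one-sided p-value.
print(np.isclose(res.statistic, fisher_p))
# For this particular table, Boschloo's p-value is smaller than Fisher's.
print(res.pvalue < fisher_p)
```

A smaller p-value for the same data is what one would expect from a more powerful test, although a single table is only an illustration, not a proof.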
Let \(X_0\) be a 2x2 matrix representing the observed sample, where each column stores one binomial experiment, as in the example below. Let \(p_1, p_2\) be the theoretical binomial probabilities for \(x_{11}\) and \(x_{12}\). When using Boschloo’s exact test, we can assert three different alternative hypotheses:
\(H_0 : p_1=p_2\) versus \(H_1 : p_1 < p_2\), with alternative = “less”
\(H_0 : p_1=p_2\) versus \(H_1 : p_1 > p_2\), with alternative = “greater”
\(H_0 : p_1=p_2\) versus \(H_1 : p_1 \neq p_2\), with alternative = “two-sided” (default)
There are multiple conventions for computing a two-sided p-value when the null distribution is asymmetric. Here, we apply the convention that the p-value of a two-sided test is twice the minimum of the p-values of the one-sided tests (clipped to 1.0). Note that fisher_exact follows a different convention, so for a given table, the statistic reported by boschloo_exact may differ from the p-value reported by fisher_exact when alternative='two-sided'.
New in version 1.7.0.
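The two-sided convention described in the Notes can be illustrated with a short sketch. It assumes the table from the Examples section; the variable names (p_less, p_greater, expected) are illustrative and not part of the SciPy API.

```python
import numpy as np
from scipy import stats

# Table taken from the Examples section of this page.
table = [[74, 31], [43, 32]]

p_less = stats.boschloo_exact(table, alternative="less").pvalue
p_greater = stats.boschloo_exact(table, alternative="greater").pvalue
p_two_sided = stats.boschloo_exact(table, alternative="two-sided").pvalue

# Twice the smaller one-sided p-value, clipped to 1.0.
expected = min(1.0, 2 * min(p_less, p_greater))
print(np.isclose(p_two_sided, expected))
```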
References
[1] R.D. Boschloo. “Raised conditional level of significance for the 2 x 2-table when testing the equality of two probabilities”, Statistica Neerlandica, 24(1), 1970
[2] “Boschloo’s test”, Wikipedia, https://en.wikipedia.org/wiki/Boschloo%27s_test
[3] Lise M. Saari et al. “Employee attitudes and job satisfaction”, Human Resource Management, 43(4), 395-407, 2004, DOI:10.1002/hrm.20032.
Examples
In the following example, we consider the article “Employee attitudes and job satisfaction” [3] which reports the results of a survey from 63 scientists and 117 college professors. Of the 63 scientists, 31 said they were very satisfied with their jobs, whereas 74 of the college professors were very satisfied with their work. Is this significant evidence that college professors are happier with their work than scientists? The following table summarizes the data mentioned above:
                 college professors   scientists
Very Satisfied   74                   31
Dissatisfied     43                   32
When working with statistical hypothesis testing, we usually use a threshold probability or significance level upon which we decide to reject the null hypothesis \(H_0\). Suppose we choose the common significance level of 5%.
Our alternative hypothesis is that college professors are truly more satisfied with their work than scientists. Therefore, we expect \(p_1\), the proportion of very satisfied college professors, to be greater than \(p_2\), the proportion of very satisfied scientists. We thus call boschloo_exact with the alternative="greater" option:
>>> import scipy.stats as stats
>>> res = stats.boschloo_exact([[74, 31], [43, 32]], alternative="greater")
>>> res.statistic
0.0483...
>>> res.pvalue
0.0355...
Under the null hypothesis that college professors and scientists are equally likely to be very satisfied with their work (\(p_1 = p_2\)), the probability of obtaining test results at least as extreme as the observed data is approximately 3.55%. Since this p-value is less than our chosen significance level, we have evidence to reject \(H_0\) in favor of the alternative hypothesis.
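As a rough sketch of the decision step, the observed proportions and the comparison against the 5% significance level can be written out explicitly. The names p1_hat, p2_hat, and alpha are illustrative only and do not come from the SciPy API.

```python
from scipy import stats

# Columns: college professors, scientists.
# Rows: very satisfied, dissatisfied.
table = [[74, 31], [43, 32]]

# Observed proportions of "very satisfied" respondents in each column.
p1_hat = table[0][0] / (table[0][0] + table[1][0])  # 74 / 117
p2_hat = table[0][1] / (table[0][1] + table[1][1])  # 31 / 63

alpha = 0.05  # significance level chosen in the text
res = stats.boschloo_exact(table, alternative="greater")

print(f"p1_hat = {p1_hat:.4f}, p2_hat = {p2_hat:.4f}")
print("reject H0" if res.pvalue < alpha else "fail to reject H0")
```

The observed proportions only indicate the direction of the difference; the decision itself rests on comparing res.pvalue with alpha.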