scipy.stats.poisson_means_test

scipy.stats.poisson_means_test(k1, n1, k2, n2, *, diff=0, alternative='two-sided')

Performs the Poisson means test, also known as the “E-test”.

This is a test of the null hypothesis that the difference between means of two Poisson distributions is diff. The samples are provided as the number of events k1 and k2 observed within measurement intervals (e.g. of time, space, number of observations) of sizes n1 and n2.

Parameters:
k1 : int

Number of events observed from distribution 1.

n1 : float

Size of sample from distribution 1.

k2 : int

Number of events observed from distribution 2.

n2 : float

Size of sample from distribution 2.

diff : float, default=0

The hypothesized difference in means between the distributions underlying the samples.

alternative : {‘two-sided’, ‘less’, ‘greater’}, optional

Defines the alternative hypothesis. The following options are available (default is ‘two-sided’):

  • ‘two-sided’: the difference between distribution means is not equal to diff

  • ‘less’: the difference between distribution means is less than diff

  • ‘greater’: the difference between distribution means is greater than diff

Returns:
statistic : float

The test statistic (see [1] equation 3.3).

pvalue : float

The probability of obtaining a test statistic at least as extreme as the observed value under the null hypothesis.

Notes

Let:

\[X_1 \sim \mbox{Poisson}(\mathtt{n1}\lambda_1)\]

be a random variable independent of

\[X_2 \sim \mbox{Poisson}(\mathtt{n2}\lambda_2)\]

and let k1 and k2 be the observed values of \(X_1\) and \(X_2\), respectively. Then poisson_means_test uses the number of observed events k1 and k2 from samples of size n1 and n2, respectively, to test the null hypothesis that

\[H_0: \lambda_1 - \lambda_2 = \mathtt{diff}\]
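The observed rates k1 / n1 and k2 / n2 estimate \(\lambda_1\) and \(\lambda_2\), respectively. As a rough sketch of the quantities involved (the counts and interval sizes below are invented for illustration), the test asks whether the difference between these estimated rates is compatible with diff:

>>> import scipy.stats as stats
>>> k1, n1, k2, n2 = 5, 10.0, 15, 10.0  # hypothetical counts and interval sizes
>>> k1 / n1, k2 / n2  # observed rates, estimates of lambda_1 and lambda_2
(0.5, 1.5)
>>> res = stats.poisson_means_test(k1, n1, k2, n2, diff=0)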

A benefit of the E-test is that it has good power for small sample sizes, which can reduce sampling costs [1]. It has been shown to be more powerful than the comparable C-test, sometimes referred to as the Poisson exact test.

References

[1]

Krishnamoorthy, K., & Thomson, J. (2004). A more powerful test for comparing two Poisson means. Journal of Statistical Planning and Inference, 119(1), 23-35.

[2]

Przyborowski, J., & Wilenski, H. (1940). Homogeneity of results in testing samples from Poisson series: With an application to testing clover seed for dodder. Biometrika, 31(3/4), 313-323.

Examples

Suppose that a gardener wishes to test the number of dodder (weed) seeds in a sack of clover seeds that they buy from a seed company. It has previously been established that the number of dodder seeds in clover follows the Poisson distribution.

A 100 gram sample is drawn from the sack before being shipped to the gardener. The sample is analyzed, and it is found to contain no dodder seeds; that is, k1 is 0. However, upon arrival, the gardener draws another 100 gram sample from the sack. This time, three dodder seeds are found in the sample; that is, k2 is 3. The gardener would like to know if the difference is significant and not due to chance. The null hypothesis is that the difference between the two samples is merely due to chance, or that \(\lambda_1 - \lambda_2 = \mathtt{diff}\) where \(\mathtt{diff} = 0\). The alternative hypothesis is that the difference is not due to chance, or \(\lambda_1 - \lambda_2 \ne 0\). The gardener selects a significance level of 5% to reject the null hypothesis in favor of the alternative [2].

>>> import scipy.stats as stats
>>> res = stats.poisson_means_test(0, 100, 3, 100)
>>> res.statistic, res.pvalue
(-1.7320508075688772, 0.08837900929018157)

The p-value is .088, indicating a nearly 9% chance of observing a test statistic at least as extreme as this one under the null hypothesis. This exceeds the 5% significance level, so the gardener does not reject the null hypothesis; the difference cannot be regarded as significant at this level.
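Had the gardener instead wanted to test the one-sided alternative that the rate of dodder seeds in the second sample is higher, that is \(\lambda_1 - \lambda_2 < 0\), the same counts can be passed with alternative='less' (the resulting numbers are not reproduced here):

>>> res_less = stats.poisson_means_test(0, 100, 3, 100, alternative='less')
>>> # res_less.statistic and res_less.pvalue hold the one-sided results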