scipy.stats.multivariate_hypergeom

scipy.stats.multivariate_hypergeom = <scipy.stats._multivariate.multivariate_hypergeom_gen object>

A multivariate hypergeometric random variable.

Parameters:

m (array_like): The number of each type of object in the population. That is, \(m[i]\) is the number of objects of type \(i\).
n (array_like): The number of samples taken from the population.
seed ({None, int, np.random.RandomState, np.random.Generator}, optional): Used for drawing random variates. If seed is None, the RandomState singleton is used. If seed is an int, a new RandomState instance is used, seeded with seed. If seed is already a RandomState or Generator instance, then that object is used. Default is None.

Methods

pmf(x, m, n)	Probability mass function.
logpmf(x, m, n)	Log of the probability mass function.
rvs(m, n, size=1, random_state=None)	Draw random samples from a multivariate hypergeometric distribution.
mean(m, n)	Mean of the multivariate hypergeometric distribution.
var(m, n)	Variance of the multivariate hypergeometric distribution.
cov(m, n)	Compute the covariance matrix of the multivariate hypergeometric distribution.
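
As a minimal sketch of how these methods are called, the moments and a few random draws can be computed as below. The parameter values and the integer seed 0 are illustrative choices, not part of the reference.

```python
import numpy as np
from scipy.stats import multivariate_hypergeom

m = [10, 20]   # illustrative population: 10 objects of type 0, 20 of type 1
n = 12         # illustrative sample size

mean = multivariate_hypergeom.mean(m=m, n=n)   # expected count per type
var = multivariate_hypergeom.var(m=m, n=n)     # variance of each count
cov = multivariate_hypergeom.cov(m=m, n=n)     # 2 x 2 covariance matrix

# Draw 5 samples; the integer seed is arbitrary and only fixes the stream.
samples = multivariate_hypergeom.rvs(m=m, n=n, size=5, random_state=0)

print(mean)
print(np.allclose(np.diag(cov), var))  # diagonal of cov holds the variances
print(samples.shape)                   # one row per draw, one column per type
```

Each row of samples holds the counts per object type for one draw, so every row sums to n.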

See also

scipy.stats.hypergeom: The hypergeometric distribution.
scipy.stats.multinomial: The multinomial distribution.

Notes

m must be an array of positive integers. If any component \(x_i\) of the quantile lies outside the range \([0, m_i]\), where \(m_i\) is the number of objects of type \(i\) in the population, or if the parameters are inconsistent with one another (e.g. x.sum() != n), methods return the appropriate value (e.g. 0 for pmf). If m or n contain negative values, the corresponding entries of the result are nan.
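
The following sketch exercises the three cases described in this note. The specific values are illustrative assumptions. Going by the note, the first two calls should evaluate to 0 and the third to nan.

```python
import numpy as np
from scipy.stats import multivariate_hypergeom

m, n = [10, 20], 12  # illustrative parameters

# x sums to 13, which is inconsistent with n = 12
p_bad_sum = multivariate_hypergeom.pmf(x=[8, 5], m=m, n=n)

# x[0] = 11 exceeds m[0] = 10, so this outcome cannot occur
p_out_of_range = multivariate_hypergeom.pmf(x=[11, 1], m=m, n=n)

# a negative entry in m is an invalid parameter
p_negative_m = multivariate_hypergeom.pmf(x=[8, 4], m=[-1, 20], n=n)

print(p_bad_sum, p_out_of_range, p_negative_m)
```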

The probability mass function for multivariate_hypergeom is

\[\begin{split}P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{\binom{m_1}{x_1} \binom{m_2}{x_2} \cdots \binom{m_k}{x_k}}{\binom{M}{n}}, \\ \quad (x_1, x_2, \ldots, x_k) \in \mathbb{N}^k \text{ with } \sum_{i=1}^k x_i = n\end{split}\]

where \(m_i\) is the number of objects of type \(i\), \(M\) is the total number of objects in the population (the sum of all the \(m_i\)), and \(n\) is the size of the sample to be taken from the population.
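
To make the formula concrete, here is a minimal sketch that evaluates it directly with exact integer binomial coefficients from the standard library. The helper name multivariate_hypergeom_pmf is hypothetical and is not how SciPy computes the value internally. It takes \(n\) to be the sum of x.

```python
from math import comb, prod

def multivariate_hypergeom_pmf(x, m):
    """Evaluate the pmf formula directly; n is taken to be sum(x)."""
    n = sum(x)   # sample size implied by the counts
    M = sum(m)   # total population size
    # Product of the per-type binomial coefficients in the numerator.
    numerator = prod(comb(m_i, x_i) for m_i, x_i in zip(m, x))
    return numerator / comb(M, n)

# Illustrative values: 10 and 20 objects, a sample containing 8 and 4.
p = multivariate_hypergeom_pmf(x=[8, 4], m=[10, 20])
print(p)  # 45 * 4845 / 86493225, approximately 0.00252
```

Because the arithmetic uses exact integers until the final division, this version is handy for checking small cases by hand, though it does not handle the out-of-range inputs described above.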

Added in version 1.6.0.

References

[1] The Multivariate Hypergeometric Distribution, http://www.randomservices.org/random/urn/MultiHypergeometric.html

[2] Thomas J. Sargent and John Stachurski, 2020, Multivariate Hypergeometric Distribution, https://python.quantecon.org/multi_hyper.html

Examples

To evaluate the probability mass function of the multivariate hypergeometric distribution for a dichotomous population with \(10\) objects of the first type and \(20\) of the second, at a sample of size \(12\) containing \(8\) objects of the first type and \(4\) of the second, use:

>>> from scipy.stats import multivariate_hypergeom
>>> multivariate_hypergeom.pmf(x=[8, 4], m=[10, 20], n=12)
0.0025207176631464523

The multivariate_hypergeom distribution is identical to the corresponding hypergeom distribution (tiny numerical differences notwithstanding) when only two types (good and bad) of objects are present in the population as in the example above. Consider another example for a comparison with the hypergeometric distribution:

>>> from scipy.stats import hypergeom
>>> multivariate_hypergeom.pmf(x=[3, 1], m=[10, 5], n=4)
0.4395604395604395
>>> hypergeom.pmf(k=3, M=15, n=4, N=10)
0.43956043956044005
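
The same comparison can be extended to every possible outcome, as in the sketch below. The variable names are illustrative. Here hypergeom is parameterized with n as the number of objects of the first type and N as the number of draws.

```python
import numpy as np
from scipy.stats import hypergeom, multivariate_hypergeom

m_good, m_bad, n_draws = 10, 5, 4   # illustrative two-type population

k = np.arange(0, n_draws + 1)            # possible counts of the first type
x = np.stack([k, n_draws - k], axis=-1)  # matching (first, second) count pairs

pmf_multi = multivariate_hypergeom.pmf(x=x, m=[m_good, m_bad], n=n_draws)
pmf_uni = hypergeom.pmf(k=k, M=m_good + m_bad, n=m_good, N=n_draws)

print(np.allclose(pmf_multi, pmf_uni))  # agreement up to floating-point error
print(pmf_multi.sum())                  # probabilities over the whole support
```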

The functions pmf, logpmf, mean, var, cov, and rvs support broadcasting, under the convention that the vector parameters (x, m, and n) are interpreted as if each row along the last axis is a single object. For instance, we can combine the previous two calls to multivariate_hypergeom as

>>> multivariate_hypergeom.pmf(x=[[8, 4], [3, 1]], m=[[10, 20], [10, 5]],
...                            n=[12, 4])
array([0.00252072, 0.43956044])

This broadcasting also works for cov, where the output objects are square matrices of size m.shape[-1]. For example:

>>> multivariate_hypergeom.cov(m=[[7, 9], [10, 15]], n=[8, 12])
array([[[ 1.05, -1.05],
        [-1.05,  1.05]],
       [[ 1.56, -1.56],
        [-1.56,  1.56]]])

That is, result[0] is equal to multivariate_hypergeom.cov(m=[7, 9], n=8) and result[1] is equal to multivariate_hypergeom.cov(m=[10, 15], n=12).
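
A short check of this equivalence, written as a sketch with illustrative variable names:

```python
import numpy as np
from scipy.stats import multivariate_hypergeom

# One broadcast call that yields a stack of two 2 x 2 covariance matrices.
batched = multivariate_hypergeom.cov(m=[[7, 9], [10, 15]], n=[8, 12])

# The same matrices computed one parameter set at a time.
first = multivariate_hypergeom.cov(m=[7, 9], n=8)
second = multivariate_hypergeom.cov(m=[10, 15], n=12)

print(batched.shape)
print(np.allclose(batched[0], first), np.allclose(batched[1], second))
```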

Alternatively, the object may be called (as a function) to fix the m and n parameters, returning a “frozen” multivariate hypergeometric random variable.

>>> rv = multivariate_hypergeom(m=[10, 20], n=12)
>>> rv.pmf(x=[8, 4])
0.0025207176631464523
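
The frozen object exposes the other methods as well, with m and n already fixed. The sketch below also passes the seed parameter when freezing. The seed value 123 is arbitrary, and two frozen objects built with the same integer seed are expected to produce the same draws.

```python
import numpy as np
from scipy.stats import multivariate_hypergeom

# Two frozen distributions built with the same (arbitrary) integer seed.
rv_a = multivariate_hypergeom(m=[10, 20], n=12, seed=123)
rv_b = multivariate_hypergeom(m=[10, 20], n=12, seed=123)

print(rv_a.mean())   # same as multivariate_hypergeom.mean(m=[10, 20], n=12)
print(rv_a.cov())    # same as multivariate_hypergeom.cov(m=[10, 20], n=12)

# Each frozen object draws from its own seeded random state.
draws_a = rv_a.rvs(size=3)
draws_b = rv_b.rvs(size=3)
print(np.array_equal(draws_a, draws_b))
```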