scipy.stats.multivariate_hypergeom
- scipy.stats.multivariate_hypergeom = <scipy.stats._multivariate.multivariate_hypergeom_gen object>
A multivariate hypergeometric random variable.
- Parameters:
- m : array_like
The number of each type of object in the population. That is, \(m[i]\) is the number of objects of type \(i\).
- n : array_like
The number of samples taken from the population.
- seed : {None, int, np.random.RandomState, np.random.Generator}, optional
Used for drawing random variates. If seed is None, the RandomState singleton is used. If seed is an int, a new RandomState instance is used, seeded with seed. If seed is already a RandomState or Generator instance, then that object is used. Default is None.
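The parameters describe drawing n objects without replacement from a population holding m[i] objects of type i. The following is a minimal standard-library sketch of that sampling scheme, not SciPy's implementation; the helper name draw_counts is hypothetical and is introduced only for illustration.

```python
import random
from collections import Counter


def draw_counts(m, n, seed=None):
    """Draw n objects without replacement and count how many of each type appear.

    Hypothetical helper for illustration; it is not part of SciPy.
    """
    if n > sum(m):
        raise ValueError("n cannot exceed the population size sum(m)")
    rng = random.Random(seed)
    # Build the urn: type index i appears m[i] times.
    urn = [i for i, count in enumerate(m) for _ in range(count)]
    # random.Random.sample draws without replacement.
    drawn = rng.sample(urn, n)
    tally = Counter(drawn)
    return [tally[i] for i in range(len(m))]


counts = draw_counts(m=[10, 20], n=12, seed=0)
print(counts)  # two counts that sum to 12
```

Passing the same seed reproduces the same draw, which mirrors the role of the seed parameter described above.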
See also
scipy.stats.hypergeom
The hypergeometric distribution.
scipy.stats.multinomial
The multinomial distribution.
Notes
m must be an array of positive integers. If the quantile \(i\) contains values outside the range \([0, m_i]\), where \(m_i\) is the number of objects of type \(i\) in the population, or if the parameters are inconsistent with one another (e.g. x.sum() != n), methods return the appropriate value (e.g. 0 for pmf). If m or n contain negative values, the result will contain nan there.
The probability mass function for multivariate_hypergeom is
\[\begin{split}P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{\binom{m_1}{x_1} \binom{m_2}{x_2} \cdots \binom{m_k}{x_k}}{\binom{M}{n}}, \\ \quad (x_1, x_2, \ldots, x_k) \in \mathbb{N}^k \text{ with } \sum_{i=1}^k x_i = n\end{split}\]
where \(m_i\) is the number of objects of type \(i\), \(M\) is the total number of objects in the population (the sum of all the \(m_i\)), and \(n\) is the size of the sample to be taken from the population.
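The formula can be evaluated directly with exact binomial coefficients. The sketch below uses only the standard library and handles a single count vector; the function name mvhg_pmf is hypothetical, n is taken to be sum(x), and this is not how SciPy computes the value internally.

```python
from math import comb, prod


def mvhg_pmf(x, m):
    """Evaluate the PMF formula for one count vector x and population m.

    Hypothetical helper for illustration; the sample size n is sum(x).
    """
    n = sum(x)
    M = sum(m)
    # A count outside [0, m_i] has probability zero.
    if any(xi < 0 or xi > mi for xi, mi in zip(x, m)):
        return 0.0
    # Product of binom(m_i, x_i) divided by binom(M, n).
    return prod(comb(mi, xi) for mi, xi in zip(m, x)) / comb(M, n)


print(mvhg_pmf([8, 4], [10, 20]))
print(mvhg_pmf([3, 1], [10, 5]))
```

Up to floating-point rounding, these values should agree with what multivariate_hypergeom.pmf returns for the same x and m with n = sum(x).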
Added in version 1.6.0.
References
[1] The Multivariate Hypergeometric Distribution, http://www.randomservices.org/random/urn/MultiHypergeometric.html
[2] Thomas J. Sargent and John Stachurski, 2020, Multivariate Hypergeometric Distribution, https://python.quantecon.org/multi_hyper.html
Examples
To evaluate the probability mass function of the multivariate hypergeometric distribution for a population with two types of objects (\(10\) of the first type and \(20\) of the second), at a sample of size \(12\) containing \(8\) objects of the first type and \(4\) objects of the second type, use:
>>> from scipy.stats import multivariate_hypergeom
>>> multivariate_hypergeom.pmf(x=[8, 4], m=[10, 20], n=12)
0.0025207176631464523
The multivariate_hypergeom distribution is identical to the corresponding hypergeom distribution (tiny numerical differences notwithstanding) when only two types (good and bad) of objects are present in the population, as in the example above. Consider another example for a comparison with the hypergeometric distribution:
>>> from scipy.stats import hypergeom
>>> multivariate_hypergeom.pmf(x=[3, 1], m=[10, 5], n=4)
0.4395604395604395
>>> hypergeom.pmf(k=3, M=15, n=4, N=10)
0.43956043956044005
The functions pmf, logpmf, mean, var, cov, and rvs support broadcasting, under the convention that the vector parameters (x, m, and n) are interpreted as if each row along the last axis is a single object. For instance, we can combine the previous two calls to multivariate_hypergeom as
>>> multivariate_hypergeom.pmf(x=[[8, 4], [3, 1]], m=[[10, 20], [10, 5]],
...                            n=[12, 4])
array([0.00252072, 0.43956044])
This broadcasting also works for cov, where the output objects are square matrices of size m.shape[-1]. For example:
>>> multivariate_hypergeom.cov(m=[[7, 9], [10, 15]], n=[8, 12])
array([[[ 1.05, -1.05],
        [-1.05,  1.05]],
       [[ 1.56, -1.56],
        [-1.56,  1.56]]])
That is, result[0] is equal to multivariate_hypergeom.cov(m=[7, 9], n=8) and result[1] is equal to multivariate_hypergeom.cov(m=[10, 15], n=12).
Alternatively, the object may be called (as a function) to fix the m and n parameters, returning a “frozen” multivariate hypergeometric random variable.
>>> rv = multivariate_hypergeom(m=[10, 20], n=12)
>>> rv.pmf(x=[8, 4])
0.0025207176631464523
Methods
pmf(x, m, n)
Probability mass function.
logpmf(x, m, n)
Log of the probability mass function.
rvs(m, n, size=1, random_state=None)
Draw random samples from a multivariate hypergeometric distribution.
mean(m, n)
Mean of the multivariate hypergeometric distribution.
var(m, n)
Variance of the multivariate hypergeometric distribution.
cov(m, n)
Compute the covariance matrix of the multivariate hypergeometric distribution.
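The mean and covariance have standard closed forms for this distribution: with \(p_i = m_i / M\), the mean is \(n p_i\), the variance is \(n \frac{M - n}{M - 1} p_i (1 - p_i)\), and the covariance between types \(i \ne j\) is \(-n \frac{M - n}{M - 1} p_i p_j\). The sketch below evaluates them in plain Python for a single parameter set; the function name mvhg_moments is hypothetical, it assumes \(M > 1\), and it is not SciPy's implementation.

```python
def mvhg_moments(m, n):
    """Return (mean, cov) from the closed-form expressions, as nested lists.

    Hypothetical helper for illustration; assumes sum(m) > 1.
    """
    M = sum(m)
    p = [mi / M for mi in m]
    # Finite-population correction for sampling without replacement.
    scale = n * (M - n) / (M - 1)
    mean = [n * pi for pi in p]
    k = len(m)
    cov = [
        [scale * (p[i] * (1 - p[i]) if i == j else -p[i] * p[j]) for j in range(k)]
        for i in range(k)
    ]
    return mean, cov


mean, cov = mvhg_moments(m=[7, 9], n=8)
print(mean)
print(cov)
```

For m=[7, 9] and n=8 the covariance entries come out at about 1.05 on the diagonal and -1.05 off it, in line with the cov example in the Examples section; the diagonal of the matrix holds the per-type variances.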