scipy.stats.sampling.DiscreteAliasUrn#

class scipy.stats.sampling.DiscreteAliasUrn(dist, *, domain=None, urn_factor=1, random_state=None)#

Discrete Alias-Urn Method.

This method is used to sample from univariate discrete distributions with a finite domain. It uses the probability vector of size \(N\) or a probability mass function with a finite support to generate random numbers from the distribution.

Parameters
distarray_like or object, optional

Probability vector (PV) of the distribution. If PV isn’t available, an instance of a class with a pmf method is expected. The signature of the PMF is expected to be: def pmf(self, k: int) -> float. i.e. it should accept a Python integer and return a Python float.

domainint, optional

Support of the PMF. If a probability vector (pv) is not available, a finite domain must be given. i.e. the PMF must have a finite support. Default is None. When None:

  • If a support method is provided by the distribution object dist, it is used to set the domain of the distribution.

  • Otherwise, the support is assumed to be (0, len(pv)). When this parameter is passed in combination with a probability vector, domain[0] is used to relocate the distribution from (0, len(pv)) to (domain[0], domain[0]+len(pv)) and domain[1] is ignored. See Notes and tutorial for a more detailed explanation.

urn_factorfloat, optional

Size of the urn table relative to the size of the probability vector. It must not be less than 1. Larger tables result in faster generation times but require a more expensive setup. Default is 1.

random_state{None, int, numpy.random.Generator,

A NumPy random number generator or seed for the underlying NumPy random number generator used to generate the stream of uniform random numbers. If random_state is None (or np.random), the numpy.random.RandomState singleton is used. If random_state is an int, a new RandomState instance is used, seeded with random_state. If random_state is already a Generator or RandomState instance then that instance is used.

Notes

This method works when either a finite probability vector is available or the PMF of the distribution is available. In case a PMF is only available, the finite support (domain) of the PMF must also be given. It is recommended to first obtain the probability vector by evaluating the PMF at each point in the support and then using it instead.

If a probability vector is given, it must be a 1-dimensional array of non-negative floats without any inf or nan values. Also, there must be at least one non-zero entry otherwise an exception is raised.

By default, the probability vector is indexed starting at 0. However, this can be changed by passing a domain parameter. When domain is given in combination with the PV, it has the effect of relocating the distribution from (0, len(pv)) to (domain[0], domain[0] + len(pv)). domain[1] is ignored in this case.

The parameter urn_factor can be increased for faster generation at the cost of increased setup time. This method uses a table for random variate generation. urn_factor controls the size of this table relative to the size of the probability vector (or width of the support, in case a PV is not available). As this table is computed during setup time, increasing this parameter linearly increases the time required to setup. It is recommended to keep this parameter under 2.

References

1

UNU.RAN reference manual, Section 5.8.2, “DAU - (Discrete) Alias-Urn method”, http://statmath.wu.ac.at/software/unuran/doc/unuran.html#DAU

2

A.J. Walker (1977). An efficient method for generating discrete random variables with general distributions, ACM Trans. Math. Software 3, pp. 253-256.

Examples

>>> from scipy.stats.sampling import DiscreteAliasUrn

To create a random number generator using a probability vector, use:

>>> pv = [0.1, 0.3, 0.6]
>>> urng = np.random.default_rng()
>>> rng = DiscreteAliasUrn(pv, random_state=urng)

The RNG has been setup. Now, we can now use the rvs method to generate samples from the distribution:

>>> rvs = rng.rvs(size=1000)

To verify that the random variates follow the given distribution, we can use the chi-squared test (as a measure of goodness-of-fit):

>>> from scipy.stats import chisquare
>>> _, freqs = np.unique(rvs, return_counts=True)
>>> freqs = freqs / np.sum(freqs)
>>> freqs
array([0.092, 0.292, 0.616])
>>> chisquare(freqs, pv).pvalue
0.9993602047563164

As the p-value is very high, we fail to reject the null hypothesis that the observed frequencies are the same as the expected frequencies. Hence, we can safely assume that the variates have been generated from the given distribution. Note that this just gives the correctness of the algorithm and not the quality of the samples.

If a PV is not available, an instance of a class with a PMF method and a finite domain can also be passed.

>>> urng = np.random.default_rng()
>>> class Binomial:
...     def __init__(self, n, p):
...         self.n = n
...         self.p = p
...     def pmf(self, x):
...         # note that the pmf doesn't need to be normalized.
...         return self.p**x * (1-self.p)**(self.n-x)
...     def support(self):
...         return (0, self.n)
... 
>>> n, p = 10, 0.2
>>> dist = Binomial(n, p)
>>> rng = DiscreteAliasUrn(dist, random_state=urng)

Now, we can sample from the distribution using the rvs method and also measure the goodness-of-fit of the samples:

>>> rvs = rng.rvs(1000)
>>> _, freqs = np.unique(rvs, return_counts=True)
>>> freqs = freqs / np.sum(freqs)
>>> obs_freqs = np.zeros(11)  # some frequencies may be zero.
>>> obs_freqs[:freqs.size] = freqs
>>> pv = [dist.pmf(i) for i in range(0, 11)]
>>> pv = np.asarray(pv) / np.sum(pv)
>>> chisquare(obs_freqs, pv).pvalue
0.9999999999999999

To check that the samples have been drawn from the correct distribution, we can visualize the histogram of the samples:

>>> import matplotlib.pyplot as plt
>>> rvs = rng.rvs(1000)
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> x = np.arange(0, n+1)
>>> fx = dist.pmf(x)
>>> fx = fx / fx.sum()
>>> ax.plot(x, fx, 'bo', label='true distribution')
>>> ax.vlines(x, 0, fx, lw=2)
>>> ax.hist(rvs, bins=np.r_[x, n+1]-0.5, density=True, alpha=0.5,
...         color='r', label='samples')
>>> ax.set_xlabel('x')
>>> ax.set_ylabel('PMF(x)')
>>> ax.set_title('Discrete Alias Urn Samples')
>>> plt.legend()
>>> plt.show()
../../_images/scipy-stats-sampling-DiscreteAliasUrn-1_00_00.png

To set the urn_factor, use:

>>> rng = DiscreteAliasUrn(pv, urn_factor=2, random_state=urng)

This uses a table twice the size of the probability vector to generate random variates from the distribution.

Methods

rvs([size, random_state])

Sample from the distribution.

set_random_state([random_state])

Set the underlying uniform random number generator.