Discrete Alias Urn (DAU)#

  • Required: probability vector (PV) or the PMF along with a finite domain

  • Speed:

    • Set-up: slow (linear with the vector-length)

    • Sampling: very fast

DAU samples from distributions with arbitrary but finite probability vectors (PV) of length N. The algorithm is based on an ingenious method by A.J. Walker and requires a table of size (at least) N. It needs one random number and only one comparison for each generated random variate. The setup time for constructing the tables is O(N).

>>> import numpy as np
>>> from scipy.stats.sampling import DiscreteAliasUrn
>>>
>>> pv = [0.18, 0.02, 0.8]
>>> urng = np.random.default_rng()
>>> rng = DiscreteAliasUrn(pv, random_state=urng)
>>> rng.rvs()
0

By default, the probability vector is indexed starting at 0. However, this can be changed by passing a domain parameter. When domain is given in combination with the PV, it has the effect of relocating the distribution from (0, len(pv)) to (domain[0], domain[0] + len(pv)). domain[1] is ignored in this case.

>>> rng = DiscreteAliasUrn(pv, domain=(10, 13), random_state=urng)
>>> rng.rvs()
12

The method also works when no probability vector but a PMF is given. In that case, a bounded (finite) domain must also be given either by passing the domain parameter explicitly or by providing a support method in the distribution object:

>>> class Distribution:
...     def __init__(self, c):
...         self.c = c
...     def pmf(self, x):
...         return x**self.c
...     def support(self):
...         return (0, 10)
...
>>> dist = Distribution(2)
>>> rng = DiscreteAliasUrn(dist, random_state=urng)
>>> rng.rvs()
10
>>> import matplotlib.pyplot as plt
>>> from scipy.stats.sampling import DiscreteAliasUrn
>>> class Distribution:
...     def __init__(self, c):
...         self.c = c
...     def pmf(self, x):
...         return x**self.c
...     def support(self):
...         return (0, 10)
...
>>> dist = Distribution(2)
>>> urng = np.random.default_rng()
>>> rng = DiscreteAliasUrn(dist, random_state=urng)
>>> rvs = rng.rvs(1000)
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> x = np.arange(1, 11)
>>> fx = dist.pmf(x)
>>> fx = fx / fx.sum()
>>> ax.plot(x, fx, 'bo', label='true distribution')
>>> ax.vlines(x, 0, fx, lw=2)
>>> ax.hist(rvs, bins=np.r_[x, 11]-0.5, density=True, alpha=0.5, color='r',
...         label='samples')
>>> ax.set_xlabel('x')
>>> ax.set_ylabel('PMF(x)')
>>> ax.set_title('Discrete Alias Urn Samples')
>>> plt.legend()
>>> plt.show()
" "

Note

As DiscreteAliasUrn expects PMF with signature def pmf(self, x: float) -> float, it first vectorizes the PMF using np.vectorize and then evaluates it over all the points in the domain. But if the PMF is already vectorized, it is much faster to just evaluate it at each point in the domain and pass the obtained PV instead along with the domain. For example, pmf methods of SciPy’s discrete distributions are vectorized and a PV can be obtained by doing:

>>> from scipy.stats import binom
>>> from scipy.stats.sampling import DiscreteAliasUrn
>>> dist = binom(10, 0.2)  # distribution object
>>> domain = dist.support()  # the domain of your distribution
>>> x = np.arange(domain[0], domain[1] + 1)
>>> pv = dist.pmf(x)
>>> rng = DiscreteAliasUrn(pv, domain=domain)

Domain is required here to relocate the distribution.

The performance can be slightly influenced by setting the size of the used table which can be changed by passing a urn_factor parameter.

>>> # use a table twice the length of PV.
>>> urn_factor = 2
>>> rng = DiscreteAliasUrn(pv, urn_factor=urn_factor, random_state=urng)
>>> rng.rvs()
2

Note

It is recommended to keep this parameter under 2.

Please see [1] and [2] for more details on this method.

References#